Senior Backend Engineer (Python/FastAPI) | AI Evaluation | US-Based Remote Contractor
Join Turing — the world's leading AI research accelerator — as a Senior Backend Engineer focused on AI model evaluation. In this contractor role, you'll help shape the future of large language models by curating high-quality datasets, evaluating AI-generated code, and building verification systems for production-grade software. Flexible hours (10–40 hrs/week), fully remote within the US.
About Turing
Based in San Francisco, Turing partners with frontier AI labs and global enterprises to accelerate AI research and deploy reliable, high-impact AI systems. Our team specializes in software engineering, logical reasoning, STEM, multilinguality, multimodality, and AI agents.
Role Overview
In this role, you will evaluate software engineering work and collaborate with researchers to create cutting-edge training and benchmarking datasets for large language models. Your work will directly influence the quality and reliability of AI-generated code across multiple languages and domains.
Key Responsibilities
- Curate code examples, build solutions, and correct code in Python, C/C++, Rust, Go, Java, and JavaScript (including React).
- Evaluate and refine AI-generated code for systems-level correctness, performance, and reliability.
- Collaborate with cross-functional teams to benchmark and improve AI-driven coding solutions.
- Build agents to verify the quality of systems-level and infrastructure code and identify error patterns.
- Analyze software engineering lifecycle stages — from prototyping and architecture to production, monitoring, and maintenance — and evaluate model capabilities across them.
- Design automated verification mechanisms for software engineering tasks.
Required Skills
- 3+ years of professional software engineering experience.
- Strong expertise in systems programming, infrastructure, or backend development using Python, C/C++, Rust, or Go.
- Proven experience building and deploying scalable, production-grade software.
- Deep understanding of software architecture, design patterns, debugging, and code quality assessment.
- Excellent written and verbal communication skills, including the ability to write clear, structured evaluation rationales.
Engagement Details
- Type: Contractor (no medical/paid leave benefits)
- Commitment: Flexible — minimum 10 hrs/week, up to 40 hrs/week
- Duration: 1 month, with potential extensions based on performance
- Location: Must be based in the United States
Application Process
The application takes approximately 15–30 minutes and includes an AI video interview. We look forward to reviewing your background!