Senior Software Engineer – LLM Evaluation | Contractor | Remote (US Only)
Join Turing, the world's leading AI research accelerator, as a Senior Software Engineer specializing in LLM Evaluation. In this role, you'll create high-quality datasets used to train, benchmark, and advance frontier large language models — working alongside top AI researchers and cross-functional teams.
About Turing
Based in San Francisco, Turing partners with frontier AI labs and global enterprises to accelerate AI research and deploy reliable, high-impact AI systems. Our team specializes in software engineering, logical reasoning, STEM, multilinguality, multimodality, and AI agents.
What You'll Do
- Curate code examples, build solutions, and correct code across Python, JavaScript (React, Node.js), C/C++, Java, Rust, and Go for AI model training initiatives.
- Evaluate and refine AI-generated code across backend and frontend contexts for efficiency, scalability, and reliability.
- Build agents to verify code quality and identify error patterns across full-stack applications.
- Design automated verification mechanisms for software engineering tasks.
- Form hypotheses about each stage of the software engineering lifecycle, from prototyping and architecture to production, monitoring, and maintenance, and evaluate model capabilities at each stage.
- Collaborate with cross-functional teams to benchmark and enhance AI-driven coding solutions.
Required Skills
- 3+ years of professional software engineering experience.
- Strong full-stack expertise in Python and JavaScript (React, Node.js).
- Experience deploying scalable, production-grade software with modern tools and languages.
- Deep understanding of software architecture, design, debugging, and code quality assessment.
- Excellent written and verbal communication skills, particularly for writing structured evaluation rationales.
Ideal Background
Candidates with experience at frontier AI organizations (e.g., OpenAI, NVIDIA, Databricks, Palantir, Snowflake) or degrees from top CS programs (Stanford, MIT, CMU, UC Berkeley, Georgia Tech) are strongly encouraged to apply. Exceptional skill and experience always take precedence over pedigree.
Engagement Details
- Type: Contractor (no medical/paid leave benefits)
- Commitment: Flexible — minimum 10 hrs/week, up to 40 hrs/week
- Duration: 1 month, with potential extensions based on performance
- Location: Must be based in the United States
Evaluation Process
The application takes approximately 15–30 minutes and includes an AI video interview.