
Senior Software Engineer – LLM Eval at Turing

posted 2 hours ago
turing.com | Contractor | Remote in US | Varies

Senior Software Engineer – LLM Evaluation | Contractor | Remote (US Only) | Flexible Hours (10–40 hrs/week)

Turing is seeking a seasoned Senior Software Engineer to help shape the future of large language models by building and evaluating high-quality AI training datasets. This is a flexible contractor engagement ideal for engineers who thrive in fast-paced, high-impact environments.

About Turing

Headquartered in San Francisco, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing accelerates frontier research through high-quality data, advanced training pipelines, and top-tier AI researchers — and applies that expertise to help enterprises transform AI from proof of concept into measurable business impact.

Role Overview

As a Software Engineering Evaluator, you will create cutting-edge datasets used for training, benchmarking, and advancing large language models. You'll work across the full stack, using Python for backend and ML workflows and JavaScript (React, Node.js) for frontend and API layers, and you'll also work in C/C++, Java, Rust, and Go. You'll evaluate and refine AI-generated code for efficiency, scalability, and reliability, collaborating closely with researchers and cross-functional teams.

What You'll Do

  • Curate code examples, build solutions, and correct code across Python, JavaScript (React, Node.js), C/C++, Java, Rust, and Go for AI model training initiatives.
  • Evaluate and refine AI-generated code across backend and frontend contexts for efficiency, scalability, and reliability.
  • Collaborate with cross-functional teams to benchmark and enhance AI-driven coding solutions.
  • Build agents that verify code quality and identify error patterns across full-stack applications.
  • Form hypotheses about each stage of the software engineering lifecycle, from prototyping and architecture design to production, launch, and monitoring, and evaluate model capabilities against them.
  • Design automated verification mechanisms to validate solutions to software engineering tasks.

Required Skills

  • 3+ years of professional software engineering experience.
  • Strong expertise in full-stack development using Python and JavaScript (React, Node.js).
  • Proven experience deploying scalable, production-grade software with modern languages and tools.
  • Deep understanding of software architecture, design, debugging, and code quality assessment.
  • Excellent written and verbal communication skills, with the ability to write clear, structured evaluation rationales.

Engagement Details

  • Type: Contractor (no medical/paid leave benefits)
  • Commitment: Flexible — minimum 10 hrs/week, up to 40 hrs/week
  • Duration: 1 month, with potential extensions based on performance
  • Location: Must be based in the United States

Application Process

The application takes approximately 15–30 minutes and includes an AI video interview. We look forward to learning more about you!
