
Senior Software Engineer – LLM Eval at Turing

posted 1 hour ago
turing.com | Contractor | Remote in US | Varies

Senior Software Engineer – LLM Evaluation | Contractor | Remote (US Only)

Join Turing, the world's leading AI research accelerator, and play a key role in shaping the next generation of large language models. This contractor role focuses on evaluating and improving AI-generated code across a wide range of languages and systems-level domains — ideal for experienced engineers who thrive in fast-paced, high-impact environments.

About Turing

Headquartered in San Francisco, Turing partners with frontier AI labs and global enterprises to accelerate AI research and deploy reliable, production-grade AI systems. Our team specializes in software engineering, logical reasoning, STEM, multilinguality, multimodality, and AI agents.

Role Overview

As a Software Engineering Evaluator, you will create high-quality datasets used to train, benchmark, and advance large language models. You'll curate code examples, develop precise solutions, and evaluate AI-generated code for correctness, performance, and scalability — with a strong emphasis on systems-level and infrastructure code.

Key Responsibilities

  • Curate code examples and build or correct solutions in Python, C/C++, Rust, Go, Java, and JavaScript (including ReactJS).
  • Evaluate and refine AI-generated code for systems-level correctness, efficiency, and reliability.
  • Collaborate with cross-functional teams to benchmark AI-driven coding solutions against industry standards.
  • Build agents to verify quality and identify error patterns in systems and infrastructure code.
  • Analyze software engineering lifecycle stages — from prototyping and architecture to deployment and monitoring — and assess model capabilities across them.
  • Design automated verification mechanisms for software engineering tasks.

Required Skills

  • 3+ years of professional software engineering experience.
  • Strong expertise in systems programming, infrastructure, or backend development (Python, C/C++, Rust, Go).
  • Proven experience building and deploying scalable, production-grade software.
  • Deep understanding of software architecture, design patterns, debugging, and code quality review.
  • Excellent written and verbal communication skills for structured evaluation rationales.

Engagement Details

  • Type: Contractor (no medical/paid leave benefits)
  • Hours: Flexible — minimum 10 hrs/week, up to 40 hrs/week
  • Duration: 1 month, with potential extensions based on performance
  • Location: Must be based in the United States

Application Process

The application takes approximately 15–30 minutes and includes an AI video interview. We welcome graduates from top CS programs (Stanford, MIT, CMU, UC Berkeley, Georgia Tech, etc.), though exceptional experience always takes precedence over pedigree.
