Senior Software Engineer – LLM Eval at Turing

posted 1 hour ago
turing.com | Contractor | Remote (US) | Varies

Senior Software Engineer – LLM Evaluation | Contractor | Remote (US Only)

Join Turing, the world's leading AI research accelerator, and help shape the future of large language models. In this role, you'll create high-quality datasets, evaluate AI-generated code, and collaborate with frontier AI researchers — all on a flexible contractor basis with a minimum of 10 hours per week.

About Turing

Based in San Francisco, Turing partners with frontier AI labs and global enterprises to accelerate AI research and deploy reliable, high-impact AI systems. Our expertise spans software engineering, logical reasoning, STEM, multilinguality, multimodality, and autonomous agents.

Role Overview

As a Software Engineering Evaluator, you will curate and refine code datasets used to train and benchmark large language models. Your work will directly influence the quality and capability of next-generation AI systems, with a strong focus on systems-level programming, performance-critical applications, and infrastructure.

What You'll Do

  • Curate code examples, build solutions, and correct code in Python, C/C++, Rust, Go, Java, and JavaScript (including ReactJS).
  • Evaluate and refine AI-generated code for correctness, performance, scalability, and reliability.
  • Build agents to verify the quality of systems-level and infrastructure code and identify error patterns.
  • Design automated verification mechanisms for software engineering tasks.
  • Collaborate with cross-functional teams to benchmark AI-driven coding solutions against industry standards.
  • Analyze and evaluate model capabilities across the full software engineering lifecycle — from prototyping and architecture design to production, monitoring, and maintenance.

Required Skills

  • 3+ years of professional software engineering experience.
  • Strong expertise in systems programming, infrastructure, or backend development using Python, C/C++, Rust, or Go.
  • Proven experience building and deploying scalable, production-grade software.
  • Deep understanding of software architecture, design patterns, debugging, and code quality assessment.
  • Excellent written and verbal communication skills for producing clear, structured evaluation rationales.

Ideal Background

This role is a strong fit for engineers with experience at frontier AI or technology organizations such as OpenAI, NVIDIA, Databricks, Palantir, or Snowflake. Graduates from top-tier programs are welcome, though exceptional skill and experience always take precedence.

Engagement Details

  • Type: Contractor (no medical or paid leave benefits)
  • Commitment: Flexible — minimum 10 hrs/week, up to 40 hrs/week
  • Duration: 1 month, with potential extensions based on performance
  • Location: Must be based in the United States

Application Process

The application takes approximately 15–30 minutes and includes an AI video interview. Apply today to contribute to cutting-edge AI research at the frontier of the field.
