Software Engineer – AI Code Evaluation at Turing

posted 1 hour ago
turing.com | Contractor | Remote | TBD

Software Engineer – AI Code Evaluation & Benchmarking | Contractor | Worldwide Remote

Join Turing, one of the world's fastest-growing AI companies, and help shape the future of large language models (LLMs). In this role, you'll evaluate and benchmark AI-generated code, validate solutions against real-world software engineering tasks, and contribute to high-quality evaluation datasets that directly improve frontier AI coding systems.

What You'll Do

  • Review AI-generated code for correctness, efficiency, maintainability, and adherence to requirements.
  • Analyze software engineering tasks and validate whether proposed solutions meet expected outcomes.
  • Debug code, reproduce issues, and verify fixes across different programming environments.
  • Assess model-generated explanations, reasoning, and implementation approaches for technical accuracy.
  • Create, refine, and maintain evaluation datasets, benchmarks, and grading rubrics for coding tasks.
  • Identify edge cases and failure modes where AI systems struggle with software engineering problems.
  • Document findings clearly and provide structured feedback to improve evaluation quality.
  • Collaborate with project teams to establish quality standards and evaluation methodologies.

Requirements

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
  • 3+ years of professional software engineering experience.
  • Strong proficiency in one or more of: Python, Java, C/C++, Go, Swift, Objective-C, PHP, or SQL.
  • Solid understanding of data structures, algorithms, software design principles, and debugging methodologies.
  • Experience performing code reviews and evaluating code quality in production or large-scale codebases.
  • Familiarity with version control systems (e.g., Git) and modern software development workflows.
  • Strong written communication skills and attention to detail.
  • Bonus: Experience with AI/ML data annotation, NLP, prompt engineering, or LLM-related projects.
  • Highly preferred: Experience evaluating AI-generated code, benchmark creation, or software quality assessment.

Engagement Details

  • Type: Contractor assignment (no medical/paid leave)
  • Commitment: Minimum 20 hours/week (at least 4 hours/day), including a 4-hour overlap with PST
  • Duration: 1 month (expected start: next week)

Why Work With Turing?

  • Fully remote, flexible work environment.
  • Opportunity to contribute to cutting-edge AI projects with leading LLM companies.
