
AI Evaluation Engineer at Turing

posted 57 minutes ago
turing.com | Contractor | Remote | TBD

AI Evaluation Engineer (Python / Java / Web) | Contractor | 40 hrs/week | Worldwide Remote

Turing is seeking experienced Software Engineers to join its AI Evaluation team as AI Benchmark Authors. In this role, you will design and author high-quality evaluation tasks that measure the capabilities of advanced AI agents in real-world software development scenarios. Your work will directly influence the benchmarking of frontier AI models used by leading AI research organizations.

What You'll Do

  • Design realistic software engineering evaluation tasks for AI agents
  • Write clear, unambiguous instructions defining expected outputs, constraints, and success criteria
  • Create reference solutions that successfully solve authored tasks
  • Develop verification criteria and automated test descriptions for task validation
  • Author domain-specific skill files covering workflows, conventions, and best practices
  • Ensure consistency across benchmark variants while maintaining rigorous evaluation standards
  • Review tasks for quality, edge cases, and failure modes to improve benchmark reliability
  • Collaborate with AI researchers, evaluators, and engineering teams to refine benchmark quality

Requirements

  • Bachelor's degree or higher in Computer Science, Software Engineering, or a related field
  • 5+ years of hands-on software development experience
  • Strong expertise in at least one domain: Python, Java/JVM, or Web Application Development (Frontend, Backend, or Full Stack)
  • Excellent written English with the ability to craft precise technical instructions
  • Solid understanding of software engineering workflows, debugging, testing, and code quality practices
  • Experience with structured file formats such as JSON, Markdown, YAML, DOCX, or XLSX

Nice to Have

  • Experience with LLM evaluation, prompt engineering, or AI benchmarking
  • Background in creating technical assessments, coding challenges, or educational content
  • Familiarity with Docker, containers, or cloud-based development environments

Engagement Details

  • Commitment: 40 hours/week, including a 4-hour overlap with PST
  • Type: Contractor/Freelancer (no medical or paid leave benefits)
  • Duration: 2-month contract, starting next week

Why Work With Turing?

  • Contribute to cutting-edge AI projects with top AI research organizations
  • Flexible, fully remote work environment
  • Influence the evaluation of next-generation AI systems
  • Collaborate with a global network of highly skilled professionals
