This job post expired on July 04, 2026. The position has likely been filled.

AI Evaluation Engineer (Python) at Turing

posted 1 month ago

turing.com | Contractor | Remote | TBD

AI Evaluation Engineer (Python) | Contractor | 40 hrs/week | Worldwide Remote

Turing is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems. We are seeking experienced Python developers to join our AI Evaluation team as AI Evaluation Specialists. In this role, you will design and author evaluation tasks that benchmark the capabilities of advanced AI systems on real-world software engineering challenges — directly influencing the next generation of frontier AI models.

What You'll Do

Design realistic, Python-focused software engineering evaluation tasks for AI agents
Write clear, precise instructions defining expected outputs, constraints, and success criteria
Create reference solutions that successfully solve evaluation tasks and satisfy validation requirements
Develop human-readable verifier descriptions documenting expected behaviors and evaluation checks
Author domain-specific knowledge files that guide AI systems on relevant workflows and best practices
Review tasks for clarity, consistency, edge cases, and overall evaluation quality
Analyze AI-generated outputs and identify common failure patterns
Collaborate with researchers and evaluation teams to improve benchmark quality and coverage

Requirements

Bachelor's degree or higher in Computer Science, Software Engineering, IT, or a related field
3–5 years of hands-on Python development experience
Strong proficiency in Python and core software engineering concepts
Experience with applications, automation scripts, APIs, data-processing workflows, or backend systems in Python
Solid understanding of data structures, algorithms, debugging, testing, and software development best practices
Excellent written English with the ability to produce clear, unambiguous technical documentation
Comfortable working with structured formats: JSON, Markdown, YAML, DOCX, and XLSX

Nice to Have

Experience with LLM evaluation, prompt engineering, or AI benchmarking
Experience creating technical assessments, coding challenges, or educational content
Familiarity with Docker, containers, or cloud-based development environments

Engagement Details

Commitment: 40 hours/week, including a 4-hour overlap with PST
Type: Contractor/Freelancer (no medical or paid leave benefits)
Duration: 2-month contract, starting next week

Perks

Work on cutting-edge AI projects with leading research organizations
Flexible, fully remote work environment
Opportunity to shape the evaluation of next-generation AI systems
Collaborate with a global network of highly skilled professionals
