Software Engineer – AI Code Evaluation at Turing

posted 20 days ago

turing.com | Contractor | Remote | TBD

Software Engineer – AI Code Evaluation & Benchmarking | Contractor | Worldwide Remote

Join Turing, one of the world's fastest-growing AI companies, and help shape the future of large language models (LLMs). In this role, you'll evaluate and benchmark AI-generated code, validate solutions against real-world software engineering tasks, and contribute to high-quality evaluation datasets that directly improve frontier AI coding systems.

What You'll Do

Review AI-generated code for correctness, efficiency, maintainability, and adherence to requirements.
Analyze software engineering tasks and validate whether proposed solutions meet expected outcomes.
Debug code, reproduce issues, and verify fixes across different programming environments.
Assess model-generated explanations, reasoning, and implementation approaches for technical accuracy.
Create, refine, and maintain evaluation datasets, benchmarks, and grading rubrics for coding tasks.
Identify edge cases and failure modes where AI systems struggle with software engineering problems.
Document findings clearly and provide structured feedback to improve evaluation quality.
Collaborate with project teams to establish quality standards and evaluation methodologies.

Requirements

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
3+ years of professional software engineering experience.
Strong proficiency in one or more of: Python, Java, C/C++, Go, Swift, Objective-C, PHP, or SQL.
Solid understanding of data structures, algorithms, software design principles, and debugging methodologies.
Experience performing code reviews and evaluating code quality in production or large-scale codebases.
Familiarity with version control systems (e.g., Git) and modern software development workflows.
Strong written communication skills and attention to detail.
Bonus: Experience with AI/ML data annotation, NLP, prompt engineering, or LLM-related projects.
Highly preferred: Experience evaluating AI-generated code, benchmark creation, or software quality assessment.

Engagement Details

Type: Contractor assignment (no medical/paid leave)
Commitment: Minimum 20 hours/week (at least 4 hours/day), including a 4-hour overlap with PST
Duration: 1 month (expected start: next week)

Why Work With Turing?

Fully remote, flexible work environment.
Opportunity to contribute to cutting-edge AI projects with leading LLM companies.
