This job post expired on May 09, 2026. The position has likely already been filled.

Senior Python Engineer – LLM Eval at Turing

Senior Python Engineer – LLM Evaluation | Contractor | Remote (US Only)

Join Turing, the world's leading AI research accelerator based in San Francisco, as a Senior Python Engineer focused on LLM Evaluation. In this role, you'll help shape the future of large language models by building high-quality datasets, evaluating AI-generated code, and collaborating with top researchers on cutting-edge AI systems. This is a flexible contractor engagement ideal for experienced engineers who thrive in fast-paced, high-impact environments.

About Turing

Turing partners with frontier AI labs and global enterprises to accelerate AI research and deploy advanced AI systems at scale. With expertise spanning software engineering, logical reasoning, STEM, multilinguality, and multimodality, Turing helps organizations transform AI from proof of concept into measurable business impact.

What You'll Do

Curate code examples, build solutions, and correct code across Python, JavaScript (React, Node.js), and additional languages including C/C++, Java, Rust, and Go.
Evaluate and refine AI-generated code for efficiency, scalability, and reliability across backend and frontend contexts.
Build agents to verify code quality and identify error patterns in full-stack applications.
Design automated verification mechanisms for software engineering tasks.
Form hypotheses about each stage of the software engineering lifecycle, from prototyping and architecture to production, monitoring, and maintenance, and evaluate model capabilities against them.
Collaborate with cross-functional teams to benchmark and enhance AI-driven coding solutions.

Required Skills

3+ years of professional software engineering experience.
Strong expertise in full-stack development using Python and JavaScript (React, Node.js).
Proven experience deploying scalable, production-grade software.
Deep understanding of software architecture, design patterns, debugging, and code quality assessment.
Excellent written and verbal communication skills, particularly for writing structured evaluation rationales.

Ideal Background

We welcome engineers who have shipped high-impact products at companies like Stripe, Airbnb, Cloudflare, Datadog, or Coinbase, as well as graduates from strong CS programs such as UW, UIUC, UT Austin, or University of Michigan. Exceptional skill and experience always take precedence over pedigree.

Engagement Details

Type: Contractor (no medical/paid leave)
Commitment: Flexible — minimum 10 hrs/week, up to 40 hrs/week
Duration: 1 month, with potential extensions based on performance
Location: Must be based in the United States

Evaluation Process

Application takes approximately 15–30 minutes.
Completion of an AI video interview is required.
