
Agentic Coding Annotator at Turing


Agentic Coding Annotator (Offline Tasks) | Contractor | Remote | Turing

Turing is seeking experienced software practitioners to evaluate and improve datasets for agentic coding models. This is a high-precision, technically demanding role — not a basic annotation job. You'll work within realistic coding environments, review model trajectories, verify solutions, and produce high-quality annotations that directly influence frontier AI development.

About Turing

Turing is one of the world's fastest-growing AI companies, partnering with leading AI labs to advance frontier model capabilities in coding, reasoning, agentic behavior, and more. We build real-world AI systems that solve mission-critical challenges for companies worldwide.

Role Overview

This role focuses on offline evaluation tasks, which include:

  • Designing realistic, multi-step coding tasks
  • Calibrating tasks through user simulation
  • Writing task-specific rubrics and binary evaluation criteria
  • Grading and ranking model-generated trajectories

Day-to-Day Responsibilities

  • Execute realistic coding tasks within an agentic coding harness while maintaining model blindness and session independence
  • Verify model outputs by reading code, running commands, checking logs, and inspecting generated artifacts
  • Perform targeted validation using tests, scripts, and manual checks
  • Write clear, evidence-based rationales for trajectory rankings and assessments
  • Design multi-step coding tasks including user intent and milestone structure
  • Create and refine task-specific rubrics and evaluation criteria
  • Review completed work for quality, completeness, consistency, and schema compliance
  • Identify and escalate broken environments or process gaps with supporting evidence

Requirements

Software Engineering Fluency (Mandatory)
  • 5+ years of experience in software engineering, QA, developer tooling, data/ML engineering, or similar code-heavy roles
  • Strong hands-on experience in one or more programming languages such as Python, JavaScript/TypeScript, Rust, Java, C/C++, Bash, Haskell, Swift, or SQL
  • Ability to read unfamiliar codebases, debug issues, run tests, and evaluate functional correctness
Terminal & Tooling Skills (Mandatory)
  • Comfortable working in Linux/Ubuntu-like environments
  • Proficient with terminal workflows, Git, code editors, package managers, test runners, JSON, YAML, and Markdown
  • Familiarity with Docker and reproducible environments is a strong plus
Coding-Agent Workflow Familiarity (Mandatory)
  • Experience working with agentic coding tools such as OpenCode, Claude Code, Cursor, or similar platforms
Quality Judgment & Annotation Accuracy (Mandatory)
  • Ability to compare model trajectories and identify meaningful differences
  • Distinguish correctness from style, communication quality, and agent behavior
  • Evaluate solutions consistently using defined rubrics
  • Write concise, evidence-based rationales — not generic summaries

Preferred Qualifications (Offline / Senior Candidates)

  • Strong Docker skills and experience building/debugging reproducible environments
  • Experience in large, complex repositories beyond greenfield or tutorial-level projects
  • Demonstrated originality and sound engineering judgment in defining technical problems
  • Ability to design realistic, non-trivial tasks that go beyond simple bug fixes or README flows

Contract Details

  • Commitment: 8 hours/day with a 4-hour overlap with PST
  • Employment Type: Contractor (no medical/paid leave included)
  • Duration: 4 weeks, starting next week

Why Work With Turing?

  • Contribute to cutting-edge AI projects with leading foundation model companies
  • Work at the frontier of LLM evaluation and reasoning
  • Fully remote and flexible with global teams
  • Competitive compensation based on experience and project scope
