This job post expired on May 22, 2026. The position has likely already been filled.

SwarmBench Task Engineer — SWE at Turing

posted 1 month ago

turing.com | Contractor | Remote | TBD

SwarmBench Task Engineer (SWE/Code) | Contractor | Remote | 4-Week Engagement

Turing is seeking an experienced SwarmBench Task Engineer to design and build high-quality multi-agent benchmark tasks rooted in real-world software engineering workflows. This is a short-term contractor role ideal for senior engineers passionate about AI evaluation and frontier model development.

About Turing

Turing is one of the world's fastest-growing AI companies, partnering with leading AI labs to advance frontier model capabilities in reasoning, coding, agentic behavior, and more. We build real-world AI systems that solve mission-critical challenges for global enterprises.

Role Overview

In this role, you will create benchmark tasks grounded in real open-source code changes, such as bug fixes, migrations, and refactors. These tasks are used to evaluate how effectively AI agents navigate large codebases, apply precise modifications, and produce correct, testable outputs. You will work within the Harbor evaluation framework and collaborate with global teams at the frontier of LLM evaluation.

Day-to-Day Responsibilities

Build multi-agent benchmark tasks based on real-world open-source code changes (bug fixes, migrations, refactors)
Work within the Harbor evaluation framework to run and validate tasks inside Docker environments
Write clear, precise task instructions specifying file paths, function signatures, expected behavior, and constraints
Design and implement Python-based verification scripts to validate correctness of agent-generated code changes
Create decomposition strategies that split complex code changes across multiple independent sub-agents
Debug and refine tasks within containerized environments to ensure reproducibility and determinism
Evaluate task performance signals and continuously improve task quality, clarity, and difficulty

Requirements

5+ years of experience in Python and JavaScript development
Experience with AI coding benchmarks (e.g., SWE-bench, Terminal-Bench)
Strong ability to read and navigate large open-source codebases (e.g., Django, Flask, FastAPI, Node.js)
Solid familiarity with Git workflows — pull requests, diffs, cherry-picking, and commit-level navigation
Comfortable writing Dockerfiles, building images, and debugging container issues
Experience writing test scripts using pytest, unittest, or custom assertion-based frameworks
Ability to produce clear, precise, and unambiguous technical specifications

Engagement Details

Commitment: 8 hours/day, including 4 hours of overlap with PST
Type: Contractor (no medical/paid leave benefits)
Duration: 4 weeks, starting next week
