
SwarmBench Task Engineer (SWE/Code) | Contractor | Remote | 4-Week Engagement
Turing is seeking an experienced SwarmBench Task Engineer to design and build high-quality multi-agent benchmark tasks rooted in real-world software engineering workflows. This is a short-term contractor role ideal for senior engineers passionate about AI evaluation and frontier model development.
Turing is one of the world's fastest-growing AI companies, partnering with leading AI labs to advance frontier model capabilities in reasoning, coding, agentic behavior, and more. We build real-world AI systems that solve mission-critical challenges for global enterprises.
In this role, you will create benchmark tasks grounded in real open-source code changes (bug fixes, migrations, and refactors) that evaluate how effectively AI agents navigate large codebases, apply precise modifications, and produce correct, testable outputs. You'll work within the Harbor evaluation framework and collaborate with global teams at the frontier of LLM evaluation.