This job post expired on May 22, 2026. The position has likely been filled.

SwarmBench Task Engineer — Data at Turing

SwarmBench Task Engineer — Data Analysis | Contractor | Remote | 4-Week Engagement

Turing is seeking experienced SwarmBench Task Engineers specializing in Data Analysis to design and develop high-quality multi-agent benchmark tasks that evaluate the analytical reasoning, coordination, and execution capabilities of advanced AI systems. This is a short-term, high-impact contractor role working at the frontier of LLM evaluation.

About Turing

Turing is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems. We partner with leading AI labs to advance frontier model capabilities in reasoning, coding, agentic behavior, and more — and we build real-world AI systems that solve mission-critical challenges for enterprises.

Role Overview

In this role, you will build realistic benchmark tasks requiring AI agents to analyze large, complex, multi-source datasets, decompose work across specialist sub-agents, and arrive at specific, verifiable conclusions. Tasks may involve structured and semi-structured data such as CSVs, JSON files, logs, reports, survey results, vendor assessments, and financial or operational documents.

Day-to-Day Responsibilities

Design and author multi-agent benchmark tasks centered on complex data analysis workflows
Create realistic synthetic datasets or curate real-world-style datasets across domains such as finance, operations, security, or market analysis
Build tasks requiring agents to perform cross-referencing, anomaly detection, contradiction identification, and statistical computation across multiple sources
Develop decomposition guides that split analytical work across specialist sub-agents (e.g., financial, technical, security, or operations analysts)
Write precise oracle logic or verification scripts that validate specific analytical conclusions
Create reproducible evaluation environments using Python and Docker
Review task performance signals to ensure strong separation between weaker and stronger agentic systems
Refine tasks to improve determinism, clarity, difficulty, and scoring quality
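
As an illustration of the "oracle logic or verification scripts" item above, the following is a minimal sketch of what such a check might look like. Everything in it is hypothetical: the field names, the expected values, the tolerance, and the `verify` function are invented for illustration and do not describe Turing's or SwarmBench's actual harness.

```python
import math

# Hypothetical expected conclusions for an illustrative task (invented values).
EXPECTED = {
    "top_vendor": "acme",
    "anomalous_invoice_ids": [1042, 1077],
    "mean_order_value": 318.47,
}


def verify(answer: dict, expected: dict = EXPECTED, rel_tol: float = 1e-3) -> dict:
    """Return per-field pass/fail results, a score in [0, 1], and an overall flag."""
    checks = {}

    # Categorical answer: compare case-insensitively after trimming whitespace.
    submitted_vendor = str(answer.get("top_vendor", "")).strip().lower()
    checks["top_vendor"] = submitted_vendor == expected["top_vendor"]

    # Set-valued answer: ordering and int-vs-string formatting should not matter.
    try:
        submitted_ids = {int(x) for x in answer.get("anomalous_invoice_ids", [])}
    except (TypeError, ValueError):
        submitted_ids = set()
    checks["anomalous_invoice_ids"] = submitted_ids == set(
        expected["anomalous_invoice_ids"]
    )

    # Numeric answer: accept values within a relative tolerance.
    try:
        value = float(answer.get("mean_order_value"))
        checks["mean_order_value"] = math.isclose(
            value, expected["mean_order_value"], rel_tol=rel_tol
        )
    except (TypeError, ValueError):
        checks["mean_order_value"] = False

    score = sum(checks.values()) / len(checks)
    return {"checks": checks, "score": score, "passed": all(checks.values())}
```

Checking each conclusion separately and normalizing before comparison is one way to keep scoring deterministic while still tolerating harmless formatting differences in an agent's output.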

Requirements

5+ years of experience in data analysis
Strong proficiency in SQL and Python for data analysis and scripting (pandas, NumPy, or similar)
Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
Ability to design non-trivial analytical questions with clear, specific, and verifiable answers
Solid understanding of statistical concepts (averages, distributions, outliers, correlations)
Familiarity with AI coding benchmark environments (e.g., SWE-bench, Terminal-Bench)
Comfortable working with Docker (writing Dockerfiles, building images, debugging containers)

Contract Details

Duration: 4 weeks (expected start: next week)
Commitment: 8 hours/day, including a 4-hour overlap with PST working hours
Type: Contractor position (does not include medical or paid-leave benefits)

Why Work With Turing?

Contribute to cutting-edge AI projects with leading foundation model companies
Work on high-impact tasks at the frontier of LLM evaluation and reasoning
Fully remote with flexible collaboration across global teams
