SwarmBench Task Engineer — Data at Turing

posted 1 hour ago
turing.com | Contractor | Remote | TBD

SwarmBench Task Engineer — Data Analysis | Contractor | Remote | 4-Week Engagement

Turing is seeking experienced SwarmBench Task Engineers specializing in Data Analysis to design and develop high-quality multi-agent benchmark tasks that evaluate the analytical reasoning, coordination, and execution capabilities of advanced AI systems. This is a short-term, high-impact contractor role working at the frontier of LLM evaluation.

About Turing

Turing is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems. We partner with leading AI labs to advance frontier model capabilities in reasoning, coding, agentic behavior, and more — and we build real-world AI systems that solve mission-critical challenges for enterprises.

Role Overview

In this role, you will build realistic benchmark tasks requiring AI agents to analyze large, complex, multi-source datasets, decompose work across specialist sub-agents, and arrive at specific, verifiable conclusions. Tasks may involve structured and semi-structured data such as CSVs, JSON files, logs, reports, survey results, vendor assessments, and financial or operational documents.

Day-to-Day Responsibilities

  • Design and author multi-agent benchmark tasks centered on complex data analysis workflows
  • Create realistic synthetic datasets or curate real-world-style datasets across domains such as finance, operations, security, or market analysis
  • Build tasks requiring agents to perform cross-referencing, anomaly detection, contradiction identification, and statistical computation across multiple sources
  • Develop decomposition guides that split analytical work across specialist sub-agents (e.g., financial, technical, security, or operations analysts)
  • Write precise oracle logic or verification scripts that validate specific analytical conclusions
  • Create reproducible evaluation environments using Python and Docker
  • Review task performance signals to ensure strong separation between weaker and stronger agentic systems
  • Refine tasks to improve determinism, clarity, difficulty, and scoring quality

Requirements

  • 5+ years of experience in data analysis
  • Strong proficiency in SQL and Python for data analysis and scripting (pandas, NumPy, or similar)
  • Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
  • Ability to design non-trivial analytical questions with clear, specific, and verifiable answers
  • Solid understanding of statistical concepts (averages, distributions, outliers, correlations)
  • Familiarity with AI coding benchmark environments (e.g., SWE-bench, Terminal-Bench)
  • Comfort working with Docker (writing Dockerfiles, building images, debugging containers)

Contract Details

  • Duration: 4 weeks (expected start: next week)
  • Commitment: 8 hours/day, including a 4-hour overlap with PST working hours
  • Type: Contractor position (does not include medical or paid-leave benefits)

Why Work With Turing?

  • Contribute to cutting-edge AI projects with leading foundation model companies
  • Work on high-impact tasks at the frontier of LLM evaluation and reasoning
  • Fully remote with flexible collaboration across global teams
