
SwarmBench Task Engineer – Math at Turing


SwarmBench Task Engineer (Reasoning/Math) | Contractor | Fully Remote | ~40 hrs/week

Turing is seeking a highly analytical and computationally proficient SwarmBench Task Engineer specializing in mathematical reasoning. In this role, you will design and build challenging multi-agent benchmark tasks that push the boundaries of AI reasoning — spanning competition math, numerical analysis, combinatorial optimization, and formal proof construction.

About Turing

Based in San Francisco, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing accelerates frontier research with high-quality data, advanced training pipelines, and top AI researchers specializing in coding, reasoning, STEM, multilinguality, multimodality, and agents.

Key Responsibilities

  • Build multi-agent benchmark tasks requiring multi-step mathematical reasoning, proof construction, or algorithmic problem-solving.
  • Design problems that are genuinely difficult for a single agent but decomposable — including competition math, numerical analysis, combinatorial optimization, and statistical inference.
  • Create verification scripts that check mathematical correctness — numerical answers with appropriate tolerance, proof step validity, and algorithm output correctness.
  • Write clear, precise problem statements with well-defined notation, definitions, and output formats.
  • Develop decomposition guides that split problems into independent sub-computations or parallel solution strategies.
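
To illustrate the verification-script responsibility above, here is a minimal sketch of a tolerance-based numerical check of the kind such a script might perform. The function name and default tolerances are illustrative only, not part of any existing Turing tooling:

```python
import math

def verify_numeric_answer(submitted: float, expected: float,
                          rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Accept a submitted numerical answer if it matches the expected
    value within a relative/absolute tolerance, as a benchmark
    verifier for a numerical-analysis task might."""
    return math.isclose(submitted, expected, rel_tol=rel_tol, abs_tol=abs_tol)

# Example: the exact value of the integral of sin(x) over [0, pi] is 2,
# so a close numerical approximation should pass and a sloppy one should not.
print(verify_numeric_answer(2.0000000001, 2.0, rel_tol=1e-6))  # True
print(verify_numeric_answer(2.1, 2.0, rel_tol=1e-6))           # False
```

In practice a real verifier would also validate output format and handle non-numeric submissions, but the tolerance comparison is the core of the check.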

Required Qualifications

  • 5+ years of experience in mathematics, quantitative research, or computational science (e.g., a competition math or university-level mathematics background).
  • Proficiency in Python — NumPy, SciPy, or symbolic computation (SymPy).
  • Experience writing mathematical proofs or formal derivations.
  • Ability to create problems with precise, verifiable answers.
  • Experience with AI coding benchmarks (e.g., SWE-bench, Terminal-bench).
  • Comfortable with Docker — writing Dockerfiles, building images, and debugging containers.
  • Strong understanding of numerical methods — floating point tolerance, convergence criteria, and error bounds.

Strong Pluses

  • Experience creating math competition problems (AMC, AIME, Putnam, IMO, or similar).
  • Research background in mathematics, theoretical CS, or quantitative fields.
  • Experience with automated theorem proving or formal verification.
  • Knowledge of AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI).
  • Experience with large-scale numerical computation or scientific computing.

Engagement Details

  • Commitment: 40 hours/week with 4 hours of PST overlap required.
  • Type: Contractor/Freelancer (no medical or paid leave benefits).
  • Duration: 1-month contract with expected start next week; potential for extension based on performance.

Perks

  • Fully remote work environment.
  • Opportunity to contribute to cutting-edge AI research with leading LLM companies.
  • Potential contract extension based on performance and project needs.
