
LLM Trainer – Python & Linux Systems at Turing

posted 3 hours ago
turing.com | Contractor | Remote (select) | TBD

LLM Trainer – Terminal-Bench | Contractor | Remote (India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, Mexico)

Turing is seeking skilled Terminal-Bench Task Authors to design, develop, and validate high-quality benchmark tasks for evaluating large language models (LLMs) in simulated Linux terminal environments. This is a short-term contractor role (1 month) requiring 8 hours/day, including 4 hours of overlap with PST.

About Turing

Based in San Francisco, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing provides high-quality data, advanced training pipelines, and top AI researchers across coding, reasoning, STEM, multilinguality, multimodality, and agents.

Role Overview

In this role, you will create challenging, deterministic, and reproducible benchmark tasks that rigorously test AI capabilities across software engineering, systems, data science, and mathematical domains. Your work will directly contribute to evaluating and stress-testing frontier AI models under real-world terminal constraints.

Day-to-Day Responsibilities

  • Author original Terminal-Bench 2.0 (Harbor) tasks with precise, unambiguous instructions
  • Design realistic Linux terminal workflows involving filesystems, processes, networking, and containers
  • Implement golden solutions and pytest-based evaluation scripts with deterministic outcomes
  • Build and maintain Dockerized environments with pinned dependencies for reproducibility
  • Anticipate edge cases and failure modes to prevent reward hacking
  • Validate tasks by running them against frontier LLMs and iterating to achieve target pass/fail rates
  • Deliver approximately 5 fully validated benchmark tasks per week

Required Technical Skills

  • Python: Strong proficiency with clean, testable code (3+ years)
  • Bash / Shell Scripting: Confident with Unix command-line tools and workflows (2+ years)
  • Linux Systems: Familiarity with filesystems, permissions, processes, and basic networking (1+ year)
  • Docker: Experience building, configuring, and debugging containerized environments

Domain Expertise (One or More Required)

  • Software Engineering, System Administration, or Debugging
  • Data Science, Machine Learning, or Model Training
  • Mathematics, Algorithm Design, or Scientific Computing
  • Frontend or Backend Development
  • Data Preprocessing and Analysis

Core Competencies

  • Strong analytical and problem-solving ability
  • Extreme attention to detail in technical writing
  • Ability to anticipate corner cases and unintended model behaviors
  • Comfort working with deterministic evaluation and strict correctness criteria

Contract Details

  • Duration: 1 month (expected start: next week)
  • Commitment: 8 hours/day, including 4 hours of overlap with PST
  • Employment Type: Contractor (no medical benefits or paid leave)
  • Eligible Locations: India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, Mexico

Perks

  • Fully remote work environment
  • Opportunity to work on cutting-edge AI projects with leading LLM companies
