Agentic Tasker (Frontier STEM) at Turing

posted 1 hour ago
turing.com | Contractor | Remote: US, EU | ~$80/hr

Agentic Tasker (Frontier STEM) | ~$80/hr | Remote – North America & Europe

Work directly with researchers at a top-tier Frontier AI Lab to enhance the reasoning and problem-solving capabilities of cutting-edge AI models. This role focuses on designing, validating, and analyzing challenging STEM benchmark tasks to push the boundaries of frontier model performance in data science, ML, and related fields.

Key Responsibilities

  • Task Design & Development: Create challenging, real-world data science problems that serve as the foundation for Colab Bench tasks.
  • Content Generation: Integrate problems into an agentic development environment using Python, including:
    • Detailed task instructions and overviews
    • Golden solutions that adhere to task specifications
    • Complete environments with datasets, Python libraries, and metadata
    • Test notebooks containing unit tests that solutions must pass
  • Evaluation & Analysis: Assess cross-model performance on designed tasks and identify areas for improvement.
  • Headroom Identification: Pinpoint tasks where the target model fails, specifically classifying failures as logical reasoning issues.
  • Loss Extraction: Analyze agent trajectories to identify and extract core capability loss patterns from the model.

Qualifications

  • Strong expertise in data science, machine learning, finance, and coding, with a deep background in frontier STEM disciplines.
  • Actively recruiting PhD students from top US institutions and highly skilled GitHub contributors. (A small cohort in India will also be considered.)

Offer Details

  • Rate: ~$80/hour
  • Commitment: Minimum 30 hours/week on weekdays
  • Employment Type: Contractor (no medical or paid leave benefits)
  • Duration: 3 months, with an expected start date of next week
  • Locations: North America and Europe
