
AI/ML Engineer – LLM Post-Training at Turing

posted 1 hour ago
turing.com | Contractor | IN/PK/BD/BR | TBD

AI/ML Engineer – LLM Post-Training | Contractor | Remote (India, Pakistan, Bangladesh, Brazil)

Turing is seeking an experienced AI/ML Engineer specializing in LLM post-training and reinforcement learning to join a cutting-edge AI research accelerator. This is a 2-month contractor engagement at 40 hours per week, with 4 hours of daily overlap with PST required.

About Turing

Based in San Francisco, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing accelerates frontier research with high-quality data, advanced training pipelines, and top AI researchers — and helps enterprises transform AI from proof of concept into proprietary, production-grade intelligence.

Role Overview

This role focuses on fine-tuning open-weight models, building reward systems, and improving model performance through scalable training, evaluation, and data curation workflows.

Day-to-Day Responsibilities

  • Design and execute fine-tuning pipelines for open-weight models (Qwen, Llama, Mistral) using SFT → DPO → GRPO progressions on tool-use and agentic data.
  • Implement and tune LoRA / QLoRA adapters for parameter-efficient fine-tuning; determine when full fine-tuning vs. PEFT is appropriate.
  • Build reward functions and verifiers for RL training, including programmatic verifiers, LLM-as-judge rubrics, and state-transition checks against gym environments.
  • Generate, curate, and filter RL tool-use training data: golden trajectories, preference pairs, on-policy rollouts, and rejection-sampled completions.
  • Run distributed training on multi-GPU setups; manage inference at scale with vLLM, including extended-context configurations via YaRN / RoPE scaling.
  • Diagnose failure modes such as reward hacking, distribution collapse, KL blow-up, tool-selection errors, and format drift.
  • Define and track evaluation metrics (pass@k, pass^k, trajectory-level scoring, rubric-based vs. binary scoring) and own model-quality reporting against benchmarks.
  • Partner with annotation, eval, and client teams to translate data-quality signals into training improvements.

Requirements

  • 3+ years of hands-on ML engineering experience, including at least 1 year specifically in LLM post-training.
  • Demonstrated experience with at least three of: SFT, LoRA/QLoRA, DPO, PPO, GRPO, RLHF.
  • Strong PyTorch fundamentals; working familiarity with Hugging Face TRL, Accelerate, DeepSpeed or FSDP, and vLLM.
  • Experience designing reward signals or verifiers for RL training — not just running training scripts.
  • Solid understanding of tokenization, attention, chat templates, tool-calling formats (OpenAI/Anthropic-style), and common agent training failure modes.
  • Proficiency in Python, distributed training, GPU profiling, and translating research papers into working code.

Strongly Preferred

  • Experience training tool-use or agentic models (function calling, multi-step tool selection, planner-executor patterns).
  • Experience with synthetic data generation pipelines and rejection sampling.
  • Familiarity with MCP, LangChain/LangGraph, or similar agent frameworks.
  • Experience building eval harnesses, designing rubrics, and handling judge variance and reward hacking at scale.
  • Cloud/infra experience: RunPod, AWS, GCP; container workflows; long-context inference tuning.

Engagement Details

  • Commitment: 40 hours/week with 4-hour PST overlap required
  • Type: Contractor (no medical/paid leave benefits)
  • Duration: 2 months, starting next week
  • Eligible Locations: India, Pakistan, Bangladesh, Brazil
  • Evaluation: 2 rounds of technical interviews (90 minutes each)

Why Work With Turing?

  • Fully remote, flexible work environment.
  • Opportunity to contribute to frontier AI research alongside leading LLM companies.
