This job post expired on June 25, 2026. The position has likely been filled.

AI/ML Engineer – LLM Post-Training at Turing

posted 1 month ago

turing.com Contractor IN/PK/BD/BR TBD

AI/ML Engineer – LLM Post-Training | Contractor | Remote (India, Pakistan, Bangladesh, Brazil)

Turing is seeking an experienced AI/ML Engineer specializing in LLM post-training and reinforcement learning to join a cutting-edge AI research accelerator. This is a 2-month contractor engagement requiring 40 hours per week with 4 hours of daily overlap with PST.

About Turing

Based in San Francisco, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing accelerates frontier research with high-quality data, advanced training pipelines, and top AI researchers — and helps enterprises transform AI from proof of concept into proprietary, production-grade intelligence.

Role Overview

This role focuses on fine-tuning open-weight models, building reward systems, and improving model performance through scalable training, evaluation, and data curation workflows.

Day-to-Day Responsibilities

Design and execute fine-tuning pipelines for open-weight models (Qwen, Llama, Mistral) using SFT → DPO → GRPO progressions on tool-use and agentic data.
Implement and tune LoRA / QLoRA adapters for parameter-efficient fine-tuning; determine when full fine-tuning vs. PEFT is appropriate.
Build reward functions and verifiers for RL training, including programmatic verifiers, LLM-as-judge rubrics, and state-transition checks against gym environments.
Generate, curate, and filter RL tool-use training data: golden trajectories, preference pairs, on-policy rollouts, and rejection-sampled completions.
Run distributed training on multi-GPU setups; manage inference at scale with vLLM, including extended-context configurations via YaRN / RoPE scaling.
Diagnose failure modes such as reward hacking, distribution collapse, KL blow-up, tool-selection errors, and format drift.
Define and track evaluation metrics (pass@k, pass^k, trajectory-level scoring, rubric-based vs. binary scoring) and own model-quality reporting against benchmarks.
Partner with annotation, eval, and client teams to translate data-quality signals into training improvements.

Requirements

3+ years of hands-on ML engineering experience, including at least 1 year specifically in LLM post-training.
Demonstrated experience with at least three of: SFT, LoRA/QLoRA, DPO, PPO, GRPO, RLHF.
Strong PyTorch fundamentals; working familiarity with Hugging Face TRL, Accelerate, DeepSpeed or FSDP, and vLLM.
Experience designing reward signals or verifiers for RL training — not just running training scripts.
Solid understanding of tokenization, attention, chat templates, tool-calling formats (OpenAI/Anthropic-style), and common agent training failure modes.
Proficiency in Python, distributed training, GPU profiling, and translating research papers into working code.

Strongly Preferred

Experience training tool-use or agentic models (function calling, multi-step tool selection, planner-executor patterns).
Experience with synthetic data generation pipelines and rejection sampling.
Familiarity with MCP, LangChain/LangGraph, or similar agent frameworks.
Experience building eval harnesses, designing rubrics, and handling judge variance and reward hacking at scale.
Cloud/infra experience: RunPod, AWS, GCP; container workflows; long-context inference tuning.

Engagement Details

Commitment: 40 hours/week with 4-hour PST overlap required
Type: Contractor (no medical/paid leave benefits)
Duration: 2 months, starting next week
Eligible Locations: India, Pakistan, Bangladesh, Brazil
Evaluation: 2 rounds of Technical Interviews (90 minutes each)

Why Work With Turing?

Fully remote, flexible work environment.
Opportunity to contribute to frontier AI research alongside leading LLM companies.
