
Engineering PhD - AI Model Evaluation at Mercor

posted 20 hours ago
mercor.com | Contractor | Remote | $73.29/hr

Engineering PhD - AI Model Evaluation | $73.29/hr | Worldwide Remote

Mercor is seeking PhD-level engineers to evaluate and improve conversational AI systems used in engineering contexts. Apply your technical expertise to ensure AI models deliver accurate, rigorous, and clear explanations of complex engineering concepts.

Why This Role Exists

Mercor partners with leading AI teams to enhance the quality and reliability of general-purpose conversational AI systems. In engineering contexts, these systems must demonstrate accurate applied reasoning, quantitative precision, and practical problem-solving. This project focuses on evaluating how models reason about and explain engineering concepts across multiple disciplines.

What You'll Do

  • Write and refine prompts to guide model behavior in engineering scenarios
  • Evaluate LLM-generated responses for technical accuracy, applied reasoning, and completeness
  • Conduct fact-checking and verify technical claims using authoritative sources and domain knowledge
  • Annotate model responses by identifying strengths, areas of improvement, and inaccuracies
  • Assess clarity, structure, and appropriateness of explanations for different audiences
  • Ensure model responses align with expected conversational behavior and system guidelines
  • Apply consistent evaluation standards using clear taxonomies, benchmarks, and detailed guidelines

Who You Are

  • Hold a PhD in Engineering or a closely related field
  • Deep expertise in one or more sub-domains: Mechanical & Physical Systems, Electrical & Computer, Chemical & Materials, or Civil & Environmental Engineering
  • Significant experience using large language models (LLMs) and an understanding of their practical applications
  • Excellent writing skills, with the ability to clearly explain complex engineering concepts
  • Strong attention to detail, with a consistent ability to notice subtle issues
  • Experience reviewing or editing technical or academic writing
  • Fluent in English

Nice-to-Have Specialties

  • Experience with applied research, industry engineering workflows, or systems design
  • Prior experience with RLHF, model evaluation, or data annotation work
  • Experience teaching or explaining engineering concepts to non-expert audiences
  • Familiarity with evaluation rubrics, benchmarks, or structured review frameworks

What Success Looks Like

  • Identify technical inaccuracies, flawed assumptions, or incomplete reasoning in model outputs
  • Deliver feedback that improves the rigor, clarity, and correctness of AI explanations
  • Produce consistent, reproducible evaluation artifacts that strengthen model performance
  • Contribute to building AI systems that professionals trust in engineering contexts

Why Join Mercor

Apply your PhD-level engineering expertise to improve how AI systems reason about and communicate complex technical concepts. This flexible, remote role enables you to contribute directly to the development of reliable, high-quality AI systems used in real-world applications.

Benture is an independent job board and is not affiliated with or employed by Mercor.

Tips for Applying to Mercor Jobs from Benture

Increase your chances of success!
1. Four Simple Steps

Upload resume → AI interview → Complete form → Submit application

2. Perfect Your Resume

Upload your best, up-to-date resume in English. Mercor will extract details and fill out your profile automatically. Review and adjust as needed.

3. Complete = Win

SHOCKING FACT: Only ~20% of applicants complete their application! Take the 15-minute AI interview about your experience and you'll have MUCH HIGHER chances of getting hired!

AI Interview Tips: The interview focuses on your resume and work experience. Be ready to discuss specific projects and how you solved challenges.

Takes about 15 minutes | Dramatically improves your chances
