This job post expired on February 07, 2026. The position has likely been filled.

Engineering PhD - AI Model Evaluation at Mercor

posted 6 months ago

mercor.com | Contractor | Remote | $73.29/hr | 550 views

Engineering PhD - AI Model Evaluation | $73.29/hr | Worldwide Remote

Mercor is seeking PhD-level engineers to evaluate and improve conversational AI systems used in engineering contexts. Apply your technical expertise to ensure AI models deliver accurate, rigorous, and clear explanations of complex engineering concepts.

Why This Role Exists

Mercor partners with leading AI teams to enhance the quality and reliability of general-purpose conversational AI systems. In engineering contexts, these systems must demonstrate accurate applied reasoning, quantitative precision, and practical problem-solving. This project focuses on evaluating how models reason about and explain engineering concepts across multiple disciplines.

What You'll Do

Write and refine prompts to guide model behavior in engineering scenarios
Evaluate LLM-generated responses for technical accuracy, applied reasoning, and completeness
Conduct fact-checking and verify technical claims using authoritative sources and domain knowledge
Annotate model responses by identifying strengths, areas of improvement, and inaccuracies
Assess clarity, structure, and appropriateness of explanations for different audiences
Ensure model responses align with expected conversational behavior and system guidelines
Apply consistent evaluation standards using clear taxonomies, benchmarks, and detailed guidelines

Who You Are

Hold a PhD in Engineering or a closely related field
Deep expertise in one or more sub-domains: Mechanical & Physical Systems, Electrical & Computer, Chemical & Materials, or Civil & Environmental Engineering
Significant experience using large language models (LLMs) and an understanding of their practical applications
Excellent writing skills, with the ability to clearly explain complex engineering concepts
Strong attention to detail, with a consistent ability to notice subtle issues
Experience reviewing or editing technical or academic writing
Fluent in English

Nice-to-Have Specialties

Experience with applied research, industry engineering workflows, or systems design
Prior experience with RLHF, model evaluation, or data annotation work
Experience teaching or explaining engineering concepts to non-expert audiences
Familiarity with evaluation rubrics, benchmarks, or structured review frameworks

What Success Looks Like

Identify technical inaccuracies, flawed assumptions, or incomplete reasoning in model outputs
Deliver feedback that improves the rigor, clarity, and correctness of AI explanations
Produce consistent, reproducible evaluation artifacts that strengthen model performance
Contribute to building AI systems that professionals trust in engineering contexts

Why Join Mercor

Apply your PhD-level engineering expertise to improve how AI systems reason about and communicate complex technical concepts. This flexible, remote role enables you to contribute directly to the development of reliable, high-quality AI systems used in real-world applications.

How to apply for this role

Upload your resume — keep it up-to-date and in English. Mercor will auto-fill your profile from it.
Complete the AI interview — a 15-minute conversation about your experience. Be ready to discuss specific projects and challenges you've solved.
Submit your application — only about 20% of applicants finish all the steps, so completing yours puts you well ahead.

Benture is an independent job board and is not affiliated with Mercor.
