Software Engineering & Data Science Expert | $60–100/hr | Worldwide Remote
Mercor is seeking experienced software engineers and data scientists to evaluate and improve conversational AI systems used by developers worldwide. This role involves assessing LLM-generated code responses, conducting accuracy testing, and ensuring AI systems deliver reliable, high-quality technical solutions.
Why This Role Exists
Mercor partners with leading AI teams to enhance the quality and reliability of general-purpose conversational AI systems. In coding contexts, these systems must demonstrate correct reasoning, strong problem-solving ability, and adherence to real-world engineering best practices. Your expertise will directly shape how AI systems reason about and generate code.
What You'll Do
- Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness
- Conduct fact-checking using trusted public sources and authoritative references
- Execute code and validate outputs using appropriate testing tools
- Annotate model responses by identifying strengths, areas for improvement, and factual or conceptual inaccuracies
- Assess code quality, readability, algorithmic soundness, and explanation quality
- Ensure model responses align with expected conversational behavior and system guidelines
- Apply consistent evaluation standards by following defined taxonomies, benchmarks, and detailed guidelines
Who You Are
- Hold a BS, MS, or PhD in Computer Science or a closely related field
- Have 5+ years of real-world experience in software engineering or related technical roles
- Are expert in at least two relevant programming languages (e.g., Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, HTML/CSS)
- Can independently solve HackerRank or LeetCode problems at the Medium and Hard levels
- Have experience contributing to well-known open-source projects, including merged pull requests
- Have significant experience using LLMs while coding and understand their strengths and failure modes
- Have strong attention to detail and are comfortable evaluating complex technical reasoning
- Are fluent in English
Nice-to-Have Specialties
- Prior experience with RLHF, model evaluation, or data annotation work
- Track record in competitive programming
- Experience reviewing code in production environments
- Familiarity with multiple programming paradigms or ecosystems
- Experience explaining complex technical concepts to non-expert audiences
What Success Looks Like
You identify incorrect logic, inefficiencies, missed edge cases, and misleading explanations in model-generated code. Your feedback improves the correctness, robustness, and clarity of AI coding outputs, and the reproducible evaluation artifacts you deliver strengthen model performance and help developers trust AI systems for real-world coding tasks.
Work Arrangement
This is a remote contract position open to both US-based and international candidates. Full-time and part-time arrangements are available, depending on your schedule.