This job post expired on February 08, 2026. The position has likely been filled.

Generalist - English & Italian at Mercor

posted 2 months ago
mercor.com Contractor remote: US/Europe $36/hour 258 views

AI Language Evaluator | $36.16/hr | Remote (US & Europe)

Mercor is seeking bilingual English and Italian speakers to evaluate and improve conversational AI systems used by millions worldwide. This flexible contract role puts you at the forefront of human-in-the-loop AI development, directly shaping how advanced language models communicate.

Why This Role Exists

We partner with leading AI teams to enhance the quality, accuracy, and reliability of large language models (LLMs). Your expertise will ensure these systems respond clearly, accurately, and helpfully across diverse real-world scenarios.

What You'll Do

  • Evaluate LLM-generated responses for accuracy, clarity, and effectiveness
  • Conduct fact-checking using trusted public sources and external tools
  • Provide high-quality human feedback by annotating response strengths and weaknesses
  • Assess reasoning quality, tone, completeness, and conversational alignment
  • Apply consistent annotations following detailed evaluation guidelines and taxonomies
  • Identify factual inaccuracies, reasoning errors, and communication gaps

Who You Are

  • Hold a Bachelor's degree
  • Native or C2-level fluency in Italian
  • Significant experience using large language models and understanding user behavior
  • Excellent writing skills, with the ability to articulate nuanced feedback
  • Strong attention to detail and ability to notice subtle issues
  • Adaptable across diverse topics, domains, and requirements
  • Background in structured analytical thinking (research, policy, analytics, linguistics, engineering)
  • Excellent college-level mathematics skills

Nice-to-Have Specialties

  • Prior experience with RLHF, model evaluation, or data annotation
  • Experience writing or editing high-quality content
  • Experience making fine-grained qualitative judgments between multiple outputs
  • Familiarity with evaluation rubrics, benchmarks, or quality scoring systems

What Success Looks Like

You'll produce clear, consistent evaluation artifacts that lead to measurable improvements in AI response quality. Your work will directly impact user experience and help ensure AI systems meet the highest standards before public release.

Work Arrangement

This is a flexible, remote contract position available to candidates in the United States and Europe. Choose full-time or part-time hours that fit your schedule while contributing meaningfully to AI systems shaping the future of human-computer interaction.

How to apply for this role

  • Upload your resume: keep it up-to-date and in English. Mercor will auto-fill your profile from it.
  • Complete the AI interview: a 15-minute conversation about your experience. Be ready to discuss specific projects and challenges you've solved.
  • Submit your application: only about 20% of applicants finish all the steps, so completing yours puts you well ahead.

Benture is an independent job board and is not affiliated with Mercor.
