AI Quality Analyst (Personalization) at Turing

posted 1 hour ago
turing.com | Contractor | Remote | $15/hr

AI Quality Analyst (Personalization) – Spanish | $15/hr | Remote | 3-Month Contract

Turing is seeking a detail-oriented and analytically sharp AI Quality Analyst to evaluate a new personalization feature for Gemini. In this role, you will assess how well the AI model uses personal data (including past conversations, Gmail, Google Search, and YouTube activity) to deliver relevant, helpful responses. Spanish proficiency is required, as it is the primary language for this project.

Based in San Francisco, Turing is the world's leading research accelerator for frontier AI labs, partnering with global enterprises to deploy advanced, reliable AI systems.

Key Responsibilities

  • Design and execute multi-turn conversational prompts (1–5 turns) that require the AI to draw on your personal information and experiences.
  • Evaluate model responses for appropriate personalization based on your original intent.
  • Analyze responses for Grounding issues, ensuring claims are evidence-based and free from hallucinations or flawed inferences.
  • Assess Integration quality to confirm personal data is woven naturally into responses without robotic overnarrating.
  • Conduct rigorous side-by-side (SxS) comparisons of two model responses, ranking them on helpfulness, usability, and overall quality.
  • Write clear, defensible rationales for your evaluations, referencing specific conversation turns.
  • Extract and verify debug information to confirm proper use of chat summaries and data sources.
  • Maintain strict data hygiene by deleting evaluation conversations after each session.

Qualifications

  • Spanish Proficiency: High-level reading and writing ability in Spanish is required.
  • Personal Google Account: Willingness to use your primary personal Google account and enable relevant personal data sources.
  • Analytical Thinking: Demonstrated ability to evaluate nuanced, ambiguous AI responses with precision.
  • Prompt Engineering: Experience designing creative, context-rich, multi-turn prompts to thoroughly test model capabilities.
  • Evaluation Acumen: Strong understanding of personalization concepts, including identifying poor inferences and forced connections.
  • Attention to Detail: Ability to spot subtle differences in naturalness and response quality across SxS comparisons.
  • Written Communication: Excellent ability to write concise, structured, and well-reasoned evaluation rationales.
  • Independence: Self-motivated with the ability to work effectively in a fully remote environment.
  • Technical Setup: Reliable desktop or laptop with a strong internet connection.

Education & Experience

  • BS/BA degree or equivalent experience in Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field.
  • Prior experience in data annotation, AI quality evaluation, or content moderation is strongly preferred.

Engagement Details

  • Rate: $15/hour
  • Type: Contractor
  • Duration: 3 months
  • Commitment: 30 or 40 hours/week (minimum 30), with at least 4 hours of daily overlap with PST working hours.

Evaluation Process

  1. Shortlisted candidates will receive a Job Interest Form.
  2. A timed assessment will be shared and must be completed within 24 hours.
  3. Successful candidates will be contacted to discuss pre-onboarding requirements.
