AI Quality Analyst (Personalization) – Hindi | $15/hr | Remote Worldwide
Join Turing's global AI evaluation team to assess and improve a cutting-edge personalization feature for Gemini. In this role, you will design creative conversational prompts, evaluate AI-generated responses for quality and accuracy, and provide structured feedback that directly shapes frontier AI development.
About Turing
Based in San Francisco, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing helps accelerate frontier research with high-quality data, advanced training pipelines, and top AI researchers specializing in coding, reasoning, STEM, multilinguality, and more.
Role Overview
As an AI Quality Analyst, you will evaluate how well Gemini uses personal context (including past conversations, Gmail, and Google Search and YouTube activity) to deliver relevant, helpful responses. You will assess responses across key dimensions, including Grounding, Integration, and Helpfulness.
Key Responsibilities
- Design and execute multi-turn conversational prompts (1–5 turns) requiring the AI to leverage personal information and experiences.
- Evaluate model responses for appropriate personalization based on your original intent.
- Analyze responses for Grounding issues, ensuring claims are evidence-based and free from hallucinations.
- Assess Integration quality to confirm that personal data is woven in naturally, without robotic overnarrating.
- Perform side-by-side (SxS) stack-ranking of two model responses for helpfulness, usability, and enjoyment.
- Write clear, defensible rationales for comparisons, referencing specific conversation turns.
- Extract and verify Debug Info to confirm proper use of chat summaries and data sources.
- Maintain data hygiene by deleting evaluation conversations to preserve chat history integrity.
Key Qualifications
- Hindi Proficiency: High-level reading and writing ability in Hindi is required for this project.
- Personal Google Account: Willingness to use your primary personal Google account and enable personal data sources for authentic evaluation.
- Analytical Thinking: Ability to evaluate nuanced, ambiguous AI responses with a focus on personalization quality.
- Prompt Engineering: Experience designing creative, multi-turn prompts grounded in personal context.
- Evaluation Acumen: Ability to identify incorrect personalization, poor inferences, and forced connections.
- Attention to Detail: Skill at spotting subtle differences in naturalness and overnarrating across SxS responses.
- Written Communication: Ability to write clear, structured rationales with explicit references to conversation turns.
- Independence: Self-motivated and comfortable working remotely with minimal supervision.
- Technical Setup: Desktop or laptop with a reliable internet connection.
Education & Experience
- BS/BA degree or equivalent experience in Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field.
- Prior experience in data annotation, AI quality evaluation, or content moderation is strongly preferred.
Engagement Details
- Rate: $15/hour
- Type: Contractor
- Duration: 3 months
- Hours: Minimum 30 hrs/week (options: 30 or 40 hrs/week); at least 4 hours/day, including a 4-hour overlap with PST
Evaluation Process
- Shortlisted candidates receive a Job Interest Form.
- A timed assessment is shared and must be completed within 24 hours.
- Successful candidates are contacted to discuss pre-onboarding requirements.