This job post expired on August 31, 2025. The position has likely been filled.

Hourly Contract | Remote | $21/hour
Mercor is looking for detail-oriented and enthusiastic Audio Model Trainers to join an innovative AI research project. Your role involves recording concise, high-quality audio descriptions of visual content to enhance multimodal AI datasets, supporting the creation of next-generation models that understand both auditory and visual inputs.
View images and create clear, natural spoken descriptions.
Record audio clips (approximately 2-3 minutes each) using provided tools.
Ensure high-quality recordings free from background noise or distortion.
Adhere strictly to provided linguistic, timing, and style guidelines.
Collaborate closely with researchers and quality assurance teams to maintain data quality.
Excellent verbal communication and clear enunciation.
Native or near-native fluency in English (additional languages a plus).
Strong attention to detail and adherence to precise guidelines.
Previous voice recording or annotation experience is beneficial but not required.
Ability to work independently and consistently handle repetitive tasks.
Opportunity to participate in cutting-edge AI research with a leading lab.
Gain experience in integrating language, audio, and visual AI systems.
Fully remote, flexible schedule.
Short AI-led interview (~15 minutes) and brief availability form.
Quick response: typically within one week of application.
Mercor, headquartered in San Francisco, CA, connects specialized talent with leading AI research initiatives. Investors include Benchmark, General Catalyst, Peter Thiel, Adam D’Angelo, Larry Summers, and Jack Dorsey. Mercor supports inclusivity and provides reasonable accommodations upon request.
Apply today to contribute directly to pioneering advancements in multimodal AI!