Computational Chemistry AI Evaluator at Mercor

posted 2 hours ago
mercor.com | Contractor | Remote | $70–100/hr | 33 views

Computational Chemistry AI Evaluator | $70–100/hr | Worldwide Remote

Join a cutting-edge project building large-scale evaluation benchmarks for advanced AI reasoning in scientific and engineering domains. As a task designer, you'll create graduate-level computational problems that challenge AI systems to use real scientific software tools — from running simulations and interpreting outputs to designing experimental strategies and recovering hidden information from data.

This is not a typical annotation or labeling role. You'll be crafting original, research-grade problems grounded in real scientific workflows, calibrating them against frontier AI models, and iterating until the difficulty is precisely right.

What You'll Do

  • Design sophisticated computational problems requiring domain-specific scientific software libraries
  • Create tasks that test multi-step scientific workflows and strategic experimental reasoning
  • Participate in a calibration loop — testing problems against state-of-the-art AI models and refining difficulty
  • Write problem setups, oracle functions, and solution validators in Python
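As a rough illustration of the last item, an oracle function and a solution validator might look something like the sketch below. The posting does not specify any interface, so `Task`, `oracle`, `validate`, the tolerance, and the reference value are all hypothetical placeholders rather than Mercor's actual format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a prompt plus a hidden reference value."""
    prompt: str
    reference_energy: float   # Hartree; placeholder number, not a real result
    tolerance: float = 1e-6   # absolute tolerance in Hartree (assumed)


def oracle(task: Task) -> float:
    """Return the hidden reference answer for a task.

    In a real task this would call the scientific software (e.g. PySCF);
    here it only returns the stored placeholder value.
    """
    return task.reference_energy


def validate(task: Task, submitted: object) -> bool:
    """Accept a submission only if it is a finite number within tolerance."""
    # Reject non-numeric input; bool is excluded because it subclasses int.
    if isinstance(submitted, bool) or not isinstance(submitted, (int, float)):
        return False
    if not math.isfinite(submitted):
        return False
    return abs(submitted - oracle(task)) <= task.tolerance


task = Task(prompt="Report the ground-state energy in Hartree.",
            reference_energy=-1.0)
```

Keeping the oracle separate from the validator means the reference computation can be swapped out without touching the acceptance rule, which may help when difficulty is tuned during calibration.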

Domain Focus: Computational Chemistry & Electronic Structure

We're seeking experts with deep, hands-on experience using PySCF for quantum chemistry calculations, including Hartree-Fock, DFT, TDDFT, CASSCF, and post-HF methods. Ideal candidates can design problems around excited-state analysis, orbital diagnostics, method selection for complex electronic structures, and interpreting computational artifacts from method limitations. Experience with other specialized software in this domain is also considered.

Requirements

  • Graduate-level training in a relevant STEM domain (MS, PhD, or equivalent research experience)
  • Demonstrated proficiency with at least one listed scientific software library (evidenced by publications, open-source contributions, or professional work)
  • Strong Python programming skills
  • Ability to work independently and iterate based on calibration feedback
  • Comfortable in a Linux/terminal environment with remote compute sandboxes
  • Available for at least 15–20 hours per week

Nice to Have

  • Experience across multiple scientific domains or tools
  • Familiarity with benchmark or evaluation design
  • Background in scientific pedagogy or problem-set design
  • Experience with computational reproducibility and containerized environments

Strong candidates will think like puzzle designers — constructing problems where difficulty stems from reasoning strategy, not brute computation, and where surface-level pattern matching won't suffice.

How to apply for this role

  • Upload your resume — keep it up-to-date and in English. Mercor will auto-fill your profile from it.
  • Complete the AI interview — a 15-minute conversation about your experience. Be ready to discuss specific projects and challenges you've solved.
  • Submit your application — only about 20% of applicants finish all the steps, so completing yours puts you well ahead.

Benture is an independent job board and is not affiliated with Mercor.
