This job post expired on May 29, 2026. It is likely that the position has already been filled.

STEM Scientific Software Evaluation Design at Mercor

posted 2 months ago

mercor.com | Contractor | Remote | $45–$100/hr

STEM Scientific Software & Evaluation Design | $45–$100/hr | Worldwide Remote

Join a cutting-edge project building large-scale evaluation benchmarks for advanced AI reasoning across scientific and engineering domains. As a Task Designer, you'll create graduate-level computational problems that challenge AI systems to use real scientific software tools — from querying simulations and interpreting outputs to designing experimental strategies and recovering hidden information from data.

This is not a typical annotation or labeling role. You'll be crafting original, research-grade problems, calibrating them against frontier AI models, and iterating until the difficulty hits the right target.

What You'll Do

Design sophisticated computational problems requiring domain-specific scientific software libraries
Create tasks that test precise multi-step scientific workflows as well as strategic experimental reasoning
Participate in a calibration loop — testing problems against state-of-the-art AI models and refining designs accordingly
Write problem setups, oracle functions, and solution validators in Python
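
As a rough illustration of what a problem setup, an oracle function, and a solution validator might look like together, here is a minimal sketch. Everything in it is hypothetical: the names `Task`, `Oracle`, `validate`, and `reference_solution`, the toy exponential-decay model, the query budget, and the tolerance are assumptions made for this example, not Mercor's actual task format. A real task would call into one of the listed scientific libraries rather than a closed-form formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical problem setup: recover a hidden decay rate from few queries."""
    hidden_rate: float       # ground truth, never shown to the solver
    amplitude: float = 10.0  # known signal amplitude at t = 0
    max_queries: int = 3     # query budget that forces a deliberate strategy
    rel_tol: float = 1e-3    # accepted relative error on the answer


class Oracle:
    """Toy stand-in for a simulation call (assumed interface, not a real API)."""

    def __init__(self, task: Task):
        self._task = task
        self.queries_used = 0

    def measure(self, t: float) -> float:
        """Return the signal at time t, counting the call against the budget."""
        if self.queries_used >= self._task.max_queries:
            raise RuntimeError("query budget exhausted")
        self.queries_used += 1
        return self._task.amplitude * math.exp(-self._task.hidden_rate * t)


def validate(task: Task, answer: float) -> bool:
    """Solution validator: accept answers within the relative tolerance."""
    return math.isclose(answer, task.hidden_rate, rel_tol=task.rel_tol)


def reference_solution(oracle: Oracle) -> float:
    """Two measurements one time unit apart give the rate as a log ratio."""
    y0 = oracle.measure(0.0)
    y1 = oracle.measure(1.0)
    return math.log(y0 / y1)


if __name__ == "__main__":
    task = Task(hidden_rate=0.7)
    oracle = Oracle(task)
    estimate = reference_solution(oracle)
    print("accepted:", validate(task, estimate), "queries:", oracle.queries_used)
```

In this sketch the difficulty comes from the query budget rather than from heavy computation: a solver has to choose which measurements to take before it runs out of calls.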

Domains & Tools

Bioinformatics & Single-Cell Genomics: scanpy, scvelo, squidpy, gudhi — RNA-seq, trajectory inference, spatial transcriptomics
Computational Chemistry & Electronic Structure: PySCF — Hartree-Fock, DFT, TDDFT, CASSCF, post-HF methods
Particle & Nuclear Physics: scikit-hep — HEP data analysis, cross-sections, perturbative QCD, Monte Carlo
Electrical Engineering & RF/Circuit Design: scikit-rf, ngspice — S-parameters, transmission-line modeling, circuit simulation
Astrophysics & Cosmology: astropy — cosmological calculations, angular power spectra, observational pipelines
Structural & Mechanical Engineering: scikit-fem — finite element analysis, beam theory, elasticity problems
Seismology & Geophysics: ObsPy, SPECFEM — waveform analysis, tomography, moment tensor inversion
Pharmacokinetics & Systems Biology: libRoadRunner, Tellurium, SBML — PK/PD modeling, enzyme kinetics

Requirements

Graduate-level training in a relevant STEM domain (MS, PhD, or equivalent research experience)
Demonstrated hands-on proficiency with at least one listed scientific software library
Strong Python programming skills
Ability to work independently and iterate based on calibration feedback
Comfortable in a Linux/terminal environment with remote compute sandboxes
Available for at least 15–20 hours per week

Nice to Have

Experience across multiple listed domains or tools
Background in benchmark, evaluation, or exam/problem-set design
Familiarity with computational reproducibility and containerized environments

Strong candidates think like puzzle designers — building problems where difficulty stems from reasoning strategy, not brute computation, and where surface-level pattern matching won't suffice.

How to apply for this role

Upload your resume — keep it up-to-date and in English. Mercor will auto-fill your profile from it.
Complete the AI interview — a 15-minute conversation about your experience. Be ready to discuss specific projects and challenges you've solved.
Submit your application — only about 20% of applicants finish all the steps, so completing yours puts you well ahead.

Benture is an independent job board and is not affiliated with Mercor.

