Prime Intellect
Position: Research Engineer - Distributed Training
Location: San Francisco or Remote
Type: Full-time
Department: Engineering
About Us
At Prime Intellect, we are on a mission to drive open, decentralized AI progress by enabling anyone to contribute compute, code, or capital to train powerful, open models. Our ultimate vision? Creating AGI that’s accessible to everyone. But we can't do it alone—we need passionate innovators like you to help make it a reality.
We are building the infrastructure for decentralized AI development on a global scale. By aggregating compute power from across the world, we enable researchers to collaboratively train state-of-the-art models across distributed clusters.
The Role
As a Research Engineer - Distributed Training, you will be at the forefront of shaping the future of decentralized AI training. Your focus will be on building and optimizing our distributed AI training stack to scale efficiently and reliably. If you are passionate about scaling systems and improving the efficiency of training large models, we’d love to have you on our team.
Key Responsibilities
- Lead and conduct cutting-edge research to build a massive-scale, highly secure decentralized training orchestration system.
- Improve the performance, cost-efficiency, and resource utilization of AI workloads using the latest compute and memory optimization techniques.
- Contribute to the development of our open-source libraries and frameworks for distributed model training.
- Present research at top-tier AI conferences like ICML and NeurIPS.
- Translate complex technical work into accessible, user-friendly content, including technical blog posts for our customers and developer community.
- Stay current with the latest advancements in AI/ML infrastructure and decentralized training, and identify opportunities to enhance our platform.
What We’re Looking For
- Strong experience in AI/ML engineering with a proven track record of designing and deploying large-scale AI model pipelines.
- Deep expertise in distributed training techniques and frameworks (e.g., PyTorch Distributed, DeepSpeed, MosaicML’s LLM Foundry) and tools (e.g., Ray) to optimize AI workload performance.
- Familiarity with large-scale model training, including data, tensor, and pipeline parallelism.
- Solid knowledge of MLOps best practices such as model versioning, experiment tracking, and CI/CD pipelines.
- Passion for advancing decentralized AI model training and democratizing access to AI capabilities for researchers, developers, and businesses worldwide.
Don’t meet all the criteria? If you're passionate about our mission and willing to learn, we encourage you to apply! We’d be excited to hear how you can contribute.
Why Join Us?
- Competitive compensation with equity and token incentives, aligning your success with the company’s growth.
- Flexible work arrangements—work remotely or in our San Francisco office.
- Visa sponsorship and relocation assistance for international candidates.
- Regular team off-sites, hackathons, and conference opportunities.
- Be part of a talented, mission-driven team united by a passion for leveraging technology to accelerate AI and scientific progress.
We recently raised a $5.5 million seed round from top investors, including Clem Delangue of Hugging Face and Dylan Patel of SemiAnalysis.
If you’re excited about shaping the future of decentralized AI and building a platform that empowers researchers and developers to push boundaries, we want to hear from you!