This job has expired
This position was posted on December 3, 2025 and is likely no longer accepting applications. We've kept it here for historical reference. Check out the similar jobs below!

Job Overview
Location: Cambridge
Job Type: Full-time
Category: Software Engineering
Date Posted: December 3, 2025
Full Job Description
📋 Description
- Architect the beating heart of CuspAI’s research engine. You will design, build and continuously evolve a cloud-native ML platform on Google Cloud Platform and Kubernetes that lets our world-leading AI chemists, physicists and materials scientists launch massive distributed training jobs, track thousands of experiments and ship models to production, all without ever touching a YAML file.
- Own the MLOps lifecycle end to end. From source-controlled infrastructure as code (Terraform, Helm, Kapitan) through CI/CD, model registries, experiment tracking (MLflow, Weights & Biases or similar) and automated deployment, you will define the single source of truth for how code becomes a running, monitored, cost-optimised service.
- Enable planet-scale distributed training. You will provision and tune multi-node accelerator clusters (A100 and H100 GPUs, TPU pods) with smart checkpointing, elastic resource scaling and fault-tolerant data pipelines, so that a 10-billion-parameter model can train overnight and resume gracefully if a node fails.
- Guarantee 99.9% uptime for the platform that powers breakthrough discoveries. Build real-time observability (Prometheus, Grafana, Alertmanager), self-healing automation and on-call playbooks so researchers sleep well while GPUs churn through exaflops of computation.
- Optimise every dollar of cloud spend. Implement quota management, spot-instance orchestration and workload-aware bin-packing so that we can run 30% more experiments without increasing the budget, freeing cash for even bigger clusters.
- Craft a delightful developer experience. Create opinionated SDKs, CLI tools and JupyterHub templates that abstract away Kubernetes complexity, letting a chemist type `cuspai train --dataset water-filtration --gpus 64` and watch the magic happen.
- Champion GitOps and reproducibility. Every environment, from a researcher’s laptop to production, is declared in Git, reviewed like code and rolled out automatically, ensuring that yesterday’s breakthrough can be reproduced next year.
- Collaborate across disciplines daily. Sit shoulder to shoulder with ML researchers debugging convergence issues, pair with chemists optimising molecular featurisation pipelines, and sync with software engineers integrating models into customer-facing APIs.
- Shape the strategic roadmap. As the first ML Infrastructure hire, you will define standards, choose the next generation of tools and mentor future teammates, leaving a lasting architectural imprint on a platform that could accelerate the discovery of carbon-capture membranes, room-temperature superconductors or next-generation batteries.
- Travel and connect. Expect quarterly trips to our London, Amsterdam or Berlin hubs to run workshops, share best practices and keep the global team aligned.
About Cusp AI Ltd
Cusp AI is a Cambridge-based startup applying generative artificial intelligence and deep learning to the discovery and design of next-generation materials for carbon capture, hydrogen storage and other clean-energy applications. The company combines physics-informed models, molecular simulation and high-throughput cloud computing to predict and optimize porous frameworks such as metal-organic frameworks and covalent organic frameworks, dramatically reducing the time and cost needed to identify candidates for scalable carbon dioxide removal. Founded in 2023 by ex-Google researchers, Cusp AI collaborates with national laboratories and industrial partners to translate AI-generated molecules into pilot-scale demonstrations.
Similar Opportunities

Actian Corporation
6 months ago

BaseTen Inc.
18 hours ago

GameChanger Media, Inc.
12 days ago

Adaptive ML SAS
5 months ago