This job has expired

This position was posted on October 3, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

ML Engineer - Inference

Mindbeam AI

Job Overview

Location

United States

Job Type

Full-time

Full Job Description

📋 Description

• Shape the public face of Mindbeam’s next-generation AI infrastructure by designing and owning the inference APIs, SDKs, and command-line tools that thousands of researchers, developers, and Fortune-500 engineers will rely on every day.
• Translate cutting-edge research models—from billion-parameter transformers to novel sparse architectures—into rock-solid, low-latency production endpoints that serve millions of predictions per second with sub-100 ms p99 latency.
• Build intuitive abstractions that hide the complexity of distributed inference, automatic batching, dynamic quantization, and hardware-specific optimization (GPU, TPU, Inferentia) behind clean, idiomatic Python, REST, and gRPC interfaces.
• Partner shoulder-to-shoulder with our research team to co-design new model-serving paradigms, then shepherd those ideas through alpha, beta, and GA by writing design docs, RFCs, and example notebooks that turn bleeding-edge science into copy-paste developer joy.
• Own the full lifecycle of our inference stack: profiling, benchmarking, autoscaling policies, canary rollouts, and real-time monitoring with Prometheus, Grafana, and custom ML health checks to guarantee five-nines reliability for enterprise SLAs.
• Champion the developer experience: run weekly user interviews, mine GitHub issues, and instrument SDK telemetry to discover friction points, then ship weekly releases that cut integration time from hours to minutes.
• Craft security-first architectures that satisfy SOC 2, HIPAA, and FedRAMP controls—end-to-end TLS, customer-managed keys, VPC peering, and fine-grained IAM—so regulated industries can adopt Mindbeam without a second thought.
• Contribute to open-source: upstream improvements to TensorRT, vLLM, and Kubernetes Serving, publish blog posts and conference talks, and grow a vibrant community that extends our platform in ways we never imagined.
• Mentor junior engineers through pair programming, design reviews, and lightning talks; foster a culture where curiosity, psychological safety, and constructive dissent lead to breakthrough ideas.
• Iterate at startup speed: ship an MVP in days, measure, learn, and pivot without ego, while still building the robust foundations that will scale to exaFLOP clusters tomorrow.

Skills & Technologies

Python

Docker

Kubernetes

TensorFlow

PyTorch

Data Science

Onsite

Degree Required

Remote

About Mindbeam AI

Mindbeam AI is a New York City–based startup specializing in next-generation AI infrastructure. Its flagship product, Litespark, is a framework designed to accelerate the pre-training and fine-tuning of large language models (LLMs). Litespark uses advanced algorithms to significantly reduce training times, from months to days, while minimizing costs and energy consumption. It is compatible with industry-standard machine learning frameworks such as PyTorch, TensorFlow, and JAX, and is optimized for NVIDIA GPU hardware. Mindbeam's solutions are used by Fortune 100 enterprises and are available on AWS Marketplace.
