This job has expired

This position was posted on October 3, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

ML Engineer - Inference

Job Overview

Location: United States
Job Type: Full-time
Category: Software Engineering
Date Posted: October 3, 2025

Full Job Description

📋 Description

  • Shape the public face of Mindbeam’s next-generation AI infrastructure by designing and owning the inference APIs, SDKs, and command-line tools that thousands of researchers, developers, and Fortune-500 engineers will rely on every day.
  • Translate cutting-edge research models, from billion-parameter transformers to novel sparse architectures, into rock-solid, low-latency production endpoints that serve millions of predictions per second with sub-100 ms p99 latency.
  • Build intuitive abstractions that hide the complexity of distributed inference, automatic batching, dynamic quantization, and hardware-specific optimization (GPU, TPU, Inferentia) behind clean, idiomatic Python, REST, and gRPC interfaces.
  • Partner shoulder-to-shoulder with our research team to co-design new model-serving paradigms, then shepherd those ideas through alpha, beta, and GA by writing design docs, RFCs, and example notebooks that turn bleeding-edge science into copy-paste developer joy.
  • Own the full lifecycle of our inference stack: profiling, benchmarking, autoscaling policies, canary rollouts, and real-time monitoring with Prometheus, Grafana, and custom ML health checks to guarantee five-nines reliability for enterprise SLAs.
  • Champion the developer experience: run weekly user interviews, mine GitHub issues, and instrument SDK telemetry to discover friction points, then ship weekly releases that cut integration time from hours to minutes.
  • Craft security-first architectures that satisfy SOC 2, HIPAA, and FedRAMP controls (end-to-end TLS, customer-managed keys, VPC peering, and fine-grained IAM) so regulated industries can adopt Mindbeam without a second thought.
  • Contribute to open-source: upstream improvements to TensorRT, vLLM, and Kubernetes Serving, publish blog posts and conference talks, and grow a vibrant community that extends our platform in ways we never imagined.
  • Mentor junior engineers through pair programming, design reviews, and lightning talks; foster a culture where curiosity, psychological safety, and constructive dissent lead to breakthrough ideas.
  • Iterate at startup speed: ship an MVP in days, measure, learn, and pivot without ego, while still building the robust foundations that will scale to exaFLOP clusters tomorrow.

Skills & Technologies

Python
Docker
Kubernetes
TensorFlow
PyTorch
Data Science
Onsite
Degree Required
Remote

About Mindbeam AI

Mindbeam AI is a New York City–based startup specializing in next-generation AI infrastructure. Its flagship product, Litespark, is a framework designed to accelerate the pre-training and fine-tuning of large language models (LLMs). Litespark utilizes advanced algorithms to significantly reduce training times—from months to days—while minimizing costs and energy consumption. The framework is compatible with industry-standard machine learning frameworks like PyTorch, TensorFlow, and JAX, and is optimized for NVIDIA GPU hardware. Mindbeam's solutions are utilized by Fortune 100 enterprises and are available on AWS Marketplace.
