
Software Engineer, Inference Platform

Job Overview

Location

San Francisco, California, USA

Job Type

Full-time

Category

Software Engineer

Date Posted

March 7, 2026

Full Job Description

📋 Description

  • Fluidstack is building the infrastructure for abundant intelligence, partnering with leading AI labs, governments, and enterprises to accelerate progress toward Artificial General Intelligence (AGI). We are driven by a mission to deliver world-class infrastructure with urgency, treating our customers' outcomes as our own and earning trust through the systems we build. If you are purpose-driven, obsessed with excellence, and ready to help advance the future of intelligence, join us in shaping what comes next.
  • The Inference Platform team addresses the cost and latency bottlenecks that now define frontier AI: inference. The team owns the serving layer that bridges our global accelerator supply with customers' production workloads. This includes managing LLM serving frameworks, developing KV cache infrastructure, implementing disaggregated prefill/decode pipelines, and orchestrating these systems across multi-datacenter footprints with Kubernetes.
  • This is a hands-on individual contributor role at the intersection of distributed systems, model optimization, and serving infrastructure. You will own end-to-end inference deployments for frontier AI labs and for Fluidstack's inference product, directly driving improvements in key performance metrics such as throughput, cost-per-token, and time-to-first-token (TTFT). You will also help shape the platform architecture, influencing how Fluidstack deploys and scales services across tens of thousands of accelerators.
  • Key responsibilities include owning inference deployments end to end, from initial setup and performance tuning through maintaining production Service Level Agreements (SLAs) and responding to incidents. You will deliver measurable gains in throughput, TTFT, and cost-per-token across a wide spectrum of model families, including dense transformers, mixture-of-experts (MoE), and multimodal models, as well as diverse customer workload patterns.
  • You will develop and operate KV cache and scheduling infrastructure that maximizes resource utilization across concurrent inference requests, and implement and rigorously validate disaggregated prefill/decode pipelines along with the Kubernetes orchestration needed to run them at scale.
  • A significant part of the role involves profiling and resolving performance bottlenecks at the compute, memory, and communication layers, and instrumenting deployments for comprehensive end-to-end observability so issues can be identified and resolved proactively.
  • You will collaborate closely with customers to translate their model architectures, access patterns, and latency requirements into effective deployment configurations, and feed those insights back into upstream platform improvements so the infrastructure evolves to meet user needs.
  • You will contribute directly to the inference platform's architecture and roadmap, with a focus on simplifying deployment, improving hardware utilization, and expanding support for new model classes and accelerator types.
  • Participation in an on-call rotation, typically up to one week per month, is expected to keep all production deployments reliable and within SLA commitments.
  • This role offers a unique opportunity to work with state-of-the-art AI technologies and contribute to the foundational infrastructure powering the next generation of intelligent systems, as part of a highly motivated team pushing the boundaries of what's possible in AI infrastructure.

Skills & Technologies

Python
Node.js
Kubernetes
PyTorch

Work Setting

Onsite

Salary Range

$165k–$500k



About FluidStack Inc.

FluidStack Inc. operates a distributed cloud platform that aggregates under-utilized GPUs in data centers and individual machines worldwide, renting them on-demand to AI researchers, startups, and enterprises for training and inference workloads. The company automates deployment, security, and billing, offering prices up to 80% below traditional hyperscalers while providing instant access to high-end NVIDIA A100, H100, and consumer GPUs through a single API and web console. Headquartered in London, FluidStack targets machine-learning engineers who need scalable, low-cost compute without long-term commitments, claiming thousands of active nodes and customers including Fortune 500 enterprises and leading research labs.
