
Job Overview
Location: New York, USA
Job Type: Full-time
Category: Product Manager
Date Posted: March 4, 2026
Full Job Description
📋 Description
- Fluidstack is at the forefront of building the infrastructure for abundant intelligence, partnering with leading AI labs, governments, and enterprises to accelerate the realization of Artificial General Intelligence (AGI). We are seeking a highly motivated and skilled Product Manager to spearhead our managed services portfolio, with a particular focus on SLURM and Kubernetes control planes.
- In this pivotal role, you will be instrumental in shaping the product vision and defining the strategic roadmap for how enterprises can effectively deploy, manage, and scale their complex workloads on Fluidstack's cutting-edge infrastructure. Your responsibilities will span the entire lifecycle of these services, from initial cluster provisioning and configuration through to ongoing lifecycle management, comprehensive observability, and continuous optimization.
- This position operates at the critical intersection of infrastructure, developer experience, and operational excellence. You will collaborate closely with cross-functional teams, including engineering, datacenter operations, and customer-facing departments, to architect and deliver robust control plane capabilities designed to scale efficiently to support massive 100,000+ GPU megaclusters.
- A core aspect of your role will be to own and drive the product roadmap for our managed SLURM and Kubernetes offerings. This includes defining the architecture of the control plane, implementing advanced autoscaling mechanisms, ensuring robust multi-tenancy support, and overseeing comprehensive cluster lifecycle management.
- You will be responsible for defining stringent requirements for control plane performance, reliability, and availability. This involves setting precise specifications for API rate limits, etcd scaling strategies, provisioning tiers, and sophisticated failure recovery mechanisms to ensure uninterrupted service.
- Working hand-in-hand with the engineering teams, you will contribute to the design of automated provisioning workflows, sophisticated health monitoring systems, and intelligent node lifecycle controllers. The goal is to minimize cluster downtime and maximize GPU utilization, ensuring our clients can leverage our infrastructure to its fullest potential.
- You will forge strong partnerships with our datacenter and networking teams to guarantee that the control plane infrastructure scales seamlessly across diverse geographic regions and effectively supports hybrid deployment models, offering maximum flexibility to our customers.
- A key responsibility will be to drive strategic decisions regarding build versus integrate opportunities with existing ecosystem tools such as Rancher, OpenShift, Slurm accounting, and various workload orchestrators. These decisions will be informed by deep customer needs analysis and a thorough understanding of the competitive landscape.
- You will define critical metrics and Service Level Agreements (SLAs) for control plane uptime, API performance, scheduler throughput, and pod/job launch latency, setting clear benchmarks for operational success.
- Engaging directly with customers will be essential. You will conduct thorough customer discovery to gain a deep understanding of their pain points related to cluster management, job queueing, resource allocation, and multi-cluster orchestration, translating these insights into actionable product improvements.
- You will be tasked with creating comprehensive product documentation, detailed deployment guides, and illustrative reference architectures. These materials will empower enterprise customers to successfully run large-scale AI training and inference workloads on our platform.
- A crucial part of your role involves analyzing competitive offerings from major cloud providers like AWS EKS, Google GKE, DigitalOcean DOKS, and specialized HPC providers. This analysis will directly inform feature prioritization, pricing strategies, and our overall competitive positioning.
- Ultimately, you will be the champion for the managed services portfolio, ensuring it meets the evolving needs of our enterprise clients and solidifies Fluidstack's position as a leader in AI infrastructure.
About FluidStack Inc.
FluidStack Inc. operates a distributed cloud platform that aggregates under-utilized GPUs in data centers and individual machines worldwide, renting them on-demand to AI researchers, startups, and enterprises for training and inference workloads. The company automates deployment, security, and billing, offering prices up to 80% below traditional hyperscalers while providing instant access to high-end NVIDIA A100, H100, and consumer GPUs through a single API and web console. Headquartered in London, FluidStack targets machine-learning engineers who need scalable, low-cost compute without long-term commitments, claiming thousands of active nodes and customers including Fortune 500 enterprises and leading research labs.
