This job has expired

This position was posted on March 4, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Product Manager, Managed Services

FluidStack Inc.

Job Overview

Location

New York, NY

Job Type

Full-time

Full Job Description

📋 Description

• Fluidstack is at the forefront of building the infrastructure for abundant intelligence, partnering with leading AI labs, governments, and enterprises to accelerate the realization of Artificial General Intelligence (AGI). We are seeking a highly motivated and skilled Product Manager to spearhead our managed services portfolio, with a particular focus on SLURM and Kubernetes control planes.
• In this pivotal role, you will be instrumental in shaping the product vision and defining the strategic roadmap for how enterprises can effectively deploy, manage, and scale their complex workloads on Fluidstack's cutting-edge infrastructure. Your responsibilities will span the entire lifecycle of these services, from initial cluster provisioning and configuration through to ongoing lifecycle management, comprehensive observability, and continuous optimization.
• This position operates at the intersection of infrastructure, developer experience, and operational excellence. You will collaborate closely with cross-functional teams, including engineering, datacenter operations, and customer-facing departments, to architect and deliver robust control plane capabilities that scale efficiently to megaclusters of 100,000+ GPUs.
• A core aspect of your role will be to own and drive the product roadmap for our managed SLURM and Kubernetes offerings. This includes defining the architecture of the control plane, implementing advanced autoscaling mechanisms, ensuring robust multi-tenancy support, and overseeing comprehensive cluster lifecycle management.
• You will be responsible for defining stringent requirements for control plane performance, reliability, and availability. This involves setting precise specifications for API rate limits, etcd scaling strategies, provisioning tiers, and sophisticated failure recovery mechanisms to ensure uninterrupted service.
• Working hand-in-hand with the engineering teams, you will contribute to the design of automated provisioning workflows, sophisticated health monitoring systems, and intelligent node lifecycle controllers. The goal is to minimize cluster downtime and maximize GPU utilization, ensuring our clients can leverage our infrastructure to its fullest potential.
• You will forge strong partnerships with our datacenter and networking teams to guarantee that the control plane infrastructure scales seamlessly across diverse geographic regions and effectively supports hybrid deployment models, offering maximum flexibility to our customers.
• A key responsibility will be to drive strategic build-versus-integrate decisions for existing ecosystem tools such as Rancher, OpenShift, Slurm accounting, and various workload orchestrators. These decisions will be informed by in-depth analysis of customer needs and a thorough understanding of the competitive landscape.
• You will define critical metrics and Service Level Agreements (SLAs) for control plane uptime, API performance, scheduler throughput, and pod/job launch latency, setting clear benchmarks for operational success.
• Engaging directly with customers will be essential. You will conduct thorough customer discovery to gain a deep understanding of their pain points related to cluster management, job queueing, resource allocation, and multi-cluster orchestration, translating these insights into actionable product improvements.
• You will be tasked with creating comprehensive product documentation, detailed deployment guides, and illustrative reference architectures. These materials will empower enterprise customers to successfully run large-scale AI training and inference workloads on our platform.
• A crucial part of your role involves analyzing competitive offerings such as AWS EKS, Google GKE, and DigitalOcean DOKS, as well as those from specialized HPC providers. This analysis will directly inform feature prioritization, pricing strategies, and our overall competitive positioning.
• Ultimately, you will be the champion for the managed services portfolio, ensuring it meets the evolving needs of our enterprise clients and solidifies Fluidstack's position as a leader in AI infrastructure.

Skills & Technologies

Node.js

AWS

Kubernetes

Product Management

Work Arrangement

Hybrid

Salary

$180k-250k

About FluidStack Inc.

FluidStack Inc. operates a distributed cloud platform that aggregates under-utilized GPUs in data centers and individual machines worldwide, renting them on-demand to AI researchers, startups, and enterprises for training and inference workloads. The company automates deployment, security, and billing, offering prices up to 80% below traditional hyperscalers while providing instant access to high-end NVIDIA A100, H100, and consumer GPUs through a single API and web console. Headquartered in London, FluidStack targets machine-learning engineers who need scalable, low-cost compute without long-term commitments, claiming thousands of active nodes and customers including Fortune 500 enterprises and leading research labs.
