This job has expired

This position was posted on March 12, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Director, Infrastructure

FluidStack Inc.

Job Overview

Location

US Remote

Job Type

Full-time

Full Job Description

📋 Description

• Fluidstack is seeking a highly experienced and visionary Director of Infrastructure to spearhead the development, deployment, and operational excellence of our cutting-edge bare-metal clusters. These clusters are the backbone of some of the world's largest AI supercomputing initiatives, powering groundbreaking research and development for leading AI labs, governments, and enterprises. As the Director of Infrastructure, you will be instrumental in shaping the future of AI compute by ensuring Fluidstack delivers unparalleled speed, reliability, and scale.
• This pivotal role demands a leader who can bridge the gap between deep technical expertise and strategic business objectives. You will manage and mentor a high-performing team of Infrastructure Engineers, including specialists in Networking, Compute Systems, and Storage. Your leadership will foster a culture of technical rigor, rapid deployment, and unwavering operational reliability, setting new industry benchmarks.
• A core responsibility will be to drive the architectural decisions for our next-generation GPU systems. This includes meticulously defining server configurations, designing robust frontend and backend network fabrics, optimizing storage topologies, and managing the intricate power and cooling envelopes within our data centers. Your insights will be crucial in selecting and integrating the latest hardware, including NVIDIA, AMD, and other advanced accelerators (XPUs), to meet the demanding requirements of AI workloads.
• You will collaborate closely with our Supply Chain and Procurement teams to build strong OEM relationships, define precise hardware specifications, and manage delivery timelines. This keeps our physical infrastructure roadmap ahead of customer commitments, which is a critical competitive advantage.
• Seamless integration with Data Center Operations is paramount. You will partner with that team to ensure the smooth bring-up of new sites, overseeing the transition from civil and MEP completion through network cabling, hardware racking, comprehensive burn-in testing, and final customer acceptance.
• A key aspect of this role involves working hand-in-hand with our Software Engineering and Site Reliability Engineering (SRE) teams. You will define and translate infrastructure requirements for managed Kubernetes, SLURM, and inference serving platforms, ensuring the physical infrastructure layer is perfectly optimized to support the demands of the software stack and deliver peak performance.
• To enable our team to operate with exceptional speed and reliability, you will champion the development and maintenance of sophisticated deployment tooling, automated burn-in processes, and advanced hardware lifecycle management systems. These tools are essential for maintaining Fluidstack's position as a leader in the industry.
• While leading a team, you are expected to remain technically hands-on. This includes actively participating in critical design reviews, being present during significant cluster bring-ups, and engaging directly with complex infrastructure failures. This direct involvement is vital for maintaining technical credibility with your team and across the entire organization.
• You will represent Fluidstack at various locations, including data centers, OEM facilities, customer sites, and industry events. This travel is essential for staying intimately connected with the hardware, our partners, and the evolving market landscape.
• Strategic financial planning and cost management are also within your purview. You will coordinate with the Finance department on infrastructure Capital Expenditure (CapEx) planning and detailed cost modeling. Furthermore, you will collaborate with the Security team to define and implement robust hardening and compliance requirements, and with the Sales team to provide pre-sales technical diligence and confirm capacity commitments to our valued customers.
• The ideal candidate has personally overseen the successful deployment of a 10,000+ GPU cluster using current-generation hardware and understands the intricacies of bringing such massive systems online in weeks rather than months. You have a proven track record of building the necessary tooling and comprehensive runbooks, and of fostering a team culture that enables repeated success at scale.

Skills & Technologies

Kubernetes

PyTorch

DevOps

Remote

About FluidStack Inc.

FluidStack Inc. operates a distributed cloud platform that aggregates under-utilized GPUs in data centers and individual machines worldwide, renting them on-demand to AI researchers, startups, and enterprises for training and inference workloads. The company automates deployment, security, and billing, offering prices up to 80% below traditional hyperscalers while providing instant access to high-end NVIDIA A100, H100, and consumer GPUs through a single API and web console. Headquartered in London, FluidStack targets machine-learning engineers who need scalable, low-cost compute without long-term commitments, claiming thousands of active nodes and customers including Fortune 500 enterprises and leading research labs.
