This position was posted on February 27, 2026, and is likely no longer accepting applications. We've kept it here for historical reference.

Facility Operation Lead L3

FluidStack Inc.

Job Overview

Location

Buffalo, NY

Job Type

Full-time

Full Job Description

📋 Description

• Fluidstack is at the forefront of building the essential infrastructure for the era of abundant intelligence, partnering with leading AI labs, governments, and enterprises to deliver compute power at unprecedented speeds. We are driven by a mission to accelerate the realization of Artificial General Intelligence (AGI), fostering a highly motivated and committed team dedicated to engineering world-class infrastructure. Our core philosophy centers on treating customer outcomes as our own, taking immense pride in the systems we construct and the trust we cultivate. If you are purpose-driven, relentlessly focused on excellence, and prepared to invest significant effort to expedite the future of intelligence, we invite you to join us in shaping what comes next.
• The Data Center Operations team is the engine behind Fluidstack's rapid expansion and is responsible for deploying and operating hyperscale data centers. The team takes full onsite responsibility for each facility and oversees the entire lifecycle of the hardware fleet. We provide a diverse array of infrastructure solutions and services that are both scalable and highly reliable, forming the backbone of our AI compute capabilities.
• This role is designed for people with strong technical infrastructure backgrounds, a deep passion for engineering excellence, and a commitment to delivering effective, sustainable, and forward-thinking solutions. You should have a keen understanding of the complexities of deploying infrastructure at scale and a proven track record of designing systems and platforms that set global benchmarks for performance, availability, security, and cost-efficiency. The work in this position will have a tangible influence on processes and standards worldwide.
• As a Facility Operation Lead L3, you will serve as a critical Subject Matter Expert (SME) for our Data Center Operations teams, providing advanced support and managing escalations related to critical data center infrastructure and other facilities within the white space. Your expertise will be instrumental in maintaining the operational integrity and efficiency of our cutting-edge facilities.
• A key aspect of your role will involve a deep understanding of the design limitations and inherent risks associated with our data center facilities. You will be responsible for conducting regular audits, performing thorough reviews, and actively collaborating with our colocation partners and service providers. This collaboration will focus on assessing the effectiveness of their maintenance programs and ensuring alignment with Fluidstack's stringent operational standards.
• You will take ownership of and drive the change management process for all high-risk maintenance activities conducted within our data centers. This includes meticulous planning, risk assessment, stakeholder communication, and ensuring that all procedures adhere to best practices for safety and operational continuity.
• When an incident occurs, you will act as the primary onsite responder, managing the immediate situation while coordinating with remote SMEs from our data center engineering and infrastructure domains to reach a swift resolution. Your ability to remain calm under pressure and direct the response effectively will be crucial.
• Furthermore, you will provide essential onsite services to support a multitude of other strategic initiatives. This includes fostering and maintaining positive, collaborative working relationships with customers, partner teams, vendors, and all internal stakeholders. Your interpersonal skills and ability to navigate complex relationships will be vital to project success.
• This role demands a proactive approach to identifying and mitigating potential operational issues before they impact performance. You will contribute to the continuous improvement of our operational procedures, documentation, and training materials, ensuring that our teams are equipped with the knowledge and tools necessary to maintain our high standards.
• You will be involved in the planning and execution of hardware deployments, upgrades, and decommissioning, ensuring minimal disruption to ongoing operations. This includes coordinating with logistics, procurement, and engineering teams to ensure timely and efficient hardware lifecycle management.
• Your responsibilities will also extend to ensuring compliance with all relevant safety regulations, environmental standards, and company policies within the data center environment. This involves regular safety inspections and promoting a culture of safety awareness among all personnel on site.
• You will play a role in the evaluation and selection of new technologies and equipment, providing technical input to ensure that proposed solutions meet our performance, reliability, and scalability requirements for future growth.
• The ability to interpret and act upon complex technical data, including performance metrics, environmental readings, and system logs, will be essential for proactive problem-solving and optimization.
• You will contribute to the development and execution of preventative maintenance schedules for all critical infrastructure components, ensuring maximum uptime and longevity of our assets.
• This role offers a unique opportunity to be at the heart of a rapidly growing company at the cutting edge of AI infrastructure, making a significant impact on the development and deployment of advanced computing resources.

🎯 Requirements

• Strong knowledge of data center critical infrastructure, including power and cooling systems, and of the standards and calculations used to size capacity requirements (e.g., TIA-942, ASHRAE).
• Experience with power redundancy concepts (e.g., 2N, N+1, distributed redundant) and cooling redundancy (e.g., N+1, N+2), and with their implementation in design and operation.
• Proven experience supervising construction teams for data hall build-outs, including coordinating IT services, power, and cooling planning.
• Demonstrated ability to influence testing and commissioning teams to perform comprehensive validation tests (L1-L5) and resolve technical issues.
• Electrical and/or mechanical engineering background, with hands-on experience operating equipment such as generators, chillers, cooling towers, air handling units, UPS, and electrical sub-distribution systems.
• Direct experience working with GPU servers (e.g., H100, B200, GB200) is a significant advantage.

🏖️ Benefits

• Competitive total compensation package including salary and equity.
• Retirement or pension plan, in line with local norms.
• Comprehensive health, dental, and vision insurance.
• Generous Paid Time Off (PTO) policy, in line with local norms.

Role Details

Seniority

Senior

Work Arrangement

Onsite

Salary Range

$100k-150k

About FluidStack Inc.

FluidStack Inc. operates a distributed cloud platform that aggregates under-utilized GPUs in data centers and individual machines worldwide, renting them on-demand to AI researchers, startups, and enterprises for training and inference workloads. The company automates deployment, security, and billing, offering prices up to 80% below traditional hyperscalers while providing instant access to high-end NVIDIA A100, H100, and consumer GPUs through a single API and web console. Headquartered in London, FluidStack targets machine-learning engineers who need scalable, low-cost compute without long-term commitments, claiming thousands of active nodes and customers including Fortune 500 enterprises and leading research labs.