Crusoe Energy Systems LLC

Staff Production Engineer, Cloud Infrastructure

Job Overview

Location

San Francisco, California, USA

Job Type

Full-time

Category

Software Engineering

Date Posted

February 26, 2026

Full Job Description

Description

  • Crusoe Energy Systems is at the forefront of the AI revolution, building a sustainable cloud infrastructure that powers ambitious AI development without compromising on scale, speed, or environmental responsibility. We are seeking a highly skilled and motivated Staff Production Engineer to join our dynamic team and play a pivotal role in shaping the future of our AI-first compute environment. This is a unique opportunity to contribute to a mission-driven company that is setting the pace for transformative and responsible technology.
  • As a Staff Production Engineer, you will be instrumental in the design, implementation, and operation of critical components within our cutting-edge cloud platform. Your primary focus will be on ensuring the reliability, scalability, and operational excellence of a defined infrastructure domain. You will work collaboratively with a talented team of engineers, contributing to broader platform strategy and driving innovation across the organization. This role demands a blend of deep hands-on technical expertise and leadership, empowering you to elevate production engineering practices and set new standards for excellence.
  • Your responsibilities will span the entire lifecycle of our cloud infrastructure. You will be involved in the design, build, and management of core cloud services, encompassing compute, networking, storage, and Identity and Access Management (IAM). A significant part of your role will involve architecting, operating, and scaling our Kubernetes-based platforms, ensuring they are robust, efficient, and ready to handle the demands of large-scale AI workloads.
  • You will deploy and manage Kubernetes workloads with precision, leveraging Helm charts and advanced continuous deployment systems to ensure smooth and reliable releases. Furthermore, you will contribute to the operation and enhancement of our observability platforms, utilizing tools like VictoriaMetrics and Grafana to provide deep insights into the health and performance of our cloud and Kubernetes environments. This proactive monitoring is crucial for maintaining high availability and quickly identifying and resolving potential issues.
  • A key aspect of this role involves developing and maintaining Terraform modules. Your expertise in Infrastructure as Code (IaC) will be vital in defining automated, auditable, and secure cloud environments, ensuring consistency and compliance across our infrastructure. You will also own the critical aspects of VPC design, including routing, load balancers, interconnects, peering, and the establishment of robust network security boundaries. This design work ensures secure and efficient data flow within our cloud ecosystem.
  • You will be responsible for implementing and enforcing policies and guardrails across IAM, resource hierarchy, service accounts, and VPC Service Controls (VPC-SC). This proactive security posture is essential for protecting our infrastructure and data. Building automation for provisioning and lifecycle management, and implementing advanced deployment patterns such as blue/green or canary deployments, will be central to your work, enabling faster iteration and reduced risk.
  • Collaboration is key to success in this role. You will partner closely with our security and platform teams to enhance monitoring, logging, compliance, and overall operational readiness. Your insights will be invaluable in optimizing cloud costs, managing quotas, and performing capacity planning across multiple projects and diverse geographical regions, ensuring efficient resource utilization.
  • A significant challenge and opportunity will be troubleshooting complex production issues that span compute, storage, and networking layers. Your ability to diagnose and resolve these intricate problems will be critical to maintaining the stability and performance of our platform. You will also have the opportunity to influence design decisions and collaborate with cross-functional teams, sharing your expertise and contributing to the strategic direction of our cloud infrastructure.
  • This role offers a chance to be at the cutting edge of cloud infrastructure for AI, working with a passionate team dedicated to sustainable innovation. You will gain exposure to high-performance computing environments and contribute directly to the success of groundbreaking AI initiatives.

Skills & Technologies

Python
AWS
GCP
Kubernetes
Terraform
DevOps
Senior
Hybrid
$209k-253k


Crusoe Energy Systems LLC

About Crusoe Energy Systems LLC

Crusoe Energy Systems is building Crusoe Cloud, an AI cloud platform that provides managed AI services and AI data center infrastructure. They cater to businesses seeking to accelerate AI solution development with optimized models and high-performance computing. The company utilizes environmentally aligned power sources, including wind, solar, and natural gas, to power its data centers. With features like managed Kubernetes and Slurm, Crusoe simplifies operations and ensures reliability with 24/7 support. Crusoe is expanding its reach, including a strategic European expansion with its first data center in Norway. Crusoe recently raised $1.375 billion at a valuation above $10 billion.
