Senior Engineering Manager, Network Observability

Job Overview

Location: California, USA
Job Type: Full-time
Category: Engineering Manager
Date Posted: February 26, 2026

Full Job Description

Description

  • Crusoe Cloud is at the forefront of the AI revolution, building the sustainable infrastructure that powers ambitious AI development. We are seeking a visionary and highly technical leader to spearhead our Network Development efforts, a critical function responsible for the design, engineering, and continuous evolution of the high-performance network fabrics that underpin our global AI and HPC infrastructure.
  • As the Senior Engineering Manager for Network Observability, you will be a "builder" at heart, leading a team dedicated to translating complex architectural requirements into robust, scalable, automated, and exceptionally high-performance network designs. Your leadership will directly impact the efficiency and effectiveness of our cutting-edge AI training environments, from optimizing intricate GPU-to-GPU communications within our massive HPC clusters to architecting and deploying the sophisticated automation frameworks that manage our global network backbone.
  • This role offers a unique opportunity to shape the future of networking for the AI era. You will own the end-to-end development lifecycle of Crusoe’s network architecture, with a strong emphasis on designing and implementing the next generation of GPU fabrics and global interconnects that are essential for massive-scale distributed AI training.
  • A key focus will be driving innovation in HPC and AI fabrics, specifically engineering low-latency, high-bandwidth networks. This includes deep dives into technologies like RoCEv2 and InfiniBand, understanding their nuances, and optimizing them to unlock the full potential of our GPU compute resources.
  • You will be instrumental in defining and executing the automation roadmap for our network infrastructure. This involves architecting and implementing comprehensive "Network-as-Code" initiatives, leading the development of robust CI/CD pipelines for network configurations, enabling automated provisioning of network services, and building self-healing infrastructure capabilities to ensure maximum uptime and resilience.
  • Strategic hardware evaluation and vendor engineering will be a significant part of your responsibilities. You will evaluate and certify next-generation hardware platforms, working closely with key vendors such as Arista, NVIDIA, and Juniper. Your insights will help influence their product roadmaps, ensuring Crusoe consistently has access to best-in-class silicon and networking equipment tailored for our demanding AI and HPC workloads.
  • Building and mentoring a high-performing development culture is paramount. You will recruit, lead, and inspire a team of talented Network Development Engineers (NDEs) and Software Engineers. This team will be crucial in bridging the gap between traditional networking practices and modern software engineering principles, fostering a collaborative and innovative environment.
  • Cross-functional product partnership is essential for success. You will collaborate closely with Product Management, Hardware Engineering, and Network Operations teams. This ensures that our network designs are not only engineered for peak performance but are also inherently observable, easily maintainable, and operationally sound across our rapidly scaling global infrastructure.
  • Your responsibilities will extend to defining and implementing advanced network monitoring and observability strategies. This includes developing telemetry, logging, and alerting mechanisms that provide deep insights into network performance, health, and potential issues, enabling proactive problem-solving and continuous optimization.
  • You will play a pivotal role in capacity planning and performance tuning, ensuring our network infrastructure can meet the ever-increasing demands of our AI and HPC workloads. This involves analyzing traffic patterns, identifying bottlenecks, and implementing solutions to enhance throughput and reduce latency.
  • Furthermore, you will contribute to the security architecture of our network, ensuring robust security practices are integrated from the design phase, including multi-tenant security considerations and advanced threat detection mechanisms.
  • This role requires a leader who can think strategically about the long-term evolution of network technology while also being hands-on in guiding the technical execution of the team. You will be a key contributor to Crusoe's mission of accelerating the abundance of energy and intelligence by providing the foundational network infrastructure for the world's most demanding AI computations.

Skills & Technologies

Python
AWS
Azure
GCP
Terraform
Senior
Onsite
$237k-288k
Degree Required

Crusoe Energy Systems LLC

About Crusoe Energy Systems LLC

Crusoe Energy Systems is building Crusoe Cloud, an AI cloud platform that provides managed AI services and AI data center infrastructure. It caters to businesses seeking to accelerate AI solution development with optimized models and high-performance computing. The company powers its data centers with environmentally aligned power sources, including wind, solar, and natural gas. With features like managed Kubernetes and Slurm, Crusoe simplifies operations and ensures reliability with 24/7 support. Crusoe is expanding its reach, including a strategic European expansion with its first data center in Norway. Crusoe recently raised $1.375 billion at a valuation above $10 billion.
