
Job Overview
Location
California, USA
Job Type
Full-time
Category
Engineering Manager
Date Posted
February 26, 2026
Full Job Description
📋 Description
- Crusoe Cloud is at the forefront of the AI revolution, building the sustainable infrastructure that powers ambitious AI development. We are seeking a visionary and highly technical leader to spearhead our Network Development efforts, a critical function responsible for the design, engineering, and continuous evolution of the high-performance network fabrics that underpin our global AI and HPC infrastructure.
- As the Senior Engineering Manager for Network Observability, you will be a "builder" at heart, leading a team dedicated to translating complex architectural requirements into robust, scalable, automated, and exceptionally high-performance network designs. Your leadership will directly impact the efficiency and effectiveness of our cutting-edge AI training environments, from optimizing intricate GPU-to-GPU communications within our massive HPC clusters to architecting and deploying the sophisticated automation frameworks that manage our global network backbone.
- This role offers a unique opportunity to shape the future of networking for the AI era. You will own the end-to-end development lifecycle of Crusoe’s network architecture, with a strong emphasis on designing and implementing the next generation of GPU fabrics and global interconnects that are essential for massive-scale distributed AI training.
- A key focus will be on driving innovation in HPC and AI fabrics, specifically engineering low-latency, high-bandwidth networks. This includes deep dives into technologies like RoCEv2 and InfiniBand, understanding their nuances, and optimizing them to unlock the full potential of our GPU compute resources.
- You will be instrumental in defining and executing the automation roadmap for our network infrastructure. This involves architecting and implementing comprehensive "Network-as-Code" initiatives, leading the development of robust CI/CD pipelines for network configurations, enabling automated provisioning of network services, and building self-healing infrastructure capabilities to ensure maximum uptime and resilience.
- Strategic hardware evaluation and vendor engineering will be a significant part of your responsibilities. You will be tasked with evaluating and certifying next-generation hardware platforms, working closely with key vendors such as Arista, NVIDIA, and Juniper. Your insights will help influence their product roadmaps, ensuring Crusoe consistently has access to the best-in-class silicon and networking equipment tailored for our demanding AI and HPC workloads.
- Building and mentoring a high-performing development culture is paramount. You will recruit, lead, and inspire a team of talented Network Development Engineers (NDE) and Software Engineers. This team will be crucial in bridging the gap between traditional networking practices and modern software engineering principles, fostering a collaborative and innovative environment.
- Cross-functional product partnership is essential for success. You will collaborate closely with Product Management, Hardware Engineering, and Network Operations teams. This ensures that our network designs are not only engineered for peak performance but are also inherently observable, easily maintainable, and operationally sound across our rapidly scaling global infrastructure.
- Your responsibilities will extend to defining and implementing advanced network monitoring and observability strategies. This includes developing telemetry, logging, and alerting mechanisms that provide deep insights into network performance, health, and potential issues, enabling proactive problem-solving and continuous optimization.
- You will play a pivotal role in capacity planning and performance tuning, ensuring our network infrastructure can meet the ever-increasing demands of our AI and HPC workloads. This involves analyzing traffic patterns, identifying bottlenecks, and implementing solutions to enhance throughput and reduce latency.
- Furthermore, you will contribute to the security architecture of our network, ensuring robust security practices are integrated from the design phase, including multi-tenant security considerations and advanced threat detection mechanisms.
- This role requires a leader who can think strategically about the long-term evolution of network technology while also being hands-on in guiding the technical execution of the team. You will be a key contributor to Crusoe's mission of accelerating the abundance of energy and intelligence by providing the foundational network infrastructure for the world's most demanding AI computations.
About Crusoe Energy Systems LLC
Crusoe Energy Systems is building Crusoe Cloud, an AI cloud platform that provides managed AI services and AI data center infrastructure. They cater to businesses seeking to accelerate AI solution development with optimized models and high-performance computing. The company utilizes environmentally aligned power sources, including wind, solar, and natural gas, to power its data centers. With features like managed Kubernetes and Slurm, Crusoe simplifies operations and ensures reliability with 24/7 support. Crusoe is expanding its reach, including a strategic European expansion with its first data center in Norway. Crusoe recently raised $1.375 billion at a valuation above $10 billion.



