This job has expired

This position was posted on May 22, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Data Center Network Engineer

BaseTen Inc.

Job Overview

Location

San Francisco

Job Type

Full-time

Full Job Description

📋 Description

• Design and own end-to-end network architecture for data center clusters powering GPU-based AI inference and training systems.
• Define cluster fabric architectures using InfiniBand or high-performance Ethernet to maximize throughput and minimize latency for distributed workloads.
• Design and implement spine-leaf topologies and rack-level connectivity for scalable, high-availability data center networks.
• Select and specify switches, optics, and cabling systems based on performance, reliability, and scalability requirements for GPU clusters.
• Lead network bring-up, validation, and performance testing across new and existing data center deployments.
• Partner closely with hardware and platform engineering teams to align network design with system-level performance goals.
• Define and document standardized network deployment practices to ensure consistency across multiple data center sites.
• Perform ongoing network performance tuning to support demanding AI workloads, including RDMA and low-latency communication protocols.
• Own technical decision-making for network infrastructure at a staff level, with direct impact on model training and inference efficiency.
• Mentor and support junior engineers and future team members as the network team scales.
• Collaborate with cross-functional teams to troubleshoot complex network issues affecting distributed AI systems.
• Contribute to the evolution of network standards and best practices for high-performance computing environments.
• Ensure network infrastructure meets the reliability and performance demands of mission-critical AI applications used by leading ML companies.
• Maintain detailed documentation of network configurations, topology diagrams, and operational procedures.
• Participate in on-call rotations to respond to critical network incidents affecting production AI infrastructure.

🎯 Requirements

• Experience designing and operating data center or HPC networks.
• Strong familiarity with InfiniBand, RDMA, or high-performance Ethernet.
• Strong hands-on skills in network configuration, debugging, and performance tuning.
• Experience owning complex systems end-to-end at a senior level.
• Experience leading technical projects or cross-functional efforts.
• Prior leadership or mentoring experience is a plus.

🏖️ Benefits

• Competitive compensation, including meaningful equity.
• 100% coverage of medical, dental, and vision insurance for employee and dependents.
• Flexible PTO policy, including a company-wide Winter Break (offices closed from Christmas Eve to New Year's Day).
• Paid parental leave.
• Fertility and family-building stipend through Carrot.
• Company-facilitated 401(k).
• Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Skills & Technologies

InfiniBand, RDMA, high-performance Ethernet

Onsite

About BaseTen Inc.

BaseTen provides a serverless, GPU-accelerated platform that lets machine-learning teams deploy, scale and monitor custom models behind autoscaling inference endpoints. The service abstracts infrastructure management, supports PyTorch, TensorFlow and Hugging Face artifacts, and offers built-in observability, A/B testing and fine-tuning. Customers integrate via REST or GraphQL APIs and pay only for compute used. Founded in 2019 and headquartered in San Francisco, BaseTen targets data scientists and product teams seeking production-grade ML serving without Kubernetes complexity.
