Data Center Network Engineer

Job Overview

Location

San Francisco

Job Type

Full-time

Category

Software Engineering

Date Posted

May 22, 2026

Full Job Description

📋 Description

  • Design and own end-to-end network architecture for data center clusters powering GPU-based AI inference and training systems.
  • Define cluster fabric architectures using InfiniBand or high-performance Ethernet protocols to optimize throughput and reduce latency for distributed workloads.
  • Design and implement spine-leaf topologies and rack-level connectivity for scalable, high-availability data center networks.
  • Select and specify switches, optics, and cabling systems based on performance, reliability, and scalability requirements for GPU clusters.
  • Lead network bring-up, validation, and performance testing across new and existing data center deployments.
  • Partner closely with hardware and platform engineering teams to align network design with system-level performance goals.
  • Define and document standardized network deployment practices to ensure consistency across multiple data center sites.
  • Perform ongoing network performance tuning to support demanding AI workloads, including RDMA and low-latency communication protocols.
  • Own technical decision-making for network infrastructure at a staff level, with direct impact on model training and inference efficiency.
  • Mentor and support junior engineers and future team members as the network team scales.
  • Collaborate with cross-functional teams to troubleshoot complex network issues affecting distributed AI systems.
  • Contribute to the evolution of network standards and best practices for high-performance computing environments.
  • Ensure network infrastructure meets the reliability and performance demands of mission-critical AI applications used by leading ML companies.
  • Maintain detailed documentation of network configurations, topology diagrams, and operational procedures.
  • Participate in on-call rotations to respond to critical network incidents affecting production AI infrastructure.

🎯 Requirements

  • Experience designing and operating data center or HPC networks.
  • Strong familiarity with InfiniBand, RDMA, or high-performance Ethernet.
  • Strong hands-on skills in network configuration, debugging, and performance tuning.
  • Experience owning complex systems end-to-end at a senior level.
  • Experience leading technical projects or cross-functional efforts.
  • Prior leadership or mentoring experience is a plus.

🏖️ Benefits

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employee and dependents.
  • Flexible PTO policy including company-wide Winter Break (offices closed from Christmas Eve to New Year's Day).
  • Paid parental leave.
  • Fertility and family-building stipend through Carrot.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Skills & Technologies

Apache Spark
Onsite

BaseTen Inc.

About BaseTen Inc.

BaseTen provides a serverless, GPU-accelerated platform that lets machine-learning teams deploy, scale and monitor custom models behind autoscaling inference endpoints. The service abstracts infrastructure management, supports PyTorch, TensorFlow and Hugging Face artifacts, and offers built-in observability, A/B testing and fine-tuning. Customers integrate via REST or GraphQL APIs and pay only for compute used. Founded in 2019 and headquartered in San Francisco, BaseTen targets data scientists and product teams seeking production-grade ML serving without Kubernetes complexity.
