Software Engineer, Workload Enablement

Job Overview

Location

San Francisco

Job Type

Full-time

Category

Backend Engineer

Date Posted

March 28, 2026

Full Job Description

📋 Description

  • As a Software Engineer on the Workload Enablement team within OpenAI’s Scaling organization, you will play a critical role in ensuring that cutting-edge AI workloads run reliably and efficiently on emerging hardware platforms, directly enabling the deployment of next-generation models.
  • You will be responsible for porting, validating, and optimizing key inference and training workloads on new and early-access systems, building comprehensive test harnesses and stress benchmarks that reflect real-world end-to-end behavior, and diagnosing performance bottlenecks across compute, memory, networking, and storage subsystems.
  • The Scaling team designs and delivers the architectural and engineering foundation for OpenAI’s AI infrastructure, spanning system software, high-speed networking, platform architecture, fleet monitoring, and performance optimization to support the training and deployment of frontier models.
  • You will collaborate closely with systems bring-up engineers, vendor partners, and internal ML teams to ensure platforms are not only performant but also operationally robust, scalable, and observable, integrating with Kubernetes, telemetry systems, and failure triage workflows.
  • In this role, you will deepen your expertise in high-performance computing, distributed systems, and AI infrastructure, gaining hands-on experience with the latest accelerators, interconnect technologies (like NVLink and RDMA), and software stacks while contributing to the reliability and scalability of OpenAI’s AI platform.

🎯 Requirements

  • BS in Computer Science, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience in ML systems, performance engineering, distributed systems, or HPC
  • Strong hands-on experience with PyTorch and modern LLM training/inference stacks, large-scale distributed training concepts (data/model/pipeline parallelism), and RDMA-based communication libraries (NCCL or RCCL)

🏖️ Benefits

  • Opportunity to work on cutting-edge AI infrastructure at the forefront of technological innovation
  • Collaborative, mission-driven culture focused on safe and beneficial AI development
  • Access to advanced hardware and software tools for performance analysis and system optimization

Skills & Technologies

Python
Kubernetes
PyTorch
Onsite

OpenAI, Inc.

About OpenAI, Inc.

OpenAI is a San Francisco-based artificial intelligence research and deployment company founded in 2015. It develops large-scale AI models such as GPT, DALL-E, and Codex, providing cloud APIs and consumer applications like ChatGPT. Originally established as a non-profit, it later created a capped-profit subsidiary to attract capital while maintaining its mission to ensure artificial general intelligence benefits all of humanity.
