
Job Overview
Location: San Francisco
Job Type: Full-time
Category: Software Engineering
Date Posted: May 26, 2026
Full Job Description
📋 Description
- Build full-stack web applications that provide real-time insights into cluster health, job failures, and usable capacity across OpenAI’s frontier supercomputing clusters.
- Transform operational questions like "why is this job down?" and "what is preventing these nodes from being ready?" into intuitive, actionable product experiences for researchers and infrastructure teams.
- Collaborate directly with researchers and infrastructure engineers to identify high-leverage problems in cluster operations and design scalable solutions that improve reliability and efficiency.
- Design and implement data models, APIs, and visualizations that enable clear inspection and reasoning about large-scale scheduling, resource allocation, and cluster behavior.
- Develop scalable backend services that process high-volume cluster and workload data with low latency and high reliability to support real-time monitoring and decision-making.
- Create intuitive frontend workflows that connect users to underlying compute, storage, and scheduling systems, improving usability for non-engineering researchers.
- Raise the bar for reliability, performance, and security in the systems used to operate OpenAI’s largest supercomputing infrastructure.
- Work on tools that help diagnose and resolve issues across hardware and software layers, enabling faster recovery and improved cluster availability.
- Build workflows that streamline scheduling, debugging, and resource management at massive scale, reducing manual toil and increasing operational efficiency.
- Contribute to the development of observability tooling and real-time data processing pipelines that surface critical anomalies and performance bottlenecks.
- Partner with infrastructure teams to understand the unique challenges of AI/ML training workloads and translate them into technical product requirements.
- Iterate rapidly in a fast-paced, high-collaboration environment with evolving priorities and tight timelines to deliver impact on mission-critical systems.
🎯 Requirements
- Significant experience in full-stack development using modern frontend frameworks such as React, Vue, or Angular and backend technologies such as Python, Go, or Node.js.
- Proven track record of building scalable, high-performance web applications for complex distributed systems.
- Strong understanding of APIs, distributed data systems, and cloud infrastructure.
- Execution-focused mindset with a strong emphasis on usability, performance, and scalability.
- Comfort working in fast-paced, highly collaborative environments with evolving priorities and tight deadlines.
🏖️ Benefits
- Compensation range of $230K - $347K USD.
- Opportunity to work on the largest supercomputers in the world supporting frontier AI model training.
- Collaborative environment with researchers and infrastructure teams at the cutting edge of AI.
- Access to secure and protected information technology systems with associated data security obligations.
- Reasonable accommodations available for applicants with disabilities.
- Equal opportunity employer committed to diversity and inclusion across all protected characteristics.
About OpenAI, Inc.
OpenAI is a San Francisco-based artificial intelligence research and deployment company founded in 2015. It develops large-scale AI models such as GPT, DALL-E, and Codex, providing cloud APIs and consumer applications like ChatGPT. Originally established as a non-profit, it later created a capped-profit subsidiary to attract capital while maintaining its mission to ensure artificial general intelligence benefits all of humanity.


