This job has expired

This position was posted on April 27, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Software Engineer - GPU Inference

BaseTen Inc.

Job Overview

Location

San Francisco

Job Type

Full-time

Full Job Description

📋 Description

• As a Software Engineer - GPU Inference at BaseTen Inc., you will be the primary owner of the Voice AI inference stack, responsible for bringing state-of-the-art open-source models into production for real-time voice applications across industries such as productivity, customer service, clinical conversation, and education.
• Day to day, you will own end-to-end product areas, from architecture and system design through implementation, rollout, and long-term production operations. You will design and operate real-time, large-scale, high-performance model serving systems for speech-to-text (STT), text-to-speech (TTS), and voice agent workloads with strict SLOs. You will also drive cross-team collaboration with Forward Deployed Engineers, Model Performance Engineers, and the Core Product and Training Platform teams, and mentor teammates through code reviews, design docs, and technical leadership.
• You will join a small, high-impact founding team focused on Voice AI at Baseten, a rapidly growing AI infrastructure company. Baseten powers mission-critical inference for AI companies such as Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer, and is backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction.
• In this role, you will help shape the future of Voice AI by building the world’s fastest Whisper with streaming and diarization, contributing to TTS inference for models like Orpheus, designing ergonomic APIs and SDKs for self-serve adoption, and enabling continuous training of voice models. Along the way, you will gain deep expertise in ML infrastructure, GPU-optimized serving, and production-grade AI systems at scale.

Skills & Technologies

Python

Docker

Kubernetes

PyTorch

Apache Spark

Onsite

Degree Required

About BaseTen Inc.

BaseTen provides a serverless, GPU-accelerated platform that lets machine-learning teams deploy, scale and monitor custom models behind autoscaling inference endpoints. The service abstracts infrastructure management, supports PyTorch, TensorFlow and Hugging Face artifacts, and offers built-in observability, A/B testing and fine-tuning. Customers integrate via REST or GraphQL APIs and pay only for compute used. Founded in 2019 and headquartered in San Francisco, BaseTen targets data scientists and product teams seeking production-grade ML serving without Kubernetes complexity.
