
Job Overview
Location: San Francisco
Job Type: Full-time
Category: Machine Learning Engineer
Date Posted: March 24, 2026
Full Job Description
📋 Description
- As a Post-Training Research Engineer at Baseten, you will play a critical role in advancing the performance and efficiency of custom AI models used by leading AI-driven companies such as Notion, Cursor, and Writer. Your work will directly impact how these organizations deploy state-of-the-art models in production by developing post-training techniques that improve model quality, reduce latency, and lower inference costs, ensuring Baseten remains the preferred platform for mission-critical AI inference.
- You will design, implement, and optimize in-house tooling for post-training workflows, including supervised fine-tuning, reinforcement learning, distillation, and other emerging techniques from the research literature. This involves building scalable systems that integrate with PyTorch-based distributed training pipelines, profiling GPU utilization, identifying bottlenecks, and improving throughput across heterogeneous hardware environments.
- Your day-to-day responsibilities will include conducting experiments to evaluate training strategies, analyzing performance metrics using roofline analysis and profiling tools, collaborating with research scientists to translate theoretical concepts into practical implementations, and contributing to Baseten’s internal research publications that showcase novel advancements in model efficiency and training stability.
- You will work across the full stack, from low-level GPU kernel optimization and memory management (e.g., KV cache innovations) to high-level orchestration using Kubernetes, Slurm, or Ray, ensuring that training jobs run reliably and efficiently at scale on GPU-accelerated infrastructure.
- The Post-Training team operates at the intersection of systems engineering and applied ML research, fostering a culture of deep technical curiosity, rigorous experimentation, and cross-functional collaboration. You’ll join a team that values first-principles thinking and encourages engineers to question assumptions, propose alternative approaches, and drive innovation through evidence-based development.
- In this role, you will gain hands-on experience with cutting-edge techniques in transformer training parallelism (data, tensor, pipeline, context, and sharded data parallelism), distributed-systems debugging, and performance optimization for large-scale AI workloads. You’ll also develop expertise in HPC networking technologies like InfiniBand and GPUDirect, containerization, and kernel-level systems concepts, skills that are highly transferable across advanced computing domains.
- Beyond technical growth, you will have the opportunity to publish research, present findings internally and externally, and influence the technical direction of a rapidly growing AI infrastructure company backed by top-tier investors. Your contributions will help shape how the next generation of AI products is built and deployed.
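To give a flavor of the distillation work described above, here is a minimal pure-Python sketch of a temperature-scaled knowledge-distillation objective. It is illustrative only, not Baseten's actual tooling: real post-training code would operate on PyTorch tensors over full vocabularies, and the logits below are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures, following the standard distillation convention.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; diverging logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

Raising the temperature softens both distributions, which exposes the teacher's relative preferences among non-argmax tokens; that "dark knowledge" is what the student learns beyond hard labels.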
🎯 Requirements
- Deep understanding of modern machine learning techniques and tools for training transformer-based models, including hands-on experience with PyTorch, TensorFlow, JAX, or similar tensor computation libraries.
- Advanced knowledge of transformer training parallelism strategies such as data parallelism, tensor parallelism, pipeline parallelism, context parallelism, and sharded data parallelism (e.g., FSDP), with practical experience implementing or optimizing these in distributed settings.
- Proven ability to profile, analyze, and improve the performance of distributed GPU programs using tools like NVIDIA Nsight, PyTorch Profiler, or custom instrumentation, including conducting roofline analysis to identify compute vs. memory bottlenecks.
- Familiarity with high-performance computing (HPC) and distributed computing platforms including Kubernetes, Slurm, Ray, or Dask, along with experience in cluster networking technologies such as InfiniBand, RoCE, or GPUDirect.
- Solid foundation in operating systems concepts including processes, file systems, kernel drivers, containerization (e.g., Docker, containerd), and networking protocols, with the ability to debug system-level issues in Linux-based environments.
- Strong problem-solving mindset, creativity, and willingness to engage with ambiguous challenges; comfortable deriving specifications through dialogue with researchers and iterating rapidly based on experimental results.
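As a concrete example of the roofline analysis mentioned in the requirements, the sketch below classifies a kernel as compute- or memory-bound from its arithmetic intensity. The hardware numbers are hypothetical round figures chosen for illustration, not any specific GPU's datasheet.

```python
def roofline_bound(flops, bytes_moved, peak_flops, peak_bw):
    """Classify a kernel via roofline analysis.

    Arithmetic intensity (FLOPs per byte moved to/from memory) is compared
    against the machine balance point (peak FLOP/s divided by peak memory
    bandwidth). Below the balance point, attainable throughput is capped
    by bandwidth; above it, by compute.
    """
    intensity = flops / bytes_moved             # FLOPs per byte
    balance = peak_flops / peak_bw              # machine balance (FLOPs/byte)
    attainable = min(peak_flops, intensity * peak_bw)
    bound = "memory-bound" if intensity < balance else "compute-bound"
    return intensity, attainable, bound

# Hypothetical accelerator: 300 TFLOP/s peak compute, 2 TB/s HBM bandwidth.
PEAK_FLOPS = 300e12
PEAK_BW = 2e12

# Large bf16 GEMM: C[M,N] = A[M,K] @ B[K,N] with M = N = K = 4096,
# 2 bytes per element, counting each multiply-accumulate as 2 FLOPs.
M = N = K = 4096
flops = 2 * M * N * K
bytes_moved = 2 * (M * K + K * N + M * N)  # read A and B, write C
print(roofline_bound(flops, bytes_moved, PEAK_FLOPS, PEAK_BW))
```

The same routine shows why a GEMV (batch-1 decode step) lands on the opposite side of the roof: its intensity is roughly 1 FLOP/byte, far below the balance point, which is why inference decoding is typically bandwidth-limited while large training GEMMs are compute-limited.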
🏖️ Benefits
- Competitive compensation package including meaningful equity ownership, aligning your success with Baseten’s long-term growth and impact in the AI infrastructure space.
- 100% coverage of medical, dental, and vision insurance for employees and their dependents, ensuring comprehensive healthcare support with no out-of-pocket premium costs.
- Generous paid time off (PTO) policy, including a company-wide Winter Break when offices are closed from Christmas Eve to New Year’s Day, promoting rest and work-life balance.
- Paid parental leave to support employees during significant life events, reflecting Baseten’s commitment to family-friendly workplace practices.
- Company-facilitated 401(k) retirement plan with potential matching, helping employees build long-term financial security.
- Unique exposure to a diverse portfolio of innovative ML startups (including Notion, Cursor, Abridge, and Writer), providing unparalleled opportunities for learning, networking, and staying at the forefront of applied AI innovation.
About Baseten Inc.
Baseten provides a serverless, GPU-accelerated platform that lets machine-learning teams deploy, scale, and monitor custom models behind autoscaling inference endpoints. The service abstracts infrastructure management, supports PyTorch, TensorFlow, and Hugging Face artifacts, and offers built-in observability, A/B testing, and fine-tuning. Customers integrate via REST or GraphQL APIs and pay only for the compute they use. Founded in 2019 and headquartered in San Francisco, Baseten targets data scientists and product teams seeking production-grade ML serving without Kubernetes complexity.