
Job Overview
Location: SF Office
Job Type: Full-time
Category: Software Engineering
Date Posted: May 22, 2026
Full Job Description
📋 Description
- Lead and grow a high-performing team of AI inference engineers focused on building and scaling infrastructure for Abridge’s products and APIs
- Own the technical direction of inference systems, making key decisions around batching, throughput, latency, and GPU utilization
- Architect and scale inference infrastructure for reliability, efficiency, and observability; lead incident response for production systems
- Benchmark and eliminate bottlenecks throughout the entire inference stack, from model loading to API response
- Partner with ML Research teams on model optimization techniques including quantization, kernel fusion, and deployment strategies
- Develop and maintain APIs for AI inference used by both internal teams and external customers
- Recruit, mentor, and develop engineering talent; establish team processes, engineering standards, and operational excellence
- Work closely with the GenAI Platform, Data, and Product teams to plan and execute projects that directly impact clinicians and patients
- Ensure systems underpinning every clinician interaction operate at peak efficiency and reliability with real-time performance requirements
- Implement parallelism strategies including tensor parallelism, pipeline parallelism, and expert parallelism to maximize inference throughput
- Apply deep understanding of LLM architecture, including Multi-Head Attention, Multi-Query and Grouped-Query Attention, and common transformer components, to optimize inference performance
- Utilize inference frameworks such as PyTorch, TensorRT, vLLM, and TensorFlow to deploy and scale production-grade AI systems
- Maintain familiarity with GPU characteristics, roofline models, and performance analysis to guide infrastructure decisions
- Deploy and manage distributed, real-time systems at scale in a high-availability healthcare environment
- Contribute to secure, compliant systems on major cloud platforms, with preference for GCP and experience with AWS
- Operate within a fast-growing startup environment with urgency, focus, and extreme ownership over outcomes
- Provide constructive feedback during technical design reviews and code reviews to elevate team standards
- Drive adoption of inference optimizations such as FlashAttention and quantization to reduce latency and improve cost efficiency
- Maintain auditable, reliable systems that align with healthcare industry standards for AI transparency and trust
🎯 Requirements
- 5+ years of engineering experience with 1+ years in a technical leadership or management role
- Deep, hands-on experience with ML systems and inference frameworks (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
- Strong understanding of LLM architecture (e.g., Multi-Head Attention, Multi-Query and Grouped-Query Attention, and common transformer components)
- Experience with inference optimizations (e.g., batching, quantization, kernel fusion, FlashAttention)
- Familiarity with GPU characteristics, roofline models, and performance analysis
- Experience deploying reliable, distributed, real-time systems at scale
🏖️ Benefits
- Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
- Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
- Generous HSA Contribution: Monthly contributions to your HSA if you choose a High Deductible Health Plan
- Paid Parental Leave: Generous paid parental leave for all full-time employees
- 401(k) Matching: Contribution matching to help invest in your future
- Personal Device Allowance: Tax-free funds for personal device usage
About Abridge AI, Inc.
Abridge AI provides AI-powered clinical documentation solutions that automatically generate structured notes from patient-clinician conversations. The platform captures, transcribes, and summarizes encounters in real time, integrating with Epic and other EHR systems to reduce clinician administrative burden and improve documentation accuracy.