PLAUD AI INC.

Machine Learning Engineer, Inference & Serving (Speech LLM) - San Francisco

Job Overview

Location: San Francisco, CA

Job Type: Full-time

Category: Machine Learning Engineer

Date Posted: May 8, 2026

Full Job Description

📋 Description

  • This Machine Learning Engineer, Inference & Serving (Speech LLM) role focuses on building and deploying high-throughput, ultra-low-latency inference engines for large language and speech models that power Plaud’s AI work companion, used by over 1.5M users globally.
  • Day-to-day responsibilities include optimizing latency, throughput, and Time-To-First-Token in real-time streaming environments, implementing continuous batching and KV cache management (e.g., PagedAttention), and working with GPU architectures (NVIDIA Ampere/Hopper) to eliminate hardware bottlenecks.
  • Plaud is a bootstrapped, profitable, San Francisco-based AI company with a $250M revenue run rate. It is SOC 2, HIPAA, GDPR, and ISO 27001 compliant, and it builds trusted AI work companions through hardware-software integration to capture and utilize human intelligence from speech, audio, and thought.
  • The role offers the opportunity to join the founding SpeechLLM lab, work at the intersection of ML training and backend infrastructure, gain exposure to cutting-edge AI serving techniques, and grow in a culture of continuous learning, innovation, and fast career development with global impact.

🎯 Requirements

  • Hands-on experience building and deploying high-throughput, ultra-low-latency inference engines for large language models or foundational speech models
  • Understanding of tradeoffs between latency, throughput, and Time-To-First-Token in real-time streaming environments
  • Practical experience with continuous batching, KV cache management (e.g., PagedAttention), and stateful connections for real-time conversational AI
  • Deep understanding of GPU architectures (NVIDIA Ampere/Hopper) and memory hierarchy to identify and eliminate hardware bottlenecks
  • Ability to communicate clearly and collaborate effectively between ML training and backend infrastructure teams
  • Experience with frontier serving frameworks like vLLM, TensorRT-LLM, SGLang, or NVIDIA Triton Inference Server (nice-to-have)

🏖️ Benefits

  • Competitive compensation: $180K–$270K base salary + performance bonus + equity
  • Comprehensive benefits: top-tier healthcare (medical, dental, vision) with employer subsidy
  • Retirement planning: 401(k) plan with company matching
  • Paid time off: unlimited PTO plus 13 paid holidays
  • New parent leave: 12 weeks of paid time off regardless of gender
  • Hybrid office: minimum of 3 in-office days per week; gear perks include a choice of top-of-the-line laptops/workstations

Skills & Technologies

Node.js
Kubernetes
Hybrid
$180k-270k

About PLAUD AI INC.

PLAUD AI INC. builds AI-powered voice and note-taking hardware. Its flagship Plaud Note records phone calls and meetings, transcribes them in real time, and generates summaries using GPT-4o. The credit-card-sized device attaches to iPhone or Android, stores encrypted audio locally or in the cloud, and integrates with Notion, Slack, and Google Docs. Founded in 2023 and based in San Francisco, the company sells direct to consumers and enterprises through plaud.ai, offering subscription plans for advanced AI features and multi-language support.
