This job has expired

This position was posted on May 8, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco

PLAUD AI INC.

Job Overview

Location

San Francisco, CA

Job Type

Full-time

Full Job Description

📋 Description

• Plaud Inc. is seeking a highly skilled Machine Learning Engineer specializing in Model Evaluations for Speech LLMs to join our innovative team in San Francisco. This role is pivotal in shaping the future of human-AI interaction by ensuring the quality and performance of our cutting-edge speech technologies. You will be instrumental in transforming subjective concepts related to voice quality and conversational dynamics into objective, measurable metrics that drive research and product development. As a key member of our rapidly growing, profitable company, you will contribute to defining the next generation of intelligence infrastructure and interfaces, amplifying human intelligence through a unique hardware-software combination.
• In this role, you will be responsible for developing and implementing robust evaluation frameworks for Speech LLMs. Your day-to-day activities will involve translating abstract qualities of speech, such as naturalness, expressiveness, and conversational flow, into concrete, quantifiable metrics. You will build and maintain scalable distributed systems and data pipelines essential for running evaluations against live model checkpoints. A significant part of your work will include creating and managing dashboards to monitor model health during training, optimizing signal-to-noise ratios, reducing evaluation latency, and ensuring that performance regressions are immediately identifiable. You will also be tasked with rapidly debugging anomalous mid-training results, diagnosing whether issues stem from model architecture, data corruption, or infrastructure problems. Furthermore, you will collaborate closely with ML researchers to define what constitutes 'good' performance for our Speech LLMs, translating specific capabilities like ASR robustness in noisy environments or TTS emotional steerability into measurable benchmarks. Effective communication of complex statistical findings and model behaviors to both technical and non-technical stakeholders will be a crucial aspect of your responsibilities.
• You will be joining Plaud Inc., a bootstrapped, profitable company that has achieved a $250M revenue run rate in just three years. We are at the forefront of building the next-generation intelligence infrastructure and interfaces designed to capture, extract, and utilize intelligence from spoken, heard, seen, and thought inputs. Our commitment to data security and privacy is underscored by our compliance with SOC 2, HIPAA, GDPR, ISO27001, ISO27701, and EN18031 standards. You will work alongside passionate teammates who are dedicated to innovation, collaboration, and customer success, contributing to a global expansion effort and defining the next paradigm for human-AI interaction.
• This role offers a unique opportunity for significant career growth and learning. You will gain exposure to state-of-the-art AI technologies within the Pro tools domain and play a direct role in our global expansion. You can expect to develop a deep understanding of speech processing, large language models, and evaluation methodologies. The fast-paced environment champions continuous learning and offers rapid career development pathways. By joining as an early, foundational member of our core SpeechLLM lab, you will have meaningful ownership and the chance to make a substantial impact on a fast-growing startup, shaping the future of AI-powered communication tools.
• The ideal candidate has a strong passion for translating subjective speech characteristics into objective metrics, demonstrating a keen ability to bridge the gap between human perception and machine evaluation. You will leverage your robust software engineering skills, particularly in Python, to build and manage reliable distributed systems and data pipelines that can operate at scale. Your experience in developing evaluation harnesses will be critical for assessing live model checkpoints. A key responsibility will be partnering deeply with ML researchers to establish clear definitions of 'good' for Speech LLMs, translating complex capabilities into measurable benchmarks. You will be adept at building and owning dashboards that provide clear insights into model health, improving evaluation efficiency and preventing performance degradation. The ability to rapidly debug complex issues, identify root causes in model architecture, data, or infrastructure, and communicate findings effectively to diverse audiences is paramount. This role is designed for someone who thrives on tackling challenging problems in a dynamic, innovative environment, contributing directly to the advancement of AI in professional communication.

Skills & Technologies

Python

Hybrid

$180k-270k

About PLAUD AI INC.

PLAUD AI INC. builds AI-powered voice and note-taking hardware. Its flagship Plaud Note records phone calls and meetings, transcribes them in real time, and generates summaries using GPT-4o. The credit-card-sized device attaches to iPhone or Android, stores encrypted audio locally or in the cloud, and integrates with Notion, Slack, and Google Docs. Founded in 2023 and based in San Francisco, the company sells direct to consumers and enterprises through plaud.ai, offering subscription plans for advanced AI features and multi-language support.
