This job has expired

This position was posted on May 6, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

AI Engineer - Model Performance

Fathom Technologies Inc.

Job Overview

Location

SF Hybrid

Job Type

Full-time

Full Job Description

📋 Description

• As an AI Engineer - Model Performance at Fathom Technologies Inc., you will own the speed, cost, and reliability of the company's model inference stack. You will also build fine-tuning infrastructure that accelerates the AI team's workflow, directly affecting the performance of an AI meeting assistant used by millions.
• Day to day, you will benchmark quantization strategies across GPU families; evaluate serving frameworks such as vLLM and SGLang with speculative decoding; build repeatable fine-tuning pipelines that take JSONL datasets to deployable models; optimize GPU spend by matching hardware to latency-sensitive versus batch workloads; and debug production inference issues such as quality regressions from framework updates or audio pipeline bugs.
• Fathom Technologies Inc. is a small, focused team building an AI-powered meeting assistant that eliminates note-taking overhead by capturing, summarizing, and organizing key moments from calls. The company has proven traction, including #1 rankings on HubSpot, G2, and Product Hunt, and rapid growth in usage and revenue.
• In this role, you will gain deep expertise in LLM serving optimization, fine-tuning infrastructure, and GPU performance analysis while enabling faster experimentation and deployment across the AI team, contributing directly to product reliability and cost efficiency at scale.

🎯 Requirements

• Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar), including tuning attention backends, scheduling strategies, CUDA graph warmup, and prefix caching
• Hands-on quantization experience, including an understanding of weight vs. activation quantization, per-channel vs. per-tensor scaling, and the trade-offs of dynamic quantization
• Production fine-tuning experience with LoRA/QLoRA SFT and familiarity with frameworks like ms-swift, Axolotl, or torchtune, including data formatting, learning rate schedules, and diagnosing training failures
• Strong Python skills for writing serving infrastructure, benchmarking harnesses, and training pipelines
• Comfort with GPU profiling and performance analysis to identify bottlenecks in compute, memory bandwidth, or scheduling

🏖️ Benefits

• Opportunity to shape the foundational software services of a growing company
• Dynamic and collaborative engineering team
• Competitive compensation and benefits
• Supportive environment that encourages innovation and personal growth
• Fully remote work with async-first communication (Slack, Notion, Loom)

Skills & Technologies

Python

Swift

GitHub

REST

Data Science

Remote

Degree Required

About Fathom Technologies Inc.

Fathom Technologies provides AI meeting recording and transcription software that automatically captures, summarizes, and indexes Zoom, Google Meet, and Microsoft Teams calls. The platform generates searchable transcripts, highlights key moments, and delivers concise summaries, enabling teams to revisit discussions without re-watching entire recordings. Features include secure cloud storage, CRM integrations, real-time note-taking, and role-based sharing controls. Targeting sales, customer success, and remote teams, Fathom offers free and paid tiers, emphasizing privacy compliance and rapid deployment.
