This position was posted on November 21, 2025 and is likely no longer accepting applications.

Multimodal AI Engineer, Document Understanding

LlamaIndex, Inc.

Job Overview

Location: San Francisco

Job Type: Full-time

Full Job Description

📋 Description

• Architect and ship the next generation of document intelligence. You will own the full ML lifecycle—from curating multimodal datasets that span PDFs, PowerPoints, Word docs, and spreadsheets to deploying low-latency, high-accuracy models that power LlamaParse and LlamaExtract for thousands of developers worldwide.
• Design and train vision-language models that understand complex layouts, extract nested tables, preserve reading order, and reason over charts and figures. Your models will run against millions of real-world documents, so robustness to noise, rotation, and mixed content is as important as headline accuracy.
• Build bulletproof data pipelines and evaluation frameworks that surface edge cases before users do. Expect to create synthetic augmentations, active-learning loops, and human-in-the-loop review tooling that keep our gold-standard datasets growing and our benchmarks honest.
• Push the frontier of multimodal research while keeping one foot firmly in production. You will read papers Monday, prototype Tuesday, benchmark Wednesday, and ship a canary release Thursday—then iterate with customer feedback on Friday.
• Collaborate cross-functionally with API, infrastructure, and product teams to turn research wins into customer value. You’ll write the Pydantic schemas, FastAPI endpoints, and Kubernetes manifests that let a new extraction feature go live overnight and absorb 10× more traffic.
• Contribute directly to our open-source stack—whether that’s a new LlamaIndex reader, a layout-aware chunking strategy, or a vision encoder that plugs seamlessly into RAG pipelines. Your GitHub profile becomes a recruiting magnet for the broader community.
• Champion reproducibility and MLOps best practices: experiment tracking with MLflow, model registries, CI/CD for training jobs, and automated rollback when AUC drops. We treat model drift like production incidents.
• Mentor junior engineers and engage with external researchers through papers, talks, and hackathons. Your thought leadership helps define how the industry thinks about document understanding for years to come.
• Enjoy deep technical autonomy: choose your methods and stack (LoRA vs. full fine-tuning, vLLM vs. TensorRT), pick your research bets, and see them ship to production. We bias toward impact, not micromanagement.

🎯 Requirements

• 3–7 years shipping ML models to production, with demonstrable experience in computer vision, NLP, or multimodal learning
• Expert-level Python and modern tooling: uv, ruff, mypy, Pydantic, FastAPI, Docker, Kubernetes
• Hands-on experience fine-tuning transformer models (LoRA, QLoRA) and serving them at scale (vLLM, TensorRT, ONNX)
• Solid grasp of data-centric AI: building evaluation suites, curating noisy real-world datasets, and automating quality checks
• Nice-to-have: prior work in document understanding, OCR, layout analysis, or RAG systems; open-source contributions; publications in vision-language or document AI venues

🏖️ Benefits

• Competitive base salary plus meaningful equity in a fast-growing AI startup
• Comprehensive medical, dental, and vision coverage for you and your family
• Unlimited PTO and a hybrid-friendly culture anchored in downtown San Francisco with daily catered lunches and snacks
• Annual budget for conferences, research materials, and professional development, plus access to cutting-edge compute (A100/H100 clusters)

Skills & Technologies

• Python
• Docker
• Kubernetes
• Data Science
• Remote


About LlamaIndex, Inc.

LlamaIndex, Inc. provides a platform for building AI agents that redefine document workflows. Its tools parse, extract, and index data, with solutions for engineering, R&D, and administrative operations teams across industries such as finance, insurance, manufacturing, and healthcare. LlamaIndex supports financial analysts, streamlines business processes, and helps optimize system uptime through AI-powered models. Its clients include Jeppesen (a Boeing Company), which saved approximately 2,000 engineering hours using LlamaIndex's unified chat framework. LlamaIndex also supports developers with resources and building blocks for AI agents, offering both Python and TypeScript SDKs.
