Distributed LLM Inference Engineer

Anyscale Inc.

Job Overview

Location: San Francisco

Job Type: Full-time

Full Job Description

📋 Description

• As a Distributed LLM Inference Engineer at Anyscale, you will help build systems and optimizations that push the boundaries of performance for large-scale AI inference, enabling developers to scale machine learning applications from laptops to clusters without needing distributed systems expertise.
• You will iterate quickly with product teams to ship end-to-end batch and online inference solutions for open-source Ray users and Anyscale customers. The work includes integrating Ray Data with LLM engines for low-cost, high-scale ML inference, collaborating with the open-source community on tools like vLLM and TensorRT-LLM, and implementing state-of-the-art techniques from research and open source.
• Anyscale is commercializing Ray, a widely adopted open-source framework that companies such as OpenAI, Uber, Spotify, Instacart, and Cruise use to power scalable AI applications. The company is backed by Andreessen Horowitz, NEA, and Addition, with over $250M in funding.
• In this role, you will deepen your expertise in distributed systems, ML inference at scale, GPU optimization, and open-source collaboration, while contributing to cutting-edge AI infrastructure that enables real-world AI deployment.

🎯 Requirements

• Familiarity with running ML inference at large scale with high throughput and low latency
• Familiarity with deep learning and deep learning frameworks (e.g. PyTorch)
• Solid understanding of distributed systems and ML inference challenges

🏖️ Benefits

• Stock Options
• Healthcare plans, with Anyscale covering 99% of premiums for employees and dependents
• 401(k) Retirement Plan
• Education & Wellbeing Stipend
• Paid Parental Leave
• Fertility Benefits
• Paid Time Off
• Commute reimbursement
• 100% of in-office meals covered

Skills & Technologies

• TensorFlow
• PyTorch

Onsite

About Anyscale Inc.

Anyscale Inc. builds the Ray open-source distributed computing framework and offers a managed platform that lets data scientists and engineers scale machine-learning workloads from laptop to cloud without rewriting code. The company provides serverless infrastructure, observability, and cluster automation so teams can train, tune, and serve models faster. Founded in 2019 by the creators of Ray at UC Berkeley, Anyscale serves Fortune 500 enterprises and AI startups, enabling them to reduce cost and complexity while accelerating production deployment of large-scale AI applications.
