
Engineering Manager - Inference

Job Overview

Location

Remote

Job Type

Full-time

Category

Engineering Manager

Date Posted

February 12, 2026

Full Job Description

📋 Description

  • As the Engineering Manager for AI Inference at Perplexity, you will lead the team responsible for building and scaling the infrastructure that underpins our AI capabilities. This is an opportunity to shape the technical direction and execution of our inference systems, directly impacting the millions of users who rely on Perplexity's products and APIs for state-of-the-art AI experiences. You will not just manage but also build and nurture a world-class team of inference engineers, fostering an environment of collaboration, excellence, and continuous learning.
  • Our technology stack features Python, PyTorch, Rust, C++, and Kubernetes. You will be instrumental in architecting and scaling the large-scale deployment of the machine learning models that power Perplexity's flagship products, including Comet, Sonar, Search, and Deep Research. These systems are designed to be the fastest in the industry, leveraging the latest advancements in AI and hardware.
  • Perplexity offers a high-impact environment where your contributions are visible and valued. Working on a focused, smaller team means significant ownership and autonomy, allowing you to drive projects from conception to completion. Rather than maintaining legacy systems, you will build 0-to-1 infrastructure from scratch, establishing new benchmarks and best practices in AI inference.
  • Your responsibilities span the full spectrum of inference optimization and deployment: driving down the cost of running large models, scaling our infrastructure to handle massive traffic surges, and continuously pushing the boundaries of what is possible in AI inference. You will directly influence the technical roadmap and the evolving team culture at a company experiencing rapid growth.
  • You will lead and grow a high-performing team of AI inference engineers, setting clear technical goals, providing mentorship, and fostering individual career development. You will also own the design and development of robust APIs for AI inference, serving both internal product teams and external API customers.
  • Architecting and scaling our inference infrastructure for reliability and efficiency will be a core focus. This involves anticipating future needs, applying best practices for distributed systems, and ensuring our systems can handle peak loads with minimal downtime. You will benchmark and systematically eliminate bottlenecks across the entire inference stack, from model loading to request processing.
  • A significant part of the role is driving inference of large, sparse Mixture-of-Experts (MoE) models at rack scale, which requires a deep understanding of model-sharding strategies and distributed computing to efficiently serve models with billions or even trillions of parameters.
  • You will push the inference frontier, building systems that support advanced techniques such as sparse attention and disaggregated prefill/decode serving. This requires staying abreast of the latest research and translating it into practical, scalable solutions.
  • Improving the reliability and observability of our inference systems is paramount. You will implement comprehensive monitoring, alerting, and logging, and will lead incident response when issues arise, ensuring swift and effective resolution.
  • You will own critical technical decisions around batching strategies, throughput optimization, latency reduction, and maximizing GPU utilization, taking a data-driven approach grounded in the trade-offs of inference performance.
  • You will partner closely with our ML research teams on model optimization techniques, ensuring that research breakthroughs can be deployed into production environments efficiently and effectively.
  • Finally, you will play a key role in recruiting top-tier engineering talent, mentoring team members, and developing a culture of operational excellence, establishing clear team processes and engineering standards that promote quality and efficiency.

Skills & Technologies

Python
Rust
Kubernetes
TensorFlow
PyTorch



About Perplexity AI, Inc.

Perplexity AI operates an AI-powered conversational search engine that answers queries by synthesizing live web information. The platform combines large language models with real-time retrieval, citing sources for transparency. Founded in 2022, the San Francisco-based company offers free and subscription tiers, mobile apps, and browser extensions, targeting consumers and enterprises seeking accurate, verifiable answers instead of traditional link lists.
