
Engineering Manager - Inference

Job Overview

Location

Remote

Job Type

Full-time

Category

Engineering Manager

Date Posted

February 12, 2026

Full Job Description

📋 Description

  • As the Engineering Manager for AI Inference at Perplexity, you will lead the team responsible for building and scaling the infrastructure that underpins our AI capabilities. This is an opportunity to shape the technical direction and execution of our inference systems, directly impacting the millions of users who rely on Perplexity's products and APIs for state-of-the-art AI experiences. You will not just manage but also build and nurture a world-class team of inference engineers, fostering an environment of collaboration, excellence, and continuous learning.
  • Our technology stack features Python, PyTorch, Rust, C++, and Kubernetes. You will be instrumental in architecting and scaling the large-scale deployment of the machine learning models that power Perplexity's flagship products, including Comet, Sonar, Search, and Deep Research. These systems are designed to be the fastest in the industry, leveraging the latest advancements in AI and hardware.
  • Perplexity offers a high-impact environment where your contributions are visible and valued. Working on a focused, smaller team means significant ownership and autonomy, allowing you to drive projects from conception to completion. Rather than maintaining legacy systems, you will build 0-to-1 infrastructure from scratch, establishing new benchmarks and best practices in AI inference.
  • Your responsibilities span the full spectrum of inference optimization and deployment: driving down the cost of running large models, scaling our infrastructure to handle massive traffic surges, and continuously pushing the boundaries of what is possible in AI inference. You will directly influence the technical roadmap and the evolving team culture at a company experiencing rapid growth.
  • You will lead and grow a high-performing team of AI inference engineers, setting clear technical goals, providing mentorship, and fostering individual career development. You will also own the design and development of robust APIs for AI inference, serving both internal product teams and external API customers.
  • Architecting and scaling our inference infrastructure for reliability and efficiency will be a core focus. This involves anticipating future needs, applying best practices for distributed systems, and ensuring our systems can handle peak loads with minimal downtime. You will benchmark and systematically eliminate bottlenecks across the entire inference stack, from model loading to request processing.
  • A significant part of the role is driving inference of large, sparse Mixture-of-Experts (MoE) models at rack scale, which requires a deep understanding of model-sharding strategies and distributed computing to efficiently serve models with billions or even trillions of parameters.
  • You will push the inference frontier, building systems that support advanced techniques such as sparse attention and disaggregated prefill/decode serving. This requires staying abreast of the latest research and translating it into practical, scalable solutions.
  • Improving the reliability and observability of our inference systems is paramount. You will implement comprehensive monitoring, alerting, and logging, and will lead incident response when issues arise, ensuring swift and effective resolution.
  • You will own critical technical decisions around batching strategies, throughput optimization, latency reduction, and maximizing GPU utilization, taking a data-driven approach grounded in the trade-offs of inference performance.
  • You will partner closely with our ML research teams on model optimization techniques, ensuring that research breakthroughs can be deployed into production environments efficiently and effectively.
  • Finally, you will play a key role in recruiting top-tier engineering talent, mentoring team members, and developing a culture of operational excellence, establishing clear team processes and engineering standards that promote quality and efficiency.

Skills & Technologies

Python
Rust
Kubernetes
TensorFlow
PyTorch



About Perplexity AI, Inc.

Perplexity AI operates an AI-powered conversational search engine that answers queries by synthesizing live web information. The platform combines large language models with real-time retrieval, citing sources for transparency. Founded in 2022, the San Francisco-based company offers free and subscription tiers, mobile apps, and browser extensions, targeting consumers and enterprises seeking accurate, verifiable answers instead of traditional link lists.
