
Job Overview
Location: Remote
Job Type: Full-time
Category: Data Engineer
Date Posted: February 12, 2026
Full Job Description
📋 Description
- As a Senior Staff Software Engineer on the Data Platform team at Perplexity, you will be at the forefront of building and scaling the critical data infrastructure that underpins our rapidly growing AI-powered answer engine. In this pivotal role, you will not only design and implement robust data systems but also shape the technical direction and set the standards for our entire data ecosystem. You will be instrumental in ensuring that data is accurate, timely, discoverable, and trustworthy, empowering our product, AI research, analytics, and decision-making at Perplexity's scale.
- Your primary responsibility will be the design, development, and operation of large-scale batch and streaming data pipelines. These pipelines are the lifeblood of Perplexity, supporting everything from core product features and AI model training and evaluation to in-depth analytics and experimentation. You will architect and build event-driven and streaming systems, using technologies such as Kafka, Kinesis, or Pub/Sub, to handle real-time data ingestion, transformation, and delivery. You will also own and refine our batch processing frameworks, ensuring efficient and reliable execution of backfills, complex aggregations, and offline computations, which are vital for historical analysis and model development.
- A significant aspect of this role involves leading the design and operation of our data orchestration systems. You will be responsible for implementing and maintaining robust scheduling, dependency management, retry mechanisms, and Service Level Agreements (SLAs) to ensure the smooth and predictable flow of data. Establishing strong guarantees around data correctness, freshness, lineage, and recoverability will be paramount. You will architect systems capable of handling immense scale, gracefully managing partial failures, and adapting to evolving data schemas, ensuring the resilience and integrity of our data pipelines.
- Beyond pipelines and orchestration, you will focus on building a self-serve data platform that empowers our engineering, data science, and analyst teams. This involves creating intuitive abstractions, developing advanced tooling, providing comprehensive documentation, and establishing 'paved paths' that enable these teams to safely and efficiently create, operate, and iterate on their data workflows. Your efforts will significantly improve the developer experience for all data-related work across the company.
- You will play a key role in setting and enforcing standards for data modeling, testing, validation, and deployment practices. This includes defining best practices for data quality, ensuring data lineage is meticulously tracked, implementing effective observability solutions, and contributing to data governance initiatives. Your leadership will ensure consistency and reliability across all data assets.
- As a Senior Staff Engineer, you will drive architectural decisions for our data infrastructure, encompassing storage solutions, compute frameworks, orchestration tools, and API design. You will collaborate closely with engineering and data science leadership to keep our data systems aligned with the evolving strategic requirements of the business and our AI research.
- Mentoring junior engineers, conducting thorough design reviews, and actively raising the overall technical bar within the organization are integral parts of this role. You will be a technical leader, guiding the team through complex challenges and fostering a culture of technical excellence and continuous improvement.
- This role offers a unique opportunity to work with cutting-edge AI technologies and contribute to a product that is fundamentally changing how people access and process information. You will have a direct impact on the success of Perplexity by building the foundational data systems that enable our ambitious goals.
- You will be expected to tackle complex problems with a strong systems-thinking approach, understanding and balancing the trade-offs between reliability, latency, cost, and overall system complexity. Your ability to design for scale and resilience will be critical to our continued growth and success.
- Experience supporting machine learning and AI workflows, including training pipelines and evaluation systems, is highly desirable, as our data platform directly fuels these critical areas of our business.
- Prior experience owning and operating internal platforms that are widely adopted by multiple engineering teams is a significant advantage, demonstrating your ability to build impactful and user-friendly data solutions.
🎯 Requirements
- Minimum 5 years (Senior) or 8 years (Staff) of software engineering experience, with a strong focus on building production data infrastructure systems.
- Hands-on experience designing, building, and operating large-scale batch and/or streaming data processing systems.
- Deep familiarity with data orchestration systems such as Airflow, Dagster, or equivalent, including their operational aspects.
- Proficiency in Python and at least one additional backend programming language (e.g., Go, TypeScript).
- Strong systems-thinking capabilities, with a proven ability to understand and manage trade-offs across reliability, latency, cost, and complexity.
- Experience supporting ML/AI workflows, training pipelines, or evaluation systems is a significant plus.
🏖️ Benefits
- Competitive salary and equity package.
- Comprehensive health, dental, and vision insurance.
- Generous paid time off and holidays.
- Remote-first work environment with flexibility.
- Opportunity to work on cutting-edge AI technology with a high-impact product.
Skills & Technologies
Python
TypeScript
Kafka
Senior
Remote
About Perplexity AI, Inc.
Perplexity AI operates an AI-powered conversational search engine that answers queries by synthesizing live web information. The platform combines large language models with real-time retrieval, citing sources for transparency. Founded in 2022, the San Francisco-based company offers free and subscription tiers, mobile apps, and browser extensions, targeting consumers and enterprises seeking accurate, verifiable answers instead of traditional link lists.



