This job has expired

This position was posted on February 26, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

SRE - ClickHouse Team

PostHog Inc.

Job Overview

Location

Remote (US)

Job Type

Full-time

Full Job Description

📋 Description

• Join PostHog as a Site Reliability Engineer (SRE) focused on our ClickHouse infrastructure, a critical component powering one of the largest self-managed ClickHouse installations on AWS, currently operating at petabyte scale. This is a unique opportunity to be at the forefront of scaling this system for 10-50x growth, moving beyond a traditional "keep the lights on" SRE role to actively transform a rapidly expanding, stateful system into a predictable, highly automated platform.
• You will tackle complex challenges inherent in large-scale data operations, including provisioning, scaling, rebalancing, and recovery of petabyte-scale data. Your primary mission will be to reduce operational stress, design robust automation for data-intensive workloads, and develop the essential tooling and patterns that enable the system to scale efficiently without a linear increase in human effort.
• This role offers significant autonomy and ownership. You will have the space to design, build, and automate solutions, rather than solely reacting to alerts. We encourage proactive problem-solving and expect you to identify recurring pain points, then eliminate them through code, self-healing automation, and strategic infrastructure improvements.
• Key responsibilities include managing vast fleets of EC2-based virtual machines, disks, and networking configurations tailored for data-intensive workloads. You will be instrumental in enhancing operational tooling across critical areas such as deployments, schema changes, backup and restore procedures, and incident response protocols.
• Collaboration is key. You will work closely with our dedicated ClickHouse engineers, translating their database-level requirements into effective infrastructure-level solutions. This cross-functional partnership ensures that our infrastructure is perfectly aligned with the needs of our core data engine.
• A significant part of your role will involve reducing operational load by proactively identifying areas of friction and inefficiency. Through innovative coding and the implementation of self-healing automation, you will create systems that are more resilient and require less manual intervention.
• You will participate actively in our on-call rotation and incident response efforts. The emphasis, however, is not just on responding to incidents but on making them rarer over time through preventative measures and system hardening.
• This position is ideal for individuals who thrive on deep ownership of production systems and are undeterred by the complexities of working with stateful infrastructure. You will be part of a team that values autonomy, shipping fast, and building for the long term.
• PostHog is a product-led company with over 100,000 installations, driven by a strong product-market fit. We are default alive, experiencing consistent revenue growth, and well-funded to pursue ambitious goals. Our culture is built on transparency, autonomy, rapid shipping, and a touch of healthy weirdness. We believe in empowering engineers to lead product teams and make impactful decisions.
• We operate on a "maker's schedule," prioritizing heads-down building time with a default to async communication (PRs > Issues > Slack). Tuesdays and Thursdays are meeting-free days, protecting time for focused work. The environment is designed to make this the most productive job you've ever had, allowing you to concentrate on impactful engineering work.
• You will be joining a team of "cracked engineers": autonomous, highly efficient individuals who can outship larger companies due to their end-to-end ownership of products. Your contributions will directly impact the reliability, scalability, and performance of a core system that underpins PostHog's entire product suite.
• We are looking for enthusiastic drivers who are optimistic problem solvers and genuine builders. If you enjoy deep ownership, are not afraid of stateful infrastructure, and love working with AWS, automation, and making complex systems reliable, this role is for you.
• You will have the opportunity to work on cutting-edge problems that only emerge at extreme scale, dealing with petabytes of data, thousands of cores, and constant ingestion. This is a chance to shape the future of data infrastructure for a rapidly growing SaaS company.
• Embrace the challenge of turning a fast-evolving, stateful system into a predictable, well-automated platform. Your work will directly contribute to reducing operational stress, enhancing system stability, and enabling unprecedented growth for PostHog.

🎯 Requirements

• Strong experience operating production infrastructure on AWS, with a deep understanding of its core services.
• Hands-on experience with VM-based systems (e.g., EC2), demonstrating proficiency beyond managed Platform-as-a-Service offerings.
• Proven experience automating infrastructure management and deployment using tools such as Terraform, Ansible, or similar configuration management and IaC solutions.
• Solid understanding of Linux systems, including in-depth knowledge of disk I/O, memory management, networking protocols, and common failure modes.
• Demonstrable experience supporting and managing stateful systems, such as databases, message queues, or distributed storage systems.
• Ability to effectively debug and reason about complex performance and reliability issues within a production environment.
• Comfort and willingness to own systems end-to-end, including participation in on-call rotations and incident response.

🏖️ Benefits

• Fully remote role within the US, offering flexibility and work-life balance.
• Competitive salary and equity package, reflecting your experience and contribution.
• Generous PTO and paid holidays to ensure you have time to rest and recharge.
• Comprehensive health, dental, and vision insurance to support your well-being.
• Opportunity to work on challenging, large-scale problems with a talented and passionate team.
• A culture that values transparency, autonomy, and rapid iteration, with a focus on building impactful products.

Skills & Technologies

AWS

Kubernetes

Terraform

Linux

DevOps

Senior

Remote

About PostHog Inc.

PostHog provides an open-source product analytics platform that lets teams track user behavior, run A/B tests, and gather feedback without sending data to third parties. The self-hosted or cloud service captures events, pageviews, feature flags, and session recordings, then surfaces insights through dashboards, funnels, retention, and cohort analysis. Engineers can instrument code once and non-technical teammates can query results using SQL or visual builders. The company maintains the core project under an MIT license and offers paid tiers for enterprise support, higher volumes, and advanced features such as correlation analysis, data pipelines, and team collaboration tools.
