
Job Overview
Location
Remote - Europe
Job Type
Full-time
Category
Software Engineering
Date Posted
May 22, 2026
Full Job Description
📋 Description
- Design and implement comprehensive observability solutions using modern tools to provide real-time visibility into system health, performance, and reliability across Replit’s global infrastructure.
- Architect and maintain infrastructure automation systems using Terraform, Ansible, or Pulumi to enable consistent, repeatable, and scalable deployments.
- Develop and manage CI/CD pipelines that ensure reliable, secure, and efficient software releases while minimizing manual intervention.
- Define, implement, and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in collaboration with product and engineering teams to balance innovation speed with system reliability.
- Lead incident response efforts for critical system outages, conduct thorough post-mortems, and implement preventive measures to reduce recurrence and improve Mean Time To Recovery (MTTR).
- Create and maintain runbooks for critical services to standardize response procedures and enable rapid resolution during operational emergencies.
- Identify and resolve performance bottlenecks across distributed systems, optimizing latency, resource utilization, and efficiency across global regions.
- Implement capacity planning strategies to anticipate growth and ensure infrastructure can scale reliably under increasing user demand.
- Build self-healing systems that automatically detect, respond to, and recover from common failure scenarios without human intervention.
- Establish robust logging and alerting strategies that enable quick identification, diagnosis, and resolution of issues across Replit’s platform serving millions of developers.
- Collaborate with development teams to embed reliability practices into the software lifecycle, promoting a culture of ownership and operational excellence.
- Continuously improve infrastructure resilience by applying industry best practices in cloud-native technologies, distributed systems, and SRE methodologies.
- Maintain and enhance monitoring dashboards and metrics that inform engineering decisions and provide transparency into system performance.
- Advocate for automation as a core principle, eliminating manual toil and reducing operational overhead through tooling and process improvements.
- Communicate complex technical concepts clearly to both technical and non-technical stakeholders to align teams on reliability goals and operational priorities.
🎯 Requirements
- 4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)
- Strong programming skills in languages commonly used for automation (Python, Go, or similar)
- Deep understanding of distributed systems
- Experience with container orchestration platforms (Kubernetes) and cloud-native technologies
- Proven track record of implementing and maintaining monitoring/observability solutions
- Strong incident management skills with experience leading incident response
- Experience with infrastructure as code and configuration management tools
🏖️ Benefits
- Competitive Salary & Equity
- Health, Dental, Vision and Life Insurance
- Short Term and Long Term Disability
- Paid Parental, Medical, Caregiver Leave
- Flexible Time Off (FTO) + Holidays
- Monthly Wellness Stipend
- Autonomous Work Environment
- Quarterly Team Gatherings
About Replit, Inc.
Replit is an online, collaborative, integrated development environment (IDE) that allows users to write, run, and share code in numerous programming languages directly from their web browser. It provides a cloud-based platform, eliminating the need for local setup and dependencies. Replit supports real-time collaboration, enabling multiple users to code together simultaneously on the same project, making it ideal for educational purposes, team projects, and rapid prototyping. The platform offers a vast array of features including version control integration, package management, and deployment tools, democratizing software development for beginners and experienced programmers alike.