
Job Overview
Location
Barcelona, Spain
Job Type
Full-time
Category
Software Engineering
Date Posted
March 5, 2026
Full Job Description
Description
- As a Senior Site Reliability Engineer (SRE) at Auth0, a part of Okta, Inc., you will play a pivotal role in ensuring the unwavering reliability, resilience, and scalability of our world-class identity platform. This is a unique opportunity to join a dynamic European-based SRE team and contribute directly to the core robustness of a system trusted by hundreds of millions of users globally. Your responsibilities will extend beyond traditional operations; you will be a hands-on builder, architecting and implementing custom software solutions that proactively enhance the platform's reliability by design.
- You will be instrumental in designing and developing custom software, primarily using Go, to bolster the platform's reliability, resilience, and redundancy. This involves creating sophisticated tools and systems that automate critical processes, detect potential issues before they impact users, and ensure seamless failover and recovery mechanisms.
- A key aspect of your role will be to collaborate closely with various engineering teams across the organization. You will act as a reliability advocate, embedding SRE principles and best practices into their development lifecycles. This partnership will focus on measurably improving the availability, performance, and observability of our diverse range of services, ensuring they meet and exceed stringent Service Level Objectives (SLOs).
- Leveraging your profound understanding of infrastructure and observability principles, you will actively identify opportunities for enhancement within the product and its underlying infrastructure. This includes implementing cutting-edge solutions for monitoring, alerting, logging, and tracing, enabling deeper insights into system behavior and facilitating faster, more effective incident response.
- You will be an integral part of our on-call rotation, a critical function for maintaining 24/7 service availability. In this capacity, you will provide rapid and effective responses to critical incidents, utilizing your deep technical expertise to troubleshoot complex issues, implement immediate mitigations, and accurately escalate problems when necessary. Post-incident, you will contribute to blameless post-mortems and drive the implementation of preventative measures.
- A significant part of your contribution will involve the continuous development and refinement of our SRE tooling and processes. Your focus will be on driving automation to reduce manual toil, increase operational efficiency, and empower the engineering teams with self-service capabilities. This includes improving CI/CD pipelines, infrastructure provisioning, and deployment strategies.
- You will be responsible for defining, documenting, and championing reliability best practices across the entire organization. This involves creating clear guidelines, conducting training sessions, and fostering a culture where reliability is a shared responsibility, from initial design to ongoing operation.
- The ideal candidate possesses a proactive and systematic approach to problem-solving, demonstrating a high degree of ownership and accountability for the systems they manage. You should be comfortable working with a high degree of autonomy in a production environment, supporting large-scale, mission-critical applications.
- Your technical acumen will be applied to understanding the intricate workings of microservices architectures, various database technologies (both SQL and NoSQL), and fundamental networking concepts. This comprehensive understanding is crucial for identifying how custom-developed software can effectively address and resolve platform-level challenges.
- You will be expected to contribute to the definition and refinement of Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and to actively manage error budgets, ensuring that performance and availability targets are consistently met.
- Exceptional communication and collaboration skills are paramount, as you will be working within a remote, distributed team environment. The ability to articulate complex technical concepts clearly and to work effectively with colleagues across different time zones and functions is essential for success.
- This role offers a career-defining opportunity to tackle complex, high-impact challenges at a massive scale, directly influencing the reliability and success of Auth0's identity platform. If you are a curious, motivated, and passionate engineer dedicated to building reliability into the very fabric of a platform, we encourage you to apply.
Skills & Technologies
Go
AWS
Azure
GCP
Docker
Senior
Remote
About Okta, Inc.
Okta provides cloud-based identity and access management software that enables organizations to securely connect employees, partners, and customers to the right technologies. Its platform offers single sign-on, multi-factor authentication, lifecycle management, API access control, and analytics to manage user identities across applications, devices, and networks. The company serves enterprises, government agencies, and small to medium-sized businesses, helping them improve security, compliance, and user experience while reducing IT complexity and support costs.

