Principal Site Reliability Engineer

Saviynt Inc.

Job Overview

Location

Vancouver

Job Type

Full-time

Full Job Description

📋 Description

• Design, implement, and maintain scalable, resilient, and high-performance SaaS infrastructure supporting global enterprise customers.
• Lead incident response and post-mortem analysis for critical system outages, driving root cause resolution and preventive automation.
• Develop and enforce SLOs, SLIs, and error budgets to ensure system reliability and measurable service quality.
• Automate deployment, monitoring, and operational workflows using infrastructure-as-code tools and CI/CD pipelines.
• Collaborate with software engineering teams to embed reliability practices into the software development lifecycle.
• Optimize cloud resource utilization and cost efficiency across multi-cloud environments while maintaining performance targets.
• Build and maintain observability systems including logging, metrics, tracing, and alerting to proactively detect and resolve issues.
• Champion site reliability engineering best practices across the organization, influencing architecture decisions and engineering culture.
• Participate in on-call rotations to ensure 24/7 system availability and rapid response to production incidents.
• Conduct capacity planning and performance modeling to anticipate scaling needs and prevent system bottlenecks.
• Partner with security and compliance teams to ensure infrastructure meets enterprise-grade security, audit, and regulatory requirements.
• Mentor junior SREs and engineers, fostering a culture of ownership, learning, and operational excellence.
• Evaluate and integrate new tools, platforms, and technologies to enhance system reliability, scalability, and developer productivity.
• Document operational procedures, runbooks, and system architectures to ensure knowledge sharing and continuity.
• Translate business-critical service requirements into technical specifications and reliability goals for engineering teams.

🎯 Requirements

• Proven experience as a Site Reliability Engineer, or in a similar role, within a high-scale SaaS environment
• Expertise in cloud platforms (AWS, Azure, or GCP) and infrastructure-as-code tools (Terraform, Ansible, etc.)
• Strong proficiency in Linux/Unix systems, networking, and containerization (Docker, Kubernetes)
• Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, ELK Stack)
• Solid programming/scripting skills in Python, Go, or Bash
• Demonstrated ability to lead incident response and drive post-mortem improvements

🏖️ Benefits

• Competitive compensation package
• Comprehensive benefits program
• Opportunities for professional growth and career advancement
• Annual security training and compliance support

Seniority & Work Arrangement

Senior

Onsite

About Saviynt Inc.

Saviynt Inc. provides cloud-native identity governance and administration (IGA) and privileged access management (PAM) platforms. Its solutions automate user provisioning, access requests, separation-of-duties enforcement, and continuous compliance monitoring across hybrid and multi-cloud environments. The company serves financial services, healthcare, retail, energy, and government sectors, helping organizations reduce identity-related risk, pass audits, and accelerate cloud adoption. Founded in 2011 and headquartered in Los Angeles, Saviynt delivers converged identity security through a unified platform that integrates with leading SaaS, IaaS, and on-premises systems.
