This job has expired

This position was posted on February 24, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Principal Site Reliability Engineer

UiPath, Inc.

Job Overview

Location

INDIA : BANGALORE - ENGINEERING

Job Type

Full-time

Full Job Description

📋 Description

• UiPath is at the forefront of automation, transforming how the world works, and we are seeking a Principal Site Reliability Engineer to join our dynamic team in Bangalore, India. This is a pivotal role for an individual who thrives on owning the entire reliability stack, moving beyond siloed responsibilities to architect, scale, measure, and automate our large-scale, cloud-native systems. You will be instrumental in shaping UiPath's reliability strategy, ensuring our platforms are robust, scalable, and performant, directly impacting our ability to deliver transformative automation solutions to our global customer base.
• As a Principal SRE, your mission extends beyond reactive incident management. You will proactively embed reliability into the DNA of our systems and workflows. This involves deep collaboration with engineering and platform teams, fostering a culture of reliability, and elevating our capabilities in observability, automation, and continuous improvement. Your expertise will be crucial in ensuring our systems can reliably handle real-world load and gracefully manage failure conditions, maintaining the trust and satisfaction of our users.
• You will take end-to-end ownership of service reliability, observability, automation, and continuous improvement initiatives. This includes defining and evolving our reliability strategy for complex distributed systems, meticulously balancing availability, performance, development velocity, and cost. A key aspect of this will be the definition and operationalization of clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), leveraging frameworks like error budgets to ensure reliability efforts are directly aligned with user impact and overarching business goals.
• In the realm of incident response and operational excellence, you will be a leader and a key contributor during high-severity incidents. Your ability to drive structured troubleshooting, even under conditions of ambiguity, will be paramount. More importantly, you will ensure that these incidents lead to durable, systemic improvements, preventing recurrence and strengthening our operational resilience. This proactive approach to learning from incidents is fundamental to our continuous improvement ethos.
• You will champion and define strong observability practices across the organization. This means ensuring that service health, performance risks, and potential issues are not only visible but also actionable. You will be responsible for the implementation and promotion of robust monitoring, logging, and tracing solutions, enabling teams to gain deep insights into their systems' behavior and proactively address potential problems before they impact users.
• Automation, tooling, and engineering rigor are core to this role. You will be tasked with automating manual operational work through the development of sophisticated tooling and self-service capabilities. Applying disciplined engineering practices to our operational tasks will reduce toil, increase efficiency, and enhance overall system reliability. This includes developing and maintaining automation scripts, CI/CD pipelines for reliability-focused deployments, and robust testing frameworks.
• Expertise in infrastructure, cloud, and Infrastructure as Code (IaC) is essential. You will drive the development and maintenance of reliable, scalable cloud infrastructure, leveraging IaC principles and tools. Collaboration with platform teams will be key to establishing and promoting best practices for cloud resource management, deployment strategies, and scaling mechanisms, ensuring our infrastructure is resilient and cost-effective.
• As a Principal Engineer, you will provide technical leadership and drive organizational impact. This involves influencing the adoption of reliability standards, mentoring senior engineers on best practices, and raising operational reliability across all product and platform teams. Your ability to communicate complex technical concepts effectively and advocate for reliability initiatives will be critical to your success and the success of the teams you support.
• This role offers a unique opportunity to shape the future of reliability at a rapidly growing, category-leading company. You will work with cutting-edge technologies and have a direct impact on the stability and performance of a product used by millions worldwide. If you are a strategic thinker with a passion for building highly reliable systems and influencing engineering culture, we encourage you to apply.

🎯 Requirements

• 7+ years of experience in SRE, platform, cloud, or infrastructure engineering roles with a proven track record of improving reliability for production systems.
• Demonstrated ability to define and operationalize SLIs, SLOs, and use frameworks like error budgets to align reliability with user impact and business goals.
• Strong conceptual understanding of distributed systems, performance bottlenecks, failure modes, and trade-offs inherent to large-scale systems.
• Proficiency in at least one programming language (e.g., Python, Go, or similar) for building automation, internal tooling, and reliability workflows.
• Hands-on experience with one or more major cloud providers (Azure, AWS, GCP), including practical knowledge of networking, deployments, and scaling.
• Experience with Infrastructure as Code (e.g., Terraform, Pulumi) and container orchestration (e.g., Kubernetes) in production environments.
• Proven experience with monitoring/observability stacks (metrics, logs, traces) and building meaningful dashboards and alerts.
• Experience participating in and improving incident response, blameless postmortems, and implementing systemic fixes.
• Ability to partner with product, infrastructure, and engineering teams to influence architecture and reliability practices without direct authority.

🏖️ Benefits

• Competitive salary and stock options.
• Comprehensive health, dental, and vision insurance.
• Generous paid time off and holidays.
• Opportunities for professional development and continuous learning.
• A collaborative and inclusive work environment.
• Flexible work arrangements (hybrid/remote options may be available depending on team and role needs).

Skills & Technologies

Python

AWS

Azure

GCP

Senior

Remote

About UiPath, Inc.

UiPath, Inc. is a global software company that develops a platform for robotic process automation (RPA) and artificial intelligence-driven automation. Founded in 2005 and headquartered in New York, it provides tools to build, deploy, and manage software robots that emulate human interactions with digital systems and software. The platform includes Studio for design, Orchestrator for management, Robots for execution, and AI Fabric for cognitive capabilities. UiPath serves enterprises across finance, healthcare, manufacturing, and public sectors, aiming to accelerate digital transformation by automating repetitive business processes and improving operational efficiency.
