This job has expired

This position was posted on March 5, 2026 and is likely no longer accepting applications.

Senior Site Reliability Engineer (Ruby + DevOps)

Exadel Inc.

Job Overview

Location

Bulgaria, Georgia, Lithuania, Poland, Romania, Uzbekistan

Job Type

Full-time

Full Job Description

📋 Description

• As a Senior Site Reliability Engineer (SRE) at Exadel, you will be instrumental in designing, building, and operating highly reliable, scalable, and resilient distributed systems that power critical client applications. You will join a global tech company with over 25 years of engineering leadership, working on AI platforms and digital transformation initiatives for Fortune 500 clients such as HBO, Microsoft, Google, and Starbucks.
• Your primary focus will be on enhancing system availability, optimizing performance, and strengthening resilience across complex, cloud-native environments. This role demands a proactive approach to identifying and mitigating potential issues before they impact users, ensuring seamless operation of services.
• A significant part of your responsibilities will be automating infrastructure provisioning, deployment pipelines, and routine operational tasks. By applying Infrastructure as Code (IaC) principles and tools, you will reduce manual effort, minimize errors, and accelerate delivery cycles, freeing engineering time for more strategic initiatives.
• You will play a key role in diagnosing and resolving production incidents, which requires a calm, analytical, and decisive approach under pressure. This includes participating in on-call rotations and leading incident response efforts to restore service quickly and efficiently.
• A critical part of the role involves leading complex upgrades and system migrations, with a strong emphasis on achieving minimal or zero downtime. This requires meticulous planning, thorough testing, and expert execution to ensure business continuity.
• You will collaborate closely with software development teams to embed operability into the design and architecture of new features and services. This partnership ensures that systems are built with reliability, scalability, and maintainability as core tenets from the outset.
• Driving best practices in monitoring, alerting, and capacity planning will be essential. You will implement and refine sophisticated monitoring solutions to gain deep insights into system health, proactively identify performance bottlenecks, and ensure adequate resources are available to meet demand.
• A core SRE principle you will champion is the reduction of operational toil through automation. By identifying repetitive tasks and developing automated solutions, you will significantly improve team efficiency and engineer satisfaction.
• You will actively contribute to the continuous improvement of our reliability posture. This includes participating in incident management processes, conducting thorough post-mortems to learn from failures, developing and testing disaster recovery strategies, and implementing proactive measures to enhance overall system robustness.
• The role requires a deep understanding of distributed systems, microservices architectures, and the challenges associated with operating them at scale. You will apply your expertise to ensure these complex systems are robust, fault-tolerant, and performant.
• You will be expected to contribute to defining and tracking Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs), ensuring that systems meet agreed-upon reliability targets.
• This position offers the opportunity to work on a mobility platform designed to revolutionize transit, enabling operators, regulators, service providers, and riders to interact seamlessly. Your contributions will directly impact the efficiency and sustainability of urban transportation.
• You will leverage your expertise in containerization technologies like Docker and orchestration platforms such as Kubernetes to manage and scale applications effectively.
• Your work will involve extensive use of cloud services, particularly within the AWS ecosystem, to build and maintain robust infrastructure.
• You will also be involved in managing and optimizing database systems, ensuring data integrity and availability.
• The role encourages a proactive and ownership-driven mindset, where you take responsibility for the systems you help build and operate, ensuring their long-term success and reliability.
• You will be part of a culture that values trust, respect, and open dialogue, with opportunities for creative freedom and mentorship to foster your professional growth.

Skills & Technologies

Python

Java

Ruby

Spring

Rails

DevOps

Senior

Remote

About Exadel Inc.

Exadel is a U.S.-based global software engineering company founded in 1998. It provides digital transformation and custom application development services to enterprises, leveraging cloud, AI, and modern architectures. The company offers product strategy, UX/UI design, full-stack development, QA, and managed support across industries including financial services, healthcare, retail, and technology. Headquartered in Walnut Creek, California, Exadel maintains delivery centers in Eastern Europe and Asia, combining nearshore agility with enterprise-grade processes. Clients range from Fortune 500 firms to growth-stage companies seeking scalable, secure software solutions.
