This job has expired

This position was posted on March 5, 2026 and is likely no longer accepting applications.

Senior Site Reliability Engineer (Ruby + DevOps)

Internal Referral Program

Job Overview

Location

Bulgaria, Georgia, Lithuania, Poland, Romania, Uzbekistan

Job Type

Full-time

Full Job Description

📋 Description

• As a Senior Site Reliability Engineer at Exadel, you will be instrumental in designing, building, and operating highly reliable, scalable, and resilient distributed systems that power critical client applications. You will play a pivotal role in enhancing system availability, optimizing performance, and fortifying resilience against failures. Your expertise will be crucial in automating infrastructure provisioning, streamlining deployment processes, and refining operational workflows to minimize manual intervention and reduce operational toil.
• You will diagnose and resolve complex production issues with minimal disruption to end users and business operations. You will also lead system upgrades and migrations, planning and executing them to achieve minimal or zero downtime.
• This position requires active participation in on-call rotations and incident response, where you will be expected to act calmly and effectively under pressure, demonstrating a strong ownership mindset to drive swift and successful resolutions. You will collaborate closely with development teams, providing essential SRE insights to enhance the operability of their applications from the ground up.
• A significant part of your contribution will be driving best practices in monitoring, alerting, and capacity planning. This includes implementing robust monitoring solutions, defining meaningful alerts, and proactively planning for future capacity needs to ensure systems can handle anticipated growth and load.
• You will be a champion for reducing operational toil through innovative automation strategies, freeing up valuable engineering time for more strategic initiatives. Your involvement in incident management will extend to conducting thorough post-mortems, developing comprehensive disaster recovery strategies, and continuously identifying opportunities for reliability improvements across the platform.
• The role demands a proactive approach to system design, focusing on high availability and disaster recovery from the outset. You will leverage your deep understanding of cloud-native technologies, particularly Kubernetes and AWS, to build and maintain robust infrastructure.
• You will contribute to the evolution of our CI/CD pipelines and champion Infrastructure as Code (IaC) principles, ensuring that infrastructure is managed efficiently, consistently, and reproducibly.
• Your responsibilities will also include working with messaging systems and databases, ensuring their reliability, performance, and scalability within the distributed architecture.
• You will be a key player in fostering a culture of reliability and operational excellence, sharing your knowledge and mentoring other engineers. Your ability to communicate complex technical concepts clearly and collaborate effectively across diverse engineering teams will be essential for success.
• This role offers the opportunity to work on challenging, large-scale projects for Fortune 500 clients, contributing to digital transformation and AI-first initiatives. You will be part of a global tech company with a strong engineering legacy, working with cutting-edge technologies and a talented, ambitious team.
• The client you will be supporting is developing a sophisticated mobility platform designed to revolutionize transit. This platform aims to empower operators with efficient vehicle and driver management, inform regulators with necessary data, enable service providers to offer sustainable solutions, and deliver an effortless transit experience for riders. Your work will directly impact the reliability and scalability of this innovative solution.
• You will be expected to stay abreast of the latest trends and technologies in SRE and DevOps, continuously learning and applying new knowledge to improve our systems and processes. This includes exploring and implementing advanced practices like chaos engineering and cost optimization in cloud environments.
• The role emphasizes a collaborative and supportive work environment where your contributions are valued and you have the opportunity to grow your career within a dynamic and forward-thinking organization. You will be empowered to make a real difference and shape the future of our clients' technology solutions.

Skills & Technologies

Python

Java

Ruby

Spring

Rails

DevOps

Senior

Remote

About Internal Referral Program

Internal Referral Program provides cloud-based employee-referral recruiting software for mid-size and large enterprises. The platform automates job posting to staff, tracks candidate progress, manages bonuses and compliance, and delivers analytics dashboards to HR teams. Founded in 2015 and headquartered in San Francisco, the company integrates with major applicant-tracking systems, Slack, and Microsoft Teams to increase referral participation and reduce time-to-hire. Clients include technology, finance, and healthcare organizations seeking to lower recruiting costs while improving hire quality through internal networks.
