Site Reliability Engineer (SaaS)

Job Overview

Location: Remote
Job Type: Full-time
Category: Software Engineering
Date Posted: February 17, 2026

Full Job Description

📋 Description

  • Infiterra Inc. helps IT Distributors and Managed Service Providers (MSPs) automate and accelerate their subscription business growth with a B2B SaaS platform. We serve over 100 customers across 75 countries, and as we continue to scale globally, platform reliability and stability are critical. We are moving towards a more Azure-native architecture, expanding our Kubernetes (AKS) footprint, and raising our operational maturity. In 2026 we aim to raise our standards for uptime, observability, and incident management. To lead this work, we are establishing a dedicated Site Reliability Engineering (SRE) team within our Platform Infrastructure group. The team will own the platform's uptime, resilience, and overall production excellence.
  • As a Site Reliability Engineer at Infiterra, you will maintain and improve the reliability and stability of our Azure-based SaaS platform for customers worldwide. The role involves hands-on work with our AKS clusters and underlying infrastructure, strengthening our monitoring and observability, and building processes that prevent production incidents. This is not a pipeline-only DevOps role, a basic helpdesk role, or a purely networking role; you will have direct impact on live production systems. We are looking for people who have worked in live production environments, have handled and resolved real incidents, and understand the accountability and ownership that come with maintaining high uptime and reliability.
  • **Reliability & Operations:** You will maintain and improve production uptime, contributing directly to our 99.9% uptime target for 2026. You will proactively monitor our systems, respond quickly and effectively to production incidents, and drive measurable improvements in Mean Time to Resolution (MTTR). After incidents, you will perform structured root cause analyses (RCAs) and help implement long-term preventive actions. As our production support model matures, you will take part in an evolving on-call rotation that provides continuous coverage and rapid response.
  • **Cloud & Infrastructure Management:** You will manage and optimize our Azure infrastructure, including compute, networking, and identity components. This includes hands-on work with Azure Kubernetes Service (AKS) clusters, a key part of our expanding Kubernetes adoption. You will maintain networking components such as load balancers and private endpoints, and help improve the platform's resilience and scalability as user demand and data volume grow.
  • **Observability & Automation:** You will design and refine our observability practices, establishing and improving standards for metrics collection, log aggregation, and alerting across all production systems. You will enhance our Infrastructure as Code (IaC) practices, using tools like Terraform, to keep deployments consistent, repeatable, and reliable. A key objective is to reduce manual operational effort by building automation scripts and tools, which frees up engineering time and minimizes human error.
  • **Collaboration & Support:** You will work closely with our DevOps teams on CI/CD integration, the reliability of production deployments, and the management of configuration changes that affect production environments. You will also support security initiatives focused on infrastructure hardening. Effective communication and cross-functional work are vital to the success of these efforts.
  • This role offers a chance to shape the future of a rapidly growing SaaS platform by directly influencing its stability, performance, and scalability. If you are passionate about reliability, thrive in a dynamic environment, and want to make a substantial impact, we encourage you to apply.

Skills & Technologies

Python
Azure
Kubernetes
Terraform
Linux
Remote

About Infiterra Inc.

Infiterra Inc. is a technology company focused on developing and deploying advanced solutions for the energy sector. Their core offerings revolve around leveraging artificial intelligence and machine learning to optimize oil and gas exploration, production, and reservoir management. By analyzing vast datasets, Infiterra aims to enhance recovery rates, reduce operational costs, and improve safety and environmental performance for their clients. The company provides a suite of software tools and consulting services designed to integrate seamlessly with existing energy infrastructure. Their innovative approach helps energy companies make more informed decisions, mitigate risks, and maximize the value of their subsurface assets in an increasingly complex global market.
