Senior Customer Reliability Engineer - Infrastructure

Job Overview

Location

Ireland

Job Type

Full-time

Category

DevOps & SysAdmin

Date Posted

March 21, 2026

Full Job Description

📋 Description

  • As a Senior Customer Reliability Engineer - Infrastructure at Astronomer, you will play a critical role in ensuring the stability, performance, and reliability of the company’s managed Airflow platform, Astro, which powers mission-critical data workflows for over 800 global enterprises. Your work directly impacts customer success by maintaining high availability and predictability of the platform, enabling data teams to deliver analytics and AI-driven insights without disruption.
  • You will operate at the intersection of infrastructure engineering and customer success, troubleshooting complex issues in multi-cloud Kubernetes environments, driving incident resolution, and improving system observability to prevent future failures. This role offers deep exposure to real-world data platform challenges across industries, allowing you to shape both the reliability of the product and the customer experience through proactive problem-solving and collaboration with product teams.
  • The Customer Reliability Engineering (CRE) team at Astronomer is a customer-obsessed, globally distributed group responsible for the operational health of the Astro platform. Working closely with product, support, and engineering teams, CRE acts as the trusted advisor to customers, ensuring their data pipelines run smoothly and efficiently while advocating for their needs in product development.
  • In this role, you will develop deep expertise in cloud-native infrastructure, Kubernetes operations, and distributed systems at scale, while honing your customer-facing technical communication and incident management skills. You’ll have the opportunity to influence architectural decisions, build automation and monitoring systems, and contribute to documentation that improves the experience for hundreds of data engineers and scientists worldwide.
  • Provide solutions that help customers succeed with our products.
  • Troubleshoot customer environments and triage issues together with customers to diagnose and resolve platform problems quickly and effectively.
  • Participate in the on-call rotation for weekend coverage, ensuring 24/7 platform reliability and rapid incident response.
  • Provide feedback to the product development teams on customer needs and pain points, influencing the product roadmap and feature prioritization.
  • Build out our monitoring and alerting systems to improve observability and reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
  • Build and maintain automation so that daily operational tasks are handled efficiently, reducing toil and increasing system stability.
  • Help direct the architecture of the products and contribute where possible, leveraging your infrastructure expertise to improve scalability and resilience.
  • Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.
  • Work remotely within a fully distributed team, collaborating across time zones with engineers, support, and product specialists.
  • Improve and expand customer documentation to strengthen self-service capabilities and reduce support friction.
  • Work with the latest technology and multi-cloud implementations, including AWS, GCP, and Azure, staying at the forefront of cloud infrastructure innovation.

🎯 Requirements

  • 6 years of experience with large, complex cloud infrastructures operating at scale
  • 4 years of hands-on experience managing and operating Kubernetes in production
  • Experience managing a production distributed system with at least one major cloud provider (AWS, GCP, or Azure)
  • Strong Linux system administration and troubleshooting skills
  • Proven ability to handle customer issues, either internally or externally, with strong communication and empathy
  • Proficiency in Python scripting and DevOps/CI/CD practices

🏖️ Benefits

  • Opportunity to work with cutting-edge cloud and Kubernetes technologies in a fast-growing, innovative company
  • Direct impact on customer success and product reliability for a platform used by Fortune 500 enterprises
  • Fully remote work environment with flexibility to collaborate across a global, distributed team
  • Exposure to multi-cloud environments (AWS, GCP, Azure) and modern DevOps practices
  • Professional growth in site reliability engineering, infrastructure automation, and customer-facing technical roles
  • Contribution to open-source-adjacent projects and internal tooling that improves platform observability and automation

Skills & Technologies

Python
AWS
Azure
GCP
Kubernetes
DevOps
Senior
Remote

About Astronomer, Inc.

Astronomer, Inc. provides Apache Airflow as a managed cloud service and enterprise platform. The company maintains the open-source workflow orchestration project, offers commercial support, and delivers a control plane that lets data teams deploy, monitor, and scale directed acyclic graphs across Kubernetes clusters. Its product suite includes Astro, a fully hosted Airflow environment with role-based access, CI/CD hooks, and usage observability. Customers use the software to schedule Python and SQL data pipelines, connect on-premise and cloud databases, and ensure reliable data delivery for analytics and machine-learning workloads. Astronomer is headquartered in Cincinnati, Ohio, and serves global enterprises.
