Senior Software Engineer - Grafana Databases, SRE | Spain | Remote

Job Overview

Location

Spain (Remote)

Job Type

Full-time

Category

Software Engineer

Date Posted

February 24, 2026

Full Job Description

📋 Description

Join Grafana Labs, a globally recognized open-source leader, as a Senior Software Engineer specializing in Site Reliability Engineering (SRE) for our Grafana Cloud databases. With over 20 million users worldwide, Grafana is the go-to visualization tool for monitoring diverse systems, from critical infrastructure to environmental data. Our company empowers over 3,000 organizations, including industry giants like Bloomberg and JPMorgan Chase, to manage their observability strategies through the Grafana LGTM Stack. This powerful suite includes scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo), offered as a fully managed SaaS solution via Grafana Cloud or a self-managed Grafana Enterprise Stack.
We are a remote-first company experiencing rapid growth, driven by our commitment to an open-source ethos, a collaborative global culture, and a passion for impactful work. Our team thrives in an environment that fosters innovation, transparency, autonomy, and trust, enabling us to achieve remarkable results.
This is a remote opportunity, and we are particularly interested in candidates based in Spain, Germany, the UK, or Sweden. If this role ignites your passion and you believe you can make a significant contribution, we strongly encourage you to apply, even if you don't meet every single requirement. This could be a truly career-defining opportunity.
As a Senior Software Engineer - SRE, you will play a pivotal role in supporting our most valued Grafana Cloud customers by enhancing the reliability of our cloud-based databases, which are built upon Mimir, Loki, Tempo, and Pyroscope. These databases are delivered as a Software-as-a-Service (SaaS) product across AWS, GCP, and Azure in all regions.
The SRE team operates in an embedded model within the Mimir and Loki squads, focusing intently on ensuring that Grafana Cloud’s database offerings consistently deliver exceptional reliability for our highest-SLA customers. We are seeking a senior engineer who excels at the intersection of customer needs, production systems, and product engineering.
Your responsibilities will include:
  • Partnering closely with product engineering squads in an embedded capacity to foster seamless collaboration and integration.
  • Taking ownership of production reliability for complex customer environments that demand the highest Service Level Agreements (SLAs).
  • Designing, implementing, and scaling automation initiatives to enhance our reliability practices and operational efficiency.
  • Proactively ensuring that our customers consistently meet and exceed their Service Level Objective (SLO) targets.
  • Defining, evolving, and managing per-tenant SLOs and sophisticated reliability models tailored to individual customer needs.
  • Strategically reducing SLO budget burn by identifying and implementing preventative measures, thereby minimizing the risk of repeat incidents.
  • Serving as a primary escalation point and participating in the on-call rotation for critical incidents, ensuring rapid and effective response.
  • Leading customer-impacting incident response efforts, conducting thorough post-incident reviews, and driving continuous improvement.
  • Contributing to the development of design documents, participating actively in code reviews, and upholding high standards of code quality.
  • Influencing feature design and product roadmaps to ensure inherent production scalability, operability, and resilience.
  • Building robust automation solutions to eliminate toil and repetitive tasks, freeing up valuable engineering time.
  • Improving alert quality, reducing false positives, and minimizing noisy escalations to enhance system monitoring and response effectiveness.
We understand the importance of a healthy work-life balance, especially concerning on-call responsibilities. As a global, remote-first company, we structure our on-call rotations to align with approximately 12 daylight hours per day. You will collaborate closely with international counterparts to ensure balanced coverage and shared ownership.
We are committed to empowering our engineers with cutting-edge tools. You will have the opportunity to leverage modern AI coding assistants in your daily workflow, with a company-funded budget for your preferred tools (within security guidelines). This support enables rapid iteration and reduces friction in development. We champion pragmatic AI-assisted development, focusing on faster prototyping, automated test generation, code refactoring, documentation enhancement, and streamlined incident follow-ups, all while maintaining rigorous code review and quality standards. Access to frontier models like GPT-Codex 5/3, Claude Opus 4.6, and Gemini 3 Pro will be available.
Your role will also involve:
  • Regularly engaging in 1:1 meetings with your manager and colleagues to foster professional development and team cohesion.
  • Proactively reviewing and creating SLOs, investigating opportunities to reduce budget burn through improvements in monitoring, automation, self-healing capabilities, and auto-scaling.
  • Enhancing the observability of customer environments to provide deeper insights and faster troubleshooting.
  • Designing and implementing solutions that guarantee the reliability and scalability of our environments to meet escalating demands.
  • Developing fault-tolerant design patterns, ensuring reliability is a core consideration throughout the entire service lifecycle.
  • Collaborating with Engineering Leaders to help shape product strategy, roadmaps, and technical designs.
  • Participating in code reviews and collaborating on design documents with fellow engineers.
  • Educating team members on Site Reliability Engineering principles and promoting best practices early in the development of new features.
  • Actively participating in incident response, from investigation and resolution to post-incident reviews and customer communication via bridge calls when necessary.

Skills & Technologies

Python
Java
AWS
Azure
GCP
DevOps
Senior
Remote

About Raintank Inc.

Raintank Inc., operating as Grafana Labs, is the open-source company behind the Grafana observability platform. It develops and maintains Grafana dashboards, Loki for logs, Tempo for traces, Mimir for metrics, and Grafana Cloud services, providing scalable monitoring and analytics for DevOps, SRE, and engineering teams worldwide. Grafana Labs supports on-prem and SaaS deployments with enterprise-grade features and commercial support.