This job has expired

This position was posted on January 8, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Senior Site Reliability Engineer, Core Cloud Engineering

The Constant Company, LLC

Job Overview

Location: Remote

Job Type: Full-time

Full Job Description

📋 Description

• Own and evolve the global control plane that orchestrates every VM, GPU, and bare-metal instance across Vultr’s 32 data centers, directly impacting the experience of 1.5 million users and the stability of a $3.5B cloud platform.
• Design, deploy, and harden hypervisor fleets at scale: automate the lifecycle of KVM/QEMU/libvirt hosts, enforce golden-image standards, and eliminate single points of failure before they surface to customers.
• Build self-healing network automation for Open vSwitch fabrics and BGP-driven edge routing; reduce MTTR from hours to minutes by codifying runbooks into deterministic, testable pipelines.
• Instrument deep observability into every layer—compute, storage, network—using Grafana, Sentry, and SumoLogic; define SLIs/SLOs that translate business goals into measurable reliability targets.
• Lead production incident response: coordinate cross-functional war rooms, drive blameless postmortems, and turn outages into systemic improvements that prevent recurrence.
• Continuously profile and tune performance, from NUMA topology on bare metal to NUMA-aware CPU pinning in guests, to squeeze out every microsecond of latency and every watt of efficiency.
• Evolve CI/CD and configuration management (GitLab CI, Puppet) to deliver infrastructure changes dozens of times per day with zero-downtime guarantees and automated rollback.
• Partner with software, network, and product teams to translate customer-facing features into rock-solid infrastructure requirements, ensuring new services launch with reliability baked in.
• Create living documentation—architecture decision records, runbooks, and automation playbooks—that empowers the entire engineering org to operate at your level.
• Mentor junior SREs and systems engineers through pair debugging, design reviews, and weekly tech talks, raising the reliability bar across the company.
• Champion a culture of toil reduction: every manual step is a candidate for code, every alert is a candidate for automation, and every metric is a candidate for an SLO.
• Influence long-term platform strategy: capacity forecasting, multi-region disaster recovery, and chaos engineering practices that keep Vultr ahead of customer growth.
• Stay hands-on with low-level Linux internals—cgroups, eBPF, kernel tracing—and turn arcane system behavior into deterministic, reproducible fixes.
• Contribute upstream to open-source projects (libvirt, QEMU, OVS) and bring back patches that improve Vultr’s operational posture and benefit the broader community.
• Participate in an on-call rotation that respects work-life balance: follow the sun with global teammates, automate yourself out of 3 a.m. pages, and use every incident as a forcing function for better tooling.

🎯 Requirements

• 5+ years running large-scale distributed systems and control-plane infrastructure in production, preferably in a cloud or hosting environment.
• Expert-level proficiency with KVM, QEMU, and libvirt; you can debug live-migration failures and NUMA misalignments without googling man pages.
• Strong networking chops: hands-on experience with BGP, Open vSwitch, and Linux traffic control; you’ve automated network changes via Ansible or custom Go/Python tooling.
• Deep observability mindset: you’ve implemented SLIs/SLOs, built Grafana dashboards, and tuned alert fatigue out of existence with Sentry and SumoLogic.
• Proficiency in PHP (or similar) for scripting and automation; you treat infrastructure as code and code as infrastructure.

🏖️ Benefits

• 100% company-paid medical, dental, and vision premiums for employee-only plans, with zero payroll deductions for core coverage.
• 401(k) with a 100% match up to 4% of salary, vested immediately, so you start building retirement wealth from day one.
• $2,500 annual professional-development stipend—conferences, courses, certs, or that new homelab you’ve been eyeing.
• Remote-first culture with a $500 first-year home-office setup stipend, a $400 annual refresh, and up to $75/month internet reimbursement.

Skills & Technologies

• PHP
• MySQL
• GitLab
• Linux
• Grafana


About The Constant Company, LLC

The Constant Company, LLC operates the Vultr cloud infrastructure brand, providing on-demand compute, storage, bare-metal, and managed Kubernetes services from 32 global data centers. Founded in 2014, the company targets developers, SaaS businesses, and enterprises with hourly billing, API-driven provisioning, and standardized hardware. Services include virtual machines, block storage, load balancers, object storage, managed databases, and cloud GPUs, all accessible through a unified control panel and REST API. Vultr emphasizes price-performance, global reach, and rapid deployment for web applications, CI/CD workflows, and edge workloads without long-term contracts.
