
Job Overview
Location: Remote
Job Type: Full-time
Category: Engineering Manager
Date Posted: January 19, 2026
Full Job Description
📋 Description
- Lead and grow a high-impact Infrastructure team of Staff and Senior Engineers who are building the backbone that powers synthetic monitoring for 1,000+ engineering teams at LinkedIn, Citibank, Vercel, Render and beyond. You will be both coach and player: pair-programming, reviewing architecture, and diving into packet captures when the stakes are high.
- Own the roadmap for our hybrid AWS + bare-metal platform, balancing security, cost-efficiency, and performance so that every customer check runs in a fully sandboxed environment in milliseconds, not minutes. Your decisions directly affect how quickly developers can detect and fix issues before users ever notice.
- Champion reliability and observability: design SLIs/SLOs, automate chaos drills, and ensure our Postgres, ClickHouse, and Kubernetes clusters stay snappy under millions of synthetic checks per day. When alerts fire, you and the team are the first responders, turning incidents into post-mortems that make the system antifragile.
- Optimize for cost at scale. We process petabytes of runtime data and AI-driven traces; shaving a few cents per check compounds into six-figure savings. You will negotiate reserved instances, tune eBPF networking, and experiment with spot fleets and bare-metal GPU nodes for AI agents.
- Elevate developer experience across the company. You will ship Terraform modules, Ansible playbooks, and internal CLIs that let product engineers deploy to prod dozens of times a day without fear. Your team's tooling is the secret sauce behind our “ship all day, every day” culture.
- Collaborate cross-functionally with Product, Security, and Customer Success to translate customer pain into infrastructure features, whether that means adding a new region in 30 minutes or surfacing real-time Playwright traces inside our Vue.js dashboard.
- Foster an async-first, documentation-driven culture. We run lightweight Agile (not Scrum), record Loom demos instead of status meetings, and default to RFCs in Notion. You will mentor engineers to write crisp ADRs and runbooks so knowledge scales faster than headcount.
- Contribute hands-on when needed: debug kernel panics on bare-metal boxes, profile Go services, or craft eBPF programs to trace syscalls inside our sandbox. Your technical credibility earns trust and keeps the team sharp.
- Shape hiring and onboarding. You will define rubrics, run interviews, and design a 30-60-90 plan that turns new hires into autonomous owners within one quarter. Diversity and inclusion are not buzzwords here: you will actively recruit candidates from underrepresented groups.
- Report to the VP of Engineering and present monthly metrics on uptime, cost-per-check, and team health to the entire company. Transparency is a core value; our salary calculator and open handbooks are public because we believe information asymmetry kills trust.
🎯 Requirements
- 5+ years building and maintaining production-grade bare-metal and cloud infrastructure (AWS required, hybrid setups strongly preferred)
- Deep Linux administration skills: comfortable with systemd, cgroups, eBPF, and debugging at the syscall level
- Proven experience managing infrastructure as code with Terraform and Ansible (or equivalent) in a continuous-delivery environment
- Hands-on Kubernetes knowledge: cluster bootstrapping, multi-tenancy, network policies, and cost-aware autoscaling
- Excellent written and spoken English; able to drive async decisions via RFCs and run incident post-mortems
- Fully remote within UTC-3 to UTC+3; proven ability to thrive in a low-meeting, high-autonomy environment
🏖️ Benefits
- Transparent, location-based salary: €99k–€121k (UK/Germany band) or €89k–€109k (Spain/Poland/Ukraine band) with no negotiation games
- 27 days paid vacation plus local public holidays, 14 weeks paid parental leave, and paid sick leave
- €1,500 annual learning & wellbeing budget, co-working stipend or home-office setup, and bi-annual company retreats
- Stock options in a fast-growing Series B startup backed by Balderton, CRV, and Accel
Skills & Technologies
JavaScript
TypeScript
Go
Vue.js
PostgreSQL
DevOps
Remote
About Checkly Group Inc.
Checkly offers an application reliability platform that unifies testing, monitoring, and observability into a developer-friendly workflow. It provides uptime and end-to-end monitoring, empowering engineering teams to detect, communicate, and resolve performance issues. Using a Monitoring as Code approach, Checkly allows users to automate their entire monitoring process with tools like Playwright and OpenTelemetry, integrating seamlessly into CI/CD workflows. World-class engineering and SRE teams depend on Checkly to deliver reliable digital experiences, and Checkly integrates with Slack, SMS, and more to alert teams when issues arise.



