
Job Overview
Location: US Remote
Job Type: Full-time
Category: DevOps
Date Posted: May 16, 2026
Full Job Description
📋 Description
- Own and evolve the AWS infrastructure powering NetBox Cloud, including EKS clusters, VPC design, IAM policies, RDS databases, and supporting services.
- Lead end-to-end infrastructure projects from design and scoping to delivery, breaking work into clear milestones and delegating tasks across the engineering team.
- Collaborate with Product teams to enable reliable, scalable customer launches on the NetBox Cloud platform.
- Improve deployment automation and CI/CD pipelines using GitHub Actions, with a focus on reliability, speed, AI enablement, and developer self-service.
- Proactively identify and eliminate technical debt before it impacts system reliability or security.
- Define, own, and monitor SLOs for critical platform systems, reducing manual toil through automation.
- Drive and maintain SOC 2 compliance initiatives, implement security controls, and enforce IAM governance across the infrastructure.
- Mentor junior and mid-level engineers through code reviews, pairing sessions, and architectural design discussions.
- Participate in a 24/7 on-call rotation and lead incident response from detection to resolution, including root cause analysis and postmortem documentation.
- Design and implement observability systems using Grafana, Prometheus, Loki, and OpenTelemetry to ensure platform visibility and performance.
- Build and maintain infrastructure as code using Terraform and Helm to ensure consistency, repeatability, and auditability across environments.
- Write and review detailed technical design documents, and actively participate in architecture reviews with cross-functional teams.
- Leverage AI tools such as GitHub Copilot, Cursor, and Claude as part of the daily engineering workflow to enhance productivity and code quality.
- Automate provisioning and configuration of multi-tenant EKS clusters serving thousands of customer instances while ensuring tenant isolation and security.
- Optimize cloud infrastructure costs through usage analysis, resource right-sizing, and automation of idle resource cleanup.
- Ensure high availability and fault tolerance of the platform by designing resilient architectures across multiple Availability Zones and regions.
- Maintain rigorous security posture through continuous vulnerability scanning, patch management, and access control enforcement.
- Partner with security and compliance teams to audit and update policies for IAM, network access, and data protection.
- Advocate for platform reliability by championing best practices in monitoring, alerting, and automated recovery workflows.
- Foster a culture of shared ownership and continuous improvement by encouraging knowledge sharing and documentation across the team.
- Balance innovation with operational stability, ensuring new features and systems are deployed without compromising platform uptime or security.
Skills & Technologies
Python
AWS
Kubernetes
Terraform
GitHub
DevOps
Senior
Remote
About NetBox Labs Inc.
NetBox Labs Inc. provides open-source network automation and infrastructure management software built around the NetBox ecosystem. Founded by the creators of NetBox, the company offers NetBox Cloud, professional support, training, and enterprise plugins to help organizations model, document, and automate networks at scale while maintaining data integrity and interoperability with existing DevOps and network engineering workflows.

