Staff Site Reliability Engineer

Job Overview

Location: Remote - U.S.
Job Type: Full-time
Category: Software Engineering
Date Posted: May 22, 2026

Full Job Description

📋 Description

  • Architect and define the overarching infrastructure strategy for SimSpace’s cyber range platform, enabling consistent, secure, and repeatable deployments across SimSpace-hosted data centers, customer-provided hardware, and air-gapped environments.
  • Lead the evolution of CI/CD and Kubernetes platforms by designing multi-cluster, multi-environment deployment frameworks using Jsonnet, Grafana Tanka, and Kustomize to improve developer velocity and reduce operational toil.
  • Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across the engineering organization, partnering with product and engineering leadership to balance feature delivery with platform stability.
  • Architect the enterprise observability strategy using the Grafana stack, designing frameworks for proactive monitoring, complex anomaly detection, and distributed tracing to provide deep visibility into system health, pod scaling, and latency bottlenecks.
  • Drive infrastructure security architecture by embedding container hardening, zero-trust network segmentation, and automated compliance policies into deployment pipelines and runtime environments for regulated and customer-managed systems.
  • Serve as a strategic technical partner to development teams, advocating for an SRE culture through self-service tooling, paved roads for developers, and standardized operational practices across the engineering organization.
  • Act as Incident Commander during high-severity outages, leading blameless post-mortems and implementing systemic, architectural fixes to prevent recurrence of failure classes.
  • Mentor senior and mid-level engineers by coaching, documenting best practices, and raising the baseline of engineering excellence through technical leadership and example.
  • Design and implement GitOps workflows using GitHub Actions and ArgoCD, applying infrastructure-as-code principles at enterprise scale to manage complex, distributed systems.
  • Build and maintain automation frameworks that support diverse deployment models including on-premises, VMware-based, and pre-packaged hardware-software appliances.
  • Collaborate across departments and time zones to ensure platform reliability, security, and scalability for global clients including allied governments, militaries, and enterprises.
  • Contribute to the continuous learning culture by participating in internal and external training, cyber conferences, and industry events to stay at the forefront of SRE and DevSecOps practices.
  • Influence cross-functional leadership to align engineering teams behind a unified technical vision, negotiating reliability tradeoffs and championing long-term infrastructure investments.
  • Ensure compliance with federal employment eligibility requirements and maintain a secure, inclusive, and welcoming environment for all team members.

🎯 Requirements

  • 8+ years of experience in Site Reliability, Platform, or DevOps engineering at a Staff, Principal, or Lead level
  • Deep software engineering skills beyond scripting, with proficiency in at least one modern language (e.g., Go, Python)
  • Expert-level knowledge of Kubernetes in multi-tenant, multi-cluster production environments and advanced configuration management using Jsonnet and Grafana Tanka
  • Extensive experience architecting enterprise-scale CI/CD pipelines and GitOps workflows using GitHub Actions and ArgoCD
  • Systems-level expertise designing deployments across self-hosted, on-premises, VMware, and air-gapped environments
  • Deep expertise with the Grafana stack for observability, alerting, and monitoring of distributed systems

🏖️ Benefits

  • Base salary range of $165,000 - $230,000 with performance-based bonus opportunities
  • Comprehensive medical, dental, and vision benefits starting on day one
  • Unlimited vacation and dedicated health & wellness days
  • 401(k) retirement savings plan with company match
  • Equity stock options at hire and annual performance-based grants
  • Access to LinkedIn Learning, Spring Health counseling, Peloton wellness program, and monthly SocialSpace community reimbursements

Skills & Technologies

Python
Spring
Kubernetes
GitHub
Grafana
Senior
Remote
$165k-230k
Degree Required

About SimSpace Corporation

SimSpace Corporation provides cyber range software that lets enterprises model, test, and train on replica networks. Its platform continuously mirrors production environments, offering on-demand labs for red/blue teams, compliance validation, and workforce development. Fortune 500, defense, and MSSP customers use the solution to reduce security risk, accelerate incident response, and measure cyber readiness. Founded in 2015 and headquartered in Boston, the company is privately held and backed by venture investors focused on critical infrastructure protection.
