Senior Site Reliability Engineer - Observability

Job Overview

Location

San Francisco Office (Fremont St)

Job Type

Full-time

Category

DevOps & SysAdmin

Date Posted

May 10, 2026

Full Job Description

📋 Description

  • Senior Site Reliability Engineer - Observability role at Lambda Inc., a leader in AI cloud infrastructure serving tens of thousands of customers, focused on making compute as ubiquitous as electricity and delivering the power of superintelligence to everyone.
  • Deploy and operate observability platforms for logging, metrics, and distributed tracing; automate the deployment and operation of these systems; set up monitoring for modern AI/HPC cluster infrastructure; develop platform software to improve observability adoption and product reliability; lead cross-team engineering efforts to solve monitoring challenges.
  • The engineering team at Lambda builds and scales cloud offerings, including the website, cloud APIs, systems, and internal tooling for deployment, management, and maintenance. The role is hybrid, requiring 4 days/week in the SF, San Jose, or Bellevue office, with Tuesday as the remote day.
  • Opportunity to deepen expertise in observability, SRE practices, Kubernetes, Go, and modern DevOps while influencing reliability across AI infrastructure used by researchers, enterprises, and hyperscalers.

🎯 Requirements

  • 8+ years of software engineering experience with 3+ years in Go
  • 5+ years of Site Reliability Engineering practices
  • Proven understanding of Observability tools and practices
  • Experience with application deployment and monitoring using Kubernetes
  • Strong experience with modern DevOps practices

🏖️ Benefits

  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan that we all actually use

Skills & Technologies

Kubernetes
Terraform
Linux
Prometheus
Senior
Onsite

About Lambda Inc.

Lambda Inc. provides cloud-based GPU clusters and workstations for artificial-intelligence research and development. The company designs and operates high-performance hardware infrastructure optimized for machine-learning workloads, offering on-demand access to NVIDIA GPUs, pre-configured deep-learning software stacks, and scalable storage. Customers include AI labs, universities, and enterprises training large language and computer-vision models. Founded in 2012, Lambda is headquartered in San Francisco and maintains data centers across North America and Europe.
