Weekday 1

Staff Engineer - DevOps

Job Overview

Location: Remote
Job Type: Full-time
Category: DevOps
Date Posted: March 3, 2026

Full Job Description

📋 Description

  • As a Staff Engineer specializing in DevOps, you will be instrumental in shaping and advancing our cloud infrastructure and operational excellence. This pivotal role involves architecting sophisticated DevOps ecosystems, driving significant cloud cost governance initiatives, and implementing cutting-edge container orchestration practices to ensure our systems are not only robust and scalable but also cost-efficient.
  • You will collaborate closely with cross-functional teams, including engineering, security, and finance, to foster a culture of operational excellence. Your responsibilities will extend to proactively managing and optimizing infrastructure spend, ensuring that our technological investments deliver maximum value and align with our financial objectives.
  • A core aspect of this role is leading the end-to-end DevOps strategy. This encompasses the design, implementation, and continuous improvement of CI/CD pipelines, robust automation frameworks, infrastructure-as-code principles, and efficient release engineering processes. You will be the driving force behind establishing and maintaining best practices in DevOps, setting high standards for reliability, and implementing effective operational governance across the organization.
  • You will be tasked with designing scalable, resilient, and cloud-native architectures that are meticulously aligned with our business growth trajectory. This requires a forward-thinking approach to infrastructure planning, anticipating future needs and ensuring our systems can adapt and expand seamlessly.
  • A significant focus will be placed on Kubernetes and containerization. You will architect and manage large-scale Kubernetes environments specifically designed for production workloads. This involves optimizing workloads across multiple clusters for peak performance, unwavering reliability, and optimal cost efficiency. You will be responsible for building and maintaining containerized applications using Docker and Kubernetes, ensuring their portability, scalability, and ease of deployment across diverse environments.
  • Furthermore, you will drive the implementation of multi-cluster and multi-region deployments where necessary, enhancing our system's resilience and availability to meet stringent Service Level Agreements (SLAs).
  • In the realm of cost savings and planning, you will own the critical functions of infrastructure cost visibility and optimization. This involves developing and executing comprehensive cloud cost-saving strategies, including precise rightsizing of resources, strategic reserved capacity planning, intelligent auto-scaling optimization, and efficient workload scheduling. You will work in close partnership with finance teams to contribute to budgeting, forecasting, and long-term cost planning, ensuring financial prudence in our infrastructure operations.
  • You will be responsible for creating sophisticated dashboards and reporting mechanisms that provide clear insights into infrastructure Return on Investment (ROI) and critical spend trends. Your continuous efforts will be directed towards identifying inefficiencies and implementing measurable cost-reduction initiatives without ever compromising system performance or reliability.
  • For monitoring and observability, you will design and implement comprehensive monitoring systems leveraging tools like Grafana and other leading observability platforms. This includes building real-time dashboards that offer a clear view of system health, performance metrics, and crucial cost insights. You will establish robust alerting frameworks designed to minimize downtime and significantly improve incident response times. Your work will directly contribute to enhancing system reliability through data-driven monitoring and thorough post-incident analysis.
  • Automation and reliability are paramount. You will automate critical processes such as provisioning, deployments, scaling, and recovery. Your efforts will focus on improving system resilience, maximizing availability, and refining disaster recovery strategies. You will also lead root cause analysis for major incidents, ensuring that preventive measures are implemented effectively to avoid recurrence and maintain operational stability.

Skills & Technologies

AWS
Azure
GCP
Docker
Kubernetes
DevOps
Senior
Remote


