DevOps Engineer

Level AI Inc.

Job Overview

Location: Noida

Job Type: Full-time

Full Job Description

📋 Description

• Architect and own the end-to-end lifecycle of Level AI’s cloud-native infrastructure, powering the most adaptive AI platform in the contact-center industry. You will design, build, and continuously enhance state-of-the-art machine-learning systems that run both in Google Cloud Platform and on-prem clusters, ensuring every model we ship can be created, trained, and deployed at petabyte scale with sub-second latency.
• Build rock-solid CI/CD pipelines that move code from a developer’s laptop to production in minutes, not hours. Using Jenkins, GitHub Actions, and Argo CD, you will automate build, test, security-scanning, and release steps for dozens of microservices, while guaranteeing zero-downtime blue-green and canary deployments.
• Create and maintain deployment manifests for every microservice with Helm, templating configurations so that environments from dev to staging to prod are reproducible, auditable, and rollback-ready. You will version every change in Git, enforce semantic versioning, and integrate Helm secrets management for safe credential handling.
• Design infrastructure-as-code with Terraform, codifying networks, IAM roles, GKE clusters, Cloud SQL instances, Pub/Sub topics, and VPCs. Your modules will be reusable, unit-tested with Terratest, and peer-reviewed so that spinning up an entire new region takes a single terraform apply.
• Implement autoscaling strategies on Kubernetes that keep inference latency low while controlling cost. You will tune Horizontal Pod Autoscalers (HPA), Vertical Pod Autoscalers (VPA), and Cluster Autoscalers, leveraging custom metrics from Prometheus and Datadog to scale GPU-backed pods for LLM workloads.
• Build observability that turns terabytes of logs, traces, and metrics into actionable insights. Using Prometheus, Grafana, Loki, Promtail, and Datadog, you will craft dashboards that surface the golden signals (latency, traffic, errors, saturation) and set smart, noise-free alerts that wake the right person at 3 a.m., but only when it really matters.
• Champion security and compliance across the stack. You will bake CIS benchmarks into base images, enforce OPA/Gatekeeper policies in Kubernetes, rotate secrets with Google Secret Manager, and run automated vulnerability scans on every build so that SOC 2 and ISO 27001 auditors smile instead of frown.
• Partner daily with AI researchers, data engineers, and backend developers to translate experimental notebooks into hardened production services. You will containerize training jobs, wire them to distributed storage (GCS + NFS), and schedule them with Kubeflow Pipelines or Argo Workflows, cutting model iteration time from weeks to days.
• Drive incident response and blameless post-mortems. When an alert fires, you will jump on bridges, lead root-cause analysis, patch code or config, and then automate the fix so it never happens again. Your runbooks and chaos-engineering drills will make Level AI antifragile.
• Evaluate bleeding-edge DevOps and LLMOps tooling—think vector databases, serverless GPUs, policy-as-code, or confidential computing—and run proof-of-concepts that keep us two steps ahead of the market. Your recommendations will directly influence our technical roadmap and Series C growth trajectory.
• Mentor junior engineers, write internal blogs and documentation, and run brown-bag sessions that raise the DevOps bar across the company. Your code reviews will be legendary for clarity, empathy, and the occasional dad joke.

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
• 2–4 years of hands-on DevOps or SRE experience delivering highly available, large-scale services on Google Cloud Platform
• Expert-level skills with Docker, Kubernetes, Helm, and Jenkins; Terraform IaC mastery is mandatory
• Proven ability to design and operate observability stacks (Prometheus, Grafana, Loki, Datadog) and autoscale workloads via HPA
• Nice to have: LLMOps or MLOps exposure, experience with GPU node pools, Kubeflow, or policy-as-code (OPA/Gatekeeper)

🏖️ Benefits

• Market-leading compensation benchmarked to Silicon Valley standards and tailored to your skill level
• Hybrid work culture with a modern Noida office and flexible WFH days
• Work on cutting-edge AI products that augment—not replace—humans, directly shaping the future of customer experience
• Fast-growing Series C startup backed by tier-1 VCs, offering steep career growth and stock option upside

Skills & Technologies

GCP

Docker

Kubernetes

Terraform

Jenkins

DevOps

Hybrid

Degree Required

About Level AI Inc.

Level AI delivers an end-to-end contact center platform powered by cutting-edge generative AI, revolutionizing customer experiences and operational efficiency. Its comprehensive suite provides AI Virtual Agents, 100% Auto-QA, real-time agent assistance, and actionable customer insights for contact center leaders, agents, and CX teams. Serving diverse industries including financial services, healthcare, and retail, Level AI empowers businesses to build customer-obsessed operations globally. By automating workflows and delivering real-time intelligence, the platform enables teams to focus on exceptional service, validated by customer outcomes such as a 25% increase in CSAT and 90% time saved in QA monitoring. Level AI has also been recognized as a Gartner Cool Vendor in Customer Service & Support Technology.