This job has expired

This position was posted on November 19, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Deployment DevOps Engineer

Adaptive ML SAS

Job Overview

Location

Toronto

Job Type

Full-time

Full Job Description

• Architect and own the end-to-end deployment lifecycle of Adaptive Engine, our flagship reinforcement-learning platform, ensuring it can be rolled out on-prem, in any major cloud (AWS, Azure, GCP), or as a managed SaaS with one-click simplicity.
• Design and maintain resilient, version-controlled Kubernetes workflows (Helm charts, ArgoCD pipelines, GitOps) that scale from single-GPU proof-of-concepts to thousand-GPU production clusters handling trillions of user-interaction records.
• Act as the first line of defense for customer escalations across North American time zones: join live troubleshooting calls with Fortune 500 clients, reproduce issues in staging, coordinate hotfixes, and feed learnings back into the product roadmap.
• Build hardened, auditable deployment blueprints that satisfy enterprise security and compliance mandates, including OIDC/SSO integration, network micro-segmentation, WAF rules, secrets rotation, and automated builds of CIS-hardened images.
• Partner with Sales and Customer Success to deliver pre-sales demos, POC environments, and post-sales production cutovers; you’ll often be the technical face of Adaptive ML during critical onboarding milestones.
• Establish a 24×7 on-call rotation, incident-response runbooks, and observability dashboards (Prometheus, Grafana, Loki) that keep MTTR under 15 minutes for P1 issues and provide clear RCA reports to executives.
• Optimize data pipelines for multi-terabyte Postgres clusters and object storage, ensuring sub-second query latencies for real-time personalization while keeping storage costs predictable.
• Champion Infrastructure-as-Code best practices (Terraform, Pulumi) so every environment, from dev to prod, can be recreated identically in minutes, not days.
• Contribute code (Rust preferred) to internal tooling that automates certificate issuance, GPU driver updates, and chaos-engineering experiments that validate fault tolerance.
• Influence the product roadmap by translating field feedback into concrete DevOps features such as self-healing clusters, zero-downtime upgrades, and cost-aware auto-scaling policies.
• Foster a culture of blameless post-mortems, continuous learning, and documentation that allows new team members to deploy safely on day one.
• Mentor junior engineers and create reusable modules that accelerate future customer deployments, turning one-off fixes into scalable product capabilities.

Skills & Technologies

Rust

PostgreSQL

Kubernetes

Linux

DevOps

Remote

About Adaptive ML SAS

Adaptive ML SAS is a Paris-based startup developing a platform that lets enterprises fine-tune and deploy large language models on their own data. The system combines reinforcement learning from human feedback, retrieval-augmented generation, and automated evaluation to create specialized, privacy-preserving models that run efficiently on private clouds or on-premise hardware, targeting sectors such as finance, healthcare, and legal services.
