This job has expired
This position was posted on November 19, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Job Overview
Location: Toronto, Indiana, USA
Job Type: Full-time
Category: Software Engineering
Date Posted: November 19, 2025
Full Job Description
📋 Description
- Architect and own the end-to-end deployment lifecycle of Adaptive Engine, our flagship reinforcement-learning platform, ensuring it can be rolled out on-prem, in any major cloud (AWS, Azure, GCP), or as a managed SaaS with one-click simplicity.
- Design and maintain resilient, version-controlled Kubernetes workflows (Helm charts, ArgoCD pipelines, GitOps) that scale from single-GPU proof-of-concepts to thousand-GPU production clusters handling trillions of user-interaction records.
- Act as the first line of defense for customer escalations across North American time zones: join live troubleshooting calls with Fortune 500 clients, reproduce issues in staging, coordinate hot-fixes, and feed learnings back into the product roadmap.
- Build hardened, auditable deployment blueprints that satisfy enterprise security and compliance mandates, such as OIDC/SSO integration, network micro-segmentation, WAF rules, secrets rotation, and automated CIS-hardened images.
- Partner with Sales and Customer Success to deliver pre-sales demos, POC environments, and post-sales production cut-overs; you’ll often be the technical face of Adaptive ML during critical onboarding milestones.
- Establish a 24×7 on-call rotation, incident-response runbooks, and observability dashboards (Prometheus, Grafana, Loki) that keep MTTR under 15 minutes for P1 issues and provide clear RCA reports to executives.
- Optimize data pipelines for multi-terabyte Postgres clusters and object storage, ensuring sub-second query latencies for real-time personalization while keeping storage costs predictable.
- Champion Infrastructure-as-Code best practices (Terraform, Pulumi) so every environment, from dev to prod, can be recreated identically in minutes, not days.
- Contribute code (Rust preferred) to internal tooling that automates certificate issuance, GPU driver updates, and chaos-engineering experiments that validate fault tolerance.
- Influence the product roadmap by translating field feedback into concrete DevOps features, e.g., self-healing clusters, zero-downtime upgrades, or cost-aware auto-scaling policies.
- Foster a culture of blameless post-mortems, continuous learning, and documentation that allows new team members to deploy safely on day one.
- Mentor junior engineers and create reusable modules that accelerate future customer deployments, turning one-off fixes into scalable product capabilities.
About Adaptive ML SAS
Paris-based startup developing a platform that lets enterprises fine-tune and deploy large language models on their own data. The system combines reinforcement learning from human feedback, retrieval-augmented generation and automated evaluation to create specialized, privacy-preserving models that run efficiently on private clouds or on-premise hardware, targeting sectors such as finance, healthcare and legal services.
