This job has expired

This position was posted on December 13, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Machine Learning Engineer - Deployments Team

Roboflow, Inc.

Job Overview

Location

Remote

Job Type

Full-time

Full Job Description

📋 Description

• Own the end-to-end lifecycle of Roboflow’s model-deployment stack, shipping code that moves computer-vision models from training notebooks to production inference in milliseconds across cloud GPUs, ARM edge devices, and browser WASM runtimes.
• Architect and harden the next generation of Roboflow Inference—our open-source runtime—adding support for new model families (YOLOv9, SAM-2, RT-DETR), quantization schemes (INT8, FP16, sparsity), and hardware accelerators (NVIDIA Jetson, Intel NPU, Apple Neural Engine).
• Build deterministic, zero-downtime release pipelines that push updated containers to 10k+ customer endpoints nightly; automate canary analysis, blue-green rollbacks, and A/B latency experiments so every deploy is boring and safe.
• Profile the stack to shave milliseconds off cold-start latency and trim memory footprint; squeeze 30% more FPS from edge devices by rewriting hot paths in Rust or as CUDA kernels while keeping the Python UX delightful.
• Design multi-tenant autoscaling policies that spin up GPU nodes in under 60 seconds and spin them down just as fast, cutting cloud spend without ever dropping a customer’s traffic spike.
• Partner with Product, Support, and Solutions Engineering to turn one-off customer hacks into reusable platform features—e.g., package a customer’s custom post-processing pipeline into a pluggable Inference plugin that 200 other teams can import with one line.
• Contribute to our open-source repos (Inference, Autodistill, supervision) weekly: review PRs, triage issues, write docs, and record demos that help 1M+ developers succeed with computer vision.
• Run weekly “office hours” for internal teams and external power users, distilling complex deployment gotchas into runbooks and sample repos that reduce time-to-value from days to minutes.
• Instrument everything—Prometheus, OpenTelemetry, custom GPU metrics—so anomalies surface before customers notice; wake up rarely, but when you do, you fix root causes, not symptoms.
• Champion security best practices: sign images, scan for CVEs, rotate secrets, and ensure SOC-2 compliance without slowing delivery velocity.
• Experiment fearlessly with bleeding-edge tech (WebGPU, TensorRT-LLM, LoRA adapters) in 20% time; spin the winners into default product features and sunset the rest cleanly.
• Mentor junior engineers through pair programming and design reviews; level up the entire team’s ability to ship high-quality ML systems.
• Within 30 days, lead a full release cycle, merge your first impactful PR, and identify the roadmap area you’ll own next.
• Within 60 days, solve a gnarly customer performance issue and shepherd a cross-team initiative from prototype to early-adopter rollout.
• Within 90 days, become the go-to expert on one slice of the deployment stack and kick off a mission-critical initiative that shapes Roboflow’s next chapter.

🎯 Requirements

• 5+ years shipping production-grade ML systems, including containerized model serving at scale.
• Deep, hands-on experience with PyTorch, TensorFlow, ONNX, TensorRT, or equivalent frameworks.
• Proficiency in image/video processing libraries (OpenCV, Pillow, PyAV, DeepStream) and streaming protocols (RTSP, WebRTC).
• Strong computer-science fundamentals: concurrency, distributed systems, and low-level performance tuning.
• Demonstrated ability to design scalable, observable, and secure architectures in cloud-native environments.

🏖️ Benefits

• $163k–$182.5k base salary, reviewed every six months to stay market-competitive.
• $4k annual travel stipend: fly anywhere to cowork with teammates.
• $350 monthly productivity stipend for home-office or coworking upgrades.
• 100% health-insurance coverage for you, your partner, and your family.
• Unlimited PTO with a 2-week minimum; 12 weeks fully paid parental leave.

Skills & Technologies

GitHub

TensorFlow

PyTorch

Remote

About Roboflow, Inc.

Roboflow provides a cloud-based platform for computer vision teams to manage datasets, annotate images and video, train custom models, and deploy them to edge devices and production APIs. The service automates data preprocessing, augmentation, version control, and performance monitoring across YOLO, TensorFlow, PyTorch, and other frameworks, enabling developers and enterprises to accelerate vision projects from prototype to scalable applications without building infrastructure from scratch.
