
Job Overview
Location: Remote - United States
Job Type: Full-time
Category: Software Engineering
Date Posted: May 14, 2026
Full Job Description
📋 Description
- As a Senior Machine Learning Engineer on Reddit's ML Training Platform team, you will architect, implement, and maintain foundational ML infrastructure that powers recommendations, content discovery, and user quantification, directly impacting the Growth, Ads, Feeds, and Core ML teams.
- You will lead the building, testing, and maintenance of ML training infrastructure, and design and optimize large-scale ML workflows. You will evolve the MLE experience through self-service GPU environments and on-demand training, write custom Kubernetes Controllers and Operators for Jupyter workspaces and ML jobs, and work with compute teams to ensure efficient GPU access. You will also improve developer experience by reducing friction in the Idea-to-Prototype loop through user research and standardized environments.
- The Machine Learning Platform team at Reddit owns the infrastructure powering recommendations, content discovery, and quantification. It serves over 126 million daily active unique visitors across 100,000+ active communities, with a mission to bring community and belonging to everyone in the world.
- You will deepen your expertise in Kubernetes operators, GPU orchestration, distributed training frameworks (Ray, Kubernetes), and cloud-based ML platforms (AWS, GCP, Vertex AI, SageMaker), while advocating for platform users and shaping the ML development lifecycle through scalable, reliable, and performant systems.
🎯 Requirements
- 5+ years of software engineering experience with a focus on Platform Engineering, ML Infrastructure, or Backend Systems
- Deep Kubernetes expertise, including CRDs, Controllers, and the Operator pattern, beyond basic pod deployment
- Proficiency in Python for the ML ecosystem and in Go for Kubernetes controllers and infrastructure tooling
- Hands-on experience with CUDA environments and with GPU virtualization/containerization within Kubernetes
- Familiarity with managed ML offerings (Vertex AI, SageMaker) and with building custom ML components in AWS and/or GCP
- Experience with distributed training frameworks, including Ray and Kubernetes
🏖️ Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k Match
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Reddit Global Days off
- Generous paid Parental Leave
- Paid Volunteer time off
About Reddit Inc.
Reddit is a social media platform where users submit, vote, and comment on content organized into topic-based communities called subreddits. Founded in 2005, it offers forums for news, hobbies, advice, and discussion, enabling real-time conversations and content ranking through upvotes and downvotes. With millions of daily active users globally, Reddit hosts diverse communities moderated by volunteers, supports multimedia posts, and provides advertising and premium membership options. The platform emphasizes user anonymity, community governance, and crowdsourced information, making it a hub for niche interests, viral content, and public discourse. Reddit went public in 2024 and is headquartered in San Francisco.



