
Job Overview
Location
Remote - United States
Job Type
Full-time
Category
Software Engineering
Date Posted
May 14, 2026
Full Job Description
đź“‹ Description
- • As a Staff Machine Learning Systems Engineer at Reddit, you will lead the development of infrastructure that powers large-scale machine learning models, directly impacting recommendations, content discovery, and user quantification across teams like Growth, Ads, Feeds, and Core ML.
- • You will design end-to-end MLOps patterns to accelerate ML engineer velocity, including data preparation, model management, and experiment tracking; develop and support a graph ML codebase and platform for scalable model iteration; collaborate on performance tuning to improve training efficiency and reduce GPU costs; optimize batch data processing using tools like Apache Beam, Spark, and Ray Data; and architect pipelines for massive graph data structures with billions of nodes and tens of billions of edges.
- • You will join Reddit’s Machine Learning Platform team, a high-impact group responsible for the foundational infrastructure enabling ML-driven products across the platform, serving over 126 million daily active users and 100,000+ active communities.
- • In this role, you will deepen your expertise in scalable ML systems, graph neural networks, distributed training, and MLOps while advocating for platform usability and reliability; you’ll have the opportunity to shape the future of ML infrastructure at one of the internet’s largest and most influential platforms.
🎯 Requirements
- • 8+ years of experience in ML infrastructure, including model training and model deployments
- • Hands-on experience with ML optimization, including memory and GPU profiling
- • Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform)
- • Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g. MLflow or Wandb)
- • Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, Tensorflow, etc.
- • Deep experience working with distributed training frameworks, including Ray and Kubernetes
- • Strong focus on scalability, reliability, performance, and ease of use; strong organizational and communication skills
- • Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
- • Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus
🏖️ Benefits
- • Base salary range of $230,000–$322,000 USD
- • Eligibility to receive equity in the form of restricted stock units
- • Potential eligibility for commission depending on the position offered
- • Wide range of benefits for U.S.-based employees including medical, dental, and vision insurance
- • 401(k) program with employer match
- • Generous time off for vacation and parental leave
Skills & Technologies
About Reddit Inc.
Reddit is a social media platform where users submit, vote, and comment on content organized into topic-based communities called subreddits. Founded in 2005, it offers forums for news, hobbies, advice, and discussion, enabling real-time conversations and content ranking through upvotes and downvotes. With millions of daily active users globally, Reddit hosts diverse communities moderated by volunteers, supports multimedia posts, and provides advertising and premium membership options. The platform emphasizes user anonymity, community governance, and crowdsourced information, making it a hub for niche interests, viral content, and public discourse. Reddit went public in 2024 and is headquartered in San Francisco.
Subscribe to the weekly newsletter for similar remote roles and curated hiring updates.
Newsletter
Weekly remote jobs and featured talent.
No spam. Only curated remote roles and product updates. You can unsubscribe anytime.



