This job has expired
This position was posted on October 27, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Job Overview
Location
Remote
Job Type
Full-time
Category
Data Science
Date Posted
October 27, 2025
Full Job Description
📋 Description
- Own the full AI data lifecycle for Fortune-500 and cutting-edge research partners: from raw, messy enterprise data to production-grade, model-ready datasets that power mission-critical AI systems.
- Architect and implement end-to-end ML pipelines that combine programmatic labeling, weak supervision, active learning, and human-in-the-loop review to turn subject-matter-expert knowledge into high-quality training data at unprecedented speed and scale.
- Translate ambiguous business objectives into crisp technical specifications, then deliver against them by writing production Python, SQL, and PySpark code that runs on petabyte-scale data lakes and cloud-native infrastructure (AWS, GCP, Azure).
- Design statistically rigorous evaluation frameworks—precision-recall curves, confidence intervals, error analysis dashboards—that give executives and ML engineers the evidence they need to trust the data and the models built on top of it.
- Build reusable, Snorkel-internal libraries and microservices that accelerate future engagements: think auto-tuned labeling functions, active-learning samplers, or real-time data-quality monitors that cut weeks of manual work down to hours.
- Collaborate daily with a cross-functional squad of data engineers, ML researchers, product managers, and customer SMEs to unblock bottlenecks, prioritize features, and ship incremental value every sprint.
- Present findings and recommendations to C-suite stakeholders and technical leads through clear slide decks, live notebooks, and interactive dashboards that turn complex data stories into actionable next steps.
- Continuously improve our human-in-the-loop processes by running A/B tests on reviewer instructions, UI/UX tweaks, and ML-assisted pre-labeling to maximize throughput without sacrificing quality.
- Contribute to Snorkel’s open-source ecosystem and research publications, sharing novel techniques in weak supervision, data-centric AI, and evaluation methodology with the broader community.
- Travel up to 20% for quarterly on-site workshops with strategic customers, where you’ll whiteboard solutions, mentor client data teams, and gather feedback that directly shapes our product roadmap.
- Stay on the bleeding edge of generative-AI and LLM tooling, experimenting with prompt engineering, retrieval-augmented generation, and synthetic data techniques to keep our delivery toolkit state-of-the-art.
- Champion a culture of reproducibility and engineering excellence: peer-review PRs, enforce test coverage, instrument observability, and document tribal knowledge so every project is a stepping-stone—not a one-off.
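For candidates new to programmatic labeling, the core idea behind the responsibilities above can be sketched in a few lines. This is an illustrative, dependency-free toy (plain Python, not the Snorkel library's actual API): a few heuristic "labeling functions" each cast a noisy vote or abstain, and a simple majority vote aggregates them into a training label. The label names, keywords, and the `majority_vote` helper are all hypothetical; in practice Snorkel replaces the majority vote with a learned label model.

```python
# Toy weak-supervision sketch: labeling functions vote or abstain,
# and a majority vote turns their noisy outputs into one label.
SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_prize(text):
    # Heuristic: messages mentioning a "prize" are likely spam.
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_polite_greeting(text):
    # Heuristic: messages opening with a greeting are likely legitimate.
    return HAM if text.lower().startswith("hi") else ABSTAIN

def majority_vote(text, lfs):
    # Collect non-abstaining votes and return the most common one.
    votes = [lf(text) for lf in lfs if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

lfs = [lf_contains_prize, lf_polite_greeting]
print(majority_vote("You won a PRIZE! Click now", lfs))  # 1 (SPAM)
print(majority_vote("Hi team, quick question", lfs))     # 0 (HAM)
```

Scaling this pattern up (learned aggregation instead of majority vote, active learning to pick which examples humans review) is the substance of the role described above.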
About Snorkel AI, Inc.
Snorkel AI provides a programmatic data-labeling platform that enables enterprises to build, curate and manage training data for machine-learning models at scale. Founded in 2019 by researchers from the Stanford AI Lab, the company commercializes the Snorkel framework, replacing manual annotation with weak supervision, labeling functions and AI-assisted iteration. Its flagship product, Snorkel Flow, offers an integrated development environment where subject-matter experts write rules, models auto-label data, and teams continuously refine datasets to accelerate AI deployment in industries including financial services, healthcare and government.