Lead Data Scientist - Recommendations

Scribd, Inc.

Job Overview

Location

Nice, Indiana, USA

Job Type

Full-time

Full Job Description

📋 Description

• Scribd is seeking a highly skilled and experienced Lead Data Scientist to spearhead our recommendation systems, driving measurable outcomes and shaping how millions of users discover content across our diverse platforms, including Everand, Scribd, Slideshare, and Fable.
• In this pivotal role, you will be instrumental in translating ambitious product goals into concrete metrics, defining and leading the roadmap for recommendation initiatives, and delivering significant improvements in key business results.
• You will own the entire lifecycle of recommendation model development, from defining the offline-to-online evaluation contract to designing and executing rigorous experiments, diagnosing the root causes of variant performance, and building prototype models.
• A core responsibility will be partnering closely with our Engineering teams to ensure seamless productionization of these models, bringing cutting-edge AI solutions to life.
• Your expertise will be crucial in mapping overarching product goals to specific, measurable metrics with clear success criteria, with a strong emphasis on opportunity sizing and precise measurement.
• You will apply an advanced AI lens, leveraging techniques such as Large Language Models (LLMs) and embeddings, to demonstrably enhance content retrieval, ranking accuracy, and user understanding, thereby influencing user engagement with our vast global content library.
• Scribd operates as a differentiated subscription platform, boasting strong organic reach and an extensive catalog encompassing books, audiobooks, and millions of user-generated documents and slides.
• In the rapidly evolving landscape shaped by AI, your role will be critical in helping users navigate through information overload and discover high-quality, human-centered content.
• You will be responsible for establishing strategic 'north star' metrics and essential guardrails, developing leading indicators that accurately predict long-term user outcomes, and constructing a robust measurement architecture.
• This architecture will encompass defining identity and attribution standards, establishing appropriate attribution windows, creating clear metric contracts, and implementing drift and leakage checks to ensure the trustworthiness of all downstream metrics.
• Furthermore, you will accelerate decision-making velocity by defining clear stop-go criteria and implementing rigorous power checks for experiments.
• A key aspect of your role will involve compelling storytelling through concise decision memos, clearly articulating trade-offs, risks, and recommendations to stakeholders.
• **Opportunity Mapping:** You will meticulously analyze and prioritize new recommendation surfaces, user intents, and content cohorts. This involves tracing user funnels and performing detailed slice-based analysis (e.g., cold items, long-tail users, platform specifics) to strategically guide the product roadmap.
• **Evaluation Framework Ownership:** You will define the overarching 'north star' metrics and critical guardrails for recommendations, such as diversity, novelty, duplication, and safety. This includes setting performance thresholds, defining acceptable trade-offs, and publishing a clear Objective and Evaluation Contract for each recommendation surface.
• **Offline-Online Alignment:** Quantify the correlation between offline Information Retrieval (IR) metrics (e.g., NDCG@K, MAP, MRR, coverage, calibration) and online Key Performance Indicators (KPIs) across surfaces and cohorts. Publish error bounds and actively monitor for metric drift to ensure consistency and reliability.
• **Leading Indicator Creation:** Develop short-horizon metrics that effectively predict long-term user outcomes, such as the crucial trial-to-bill-through conversion rate. Backtest these indicators rigorously and conduct post-hoc causal checks, reporting on associated uncertainties.
• **Measurement Architecture Development:** Establish definitive identity and attribution standards (e.g., user_id vs. device_id, qualifying events, attribution windows) to ensure the integrity and trustworthiness of downstream metrics like bill-through rates and churn.
• **Advanced Experimentation:** Design and execute sophisticated experiments, including interleaving tests, pre-registering stop-go criteria, and delivering crisp, actionable readouts that directly inform product decisions.
• **Data Quality and Schema Management:** Collaborate with Analytics and Data Engineers to codify schemas, ensure data freshness, implement leakage checks, and monitor for drift, thereby establishing high-quality datasets essential for RecSys algorithms.
• **AI/ML Integration:** Evaluate the measurable impact of LLMs and embeddings (e.g., for topics, summaries, semantic similarity) on both offline and online recommendation metrics. Prototype solutions and provide clear build specifications for Machine Learning Engineers.
• **Storytelling and Influence:** Craft compelling decision memos, align diverse cross-functional teams, and drive clear, data-informed decisions by explicitly calling out trade-offs and potential risks.
• **Python and SQL Proficiency:** Demonstrate strong programming skills in Python and SQL, essential for data manipulation, analysis, and model development.
• **Spark Familiarity:** Be comfortable working with Spark for large-scale data processing and distributed computing.
• **Ranking Evaluation Expertise:** Possess fluency in key ranking evaluation metrics such as NDCG@K, MAP, MRR, calibration, and coverage, along with a solid understanding of exposure and selection bias.
• **Experimentation and Metric Correlation:** Exhibit fluency in experimental design and a proven ability to connect offline metrics with online business outcomes.
• **Product Goal Translation:** Demonstrate the capacity to translate abstract product goals into concrete loss functions, features, and technical specifications that engineering teams can effectively build upon.
• **LLM/Embeddings Evaluation (Nice-to-have):** Familiarity with evaluating LLMs and embeddings in both offline and online settings, including assessing the lift versus latency/cost trade-offs of embedding-based vector search.
• **Competitive Equity Ownership:** Participate in the company's success through a generous equity package.
• **Comprehensive Health Coverage:** Benefit from 100% employer-paid Medical, Dental, and Vision insurance for employees.
• **Generous Paid Time Off:** Enjoy 12 weeks of paid parental leave, plus ample Vacation & Personal Days, Paid Holidays (including a winter break), and Flexible Sick Time.
• **Professional Development:** Access a Learning & Development allowance and programs, along with an onboarding stipend for home office peripherals and accessories to support your work environment.
• **Wellness and Support:** Receive quarterly stipends for Wellness and WiFi, access to Mental Health support and resources, and a dedicated Book Benefit to encourage continuous learning.
• **Additional Perks:** Enjoy Referral Bonuses, Sabbaticals for extended breaks, company-wide events, team engagement budgets, a Volunteer Day, and access to a suite of Scribd Inc. products.

Skills & Technologies

Python

Apache Spark

Data Science

Senior

Remote

About Scribd, Inc.

Scribd is a digital library and document-sharing platform founded in 2007. It offers a subscription service that provides access to a vast collection of e-books, audiobooks, magazines, sheet music, and documents. Users can read, listen, and download content from a wide range of genres and topics. The platform also allows users to upload and share their own documents, making it a valuable resource for students, researchers, and professionals. Scribd's mission is to make the world's knowledge accessible to everyone, fostering a culture of reading and learning through its extensive digital library and community features.