
Job Overview
Location
Mumbai Metropolitan Region
Job Type
Full-time
Category
Data Engineer
Date Posted
December 5, 2025
Full Job Description
📋 Description
- Own the entire data lifecycle at PeopleGrove, from raw ingestion to analytics-ready models, so every team can make confident, data-driven decisions that improve student and mentor outcomes.
- Architect and maintain scalable ingestion pipelines using Fivetran, pulling from SaaS tools, LMS APIs, event streams, and internal databases into Snowflake; ensure every record is traceable, versioned, and recoverable.
- Design dimensional and star-schema data models that balance query speed with storage cost, powering executive KPIs, product dashboards, and predictive features used by thousands of universities.
- Build curated datasets and feature stores that ML engineers can consume via Snowpark AI/Cortex, cutting model-training time from days to hours while guaranteeing reproducibility.
- Implement automated data-quality tests (null checks, referential integrity, distribution drift) and alerting so anomalies are caught before they reach downstream users.
- Optimize SQL and Python ELT jobs for cost and performance, leveraging Snowflake clustering, materialized views, and incremental loads, to keep warehouse spend predictable as data volume grows 10×.
- Expose clean, well-documented GraphQL endpoints that allow frontend teams to self-serve the exact slices of data they need without writing a single line of SQL.
- Partner with Product, Customer Success, and Marketing to translate ambiguous business questions into concrete data requirements, then deliver the datasets that answer them.
- Champion data governance: define naming conventions, catalog schemas in a living data dictionary, and enforce role-based access controls that satisfy both SOC 2 and university privacy standards.
- Introduce CI/CD for dbt or similar transformation frameworks, enabling peer review, automated testing, and zero-downtime deployments of new models.
- Continuously evaluate new tools (e.g., Snowflake Native Apps, Connector SDK, or emerging AI data platforms) and run proofs of concept that could shave hours off daily workflows.
- Mentor junior engineers and analysts through pair programming, design reviews, and lunch-and-learns, raising the overall data literacy of the company.
- Contribute to disaster-recovery planning: design cross-region replication, document runbooks, and lead quarterly failover drills so we can restore critical pipelines in under 30 minutes.
- Track pipeline SLAs and publish weekly reliability scorecards; when incidents occur, lead blameless post-mortems that turn outages into durable improvements.
- Stay close to our mission: every optimization you ship helps first-generation college students find mentors, land internships, and launch careers.
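The data-quality responsibilities above (null checks, referential integrity, distribution drift) can be sketched in plain Python. This is an illustrative sketch only: in a real pipeline these would typically run as dbt tests or SQL against the warehouse, and all table, column, and threshold names here are hypothetical.

```python
# Illustrative data-quality checks over rows represented as dicts.
# Real pipelines would express these as dbt tests or warehouse SQL;
# all names and thresholds here are made up for the example.

def null_check(rows, column):
    """Return indices of rows where `column` is missing or None."""
    return [i for i, r in enumerate(rows) if r.get(column) is None]

def referential_integrity(child_rows, fk, parent_keys):
    """Return foreign-key values in `child_rows` with no matching parent."""
    parents = set(parent_keys)
    return sorted({r[fk] for r in child_rows if r[fk] not in parents})

def mean_drift(baseline, current, threshold=0.25):
    """Flag drift when the mean shifts by more than `threshold` (relative)."""
    base = sum(baseline) / len(baseline)
    cur = sum(current) / len(current)
    return abs(cur - base) / abs(base) > threshold

# Hypothetical sample data: a student table referencing a mentor table.
students = [
    {"id": 1, "mentor_id": 10, "gpa": 3.2},
    {"id": 2, "mentor_id": 99, "gpa": None},  # dangling FK, null gpa
]
mentor_ids = [10, 11]

print(null_check(students, "gpa"))                           # [1]
print(referential_integrity(students, "mentor_id", mentor_ids))  # [99]
print(mean_drift([3.0, 3.2, 3.4], [2.0, 2.1, 2.2]))          # True
```

In production, each failing check would feed the alerting layer the bullet describes, so anomalies are surfaced before downstream dashboards consume the data.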
🎯 Requirements
- 4+ years of hands-on data-engineering experience building ingestion, transformation, and modeling pipelines in production.
- Expert-level SQL and Python; proven ability to tune complex queries and orchestrate DAGs at scale.
- Deep, practical knowledge of Snowflake, including clustering, resource monitors, and secure data sharing.
- Production experience with Fivetran (or a similar ELT tool) and GraphQL API design/consumption.
- Familiarity with AWS or Google Cloud IAM, storage, and compute services; comfort with Infrastructure-as-Code is a plus.
- Nice-to-have: Snowpark AI/Cortex, Connector SDK development, prior work with EdTech or LMS data, and a strong grasp of data-security best practices.
🏖️ Benefits
- Fully remote-first culture: work from anywhere in the Mumbai Metropolitan Region or beyond, with flexible hours and async collaboration.
- Competitive compensation plus equity, ensuring you share directly in the growth you help create.
- Annual learning stipend (courses, conferences, certifications) and dedicated “Innovation Fridays” to explore new tools.
- Comprehensive health coverage for you and your dependents, generous PTO, and company-wide recharge weeks to protect your well-being.
Skills & Technologies
Python
AWS
GCP
GraphQL
Remote
About PeopleGrove, Inc.
PeopleGrove provides a software platform that connects students and professionals with mentors, alumni, and industry experts to support career development, networking, and skill-building. The platform integrates with existing university and corporate systems, offering tools for one-on-one mentoring, group discussions, events, and analytics to measure engagement and outcomes. Founded in 2015 and headquartered in San Francisco, California, the company serves higher education institutions, workforce development organizations, and employers seeking to scale mentorship and community-driven learning.
