This job has expired

This position was posted on December 11, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Synthetic Population Engineer

Epistemix, Inc.

Job Overview

Location: Pittsburgh

Job Type: Full-time

Full Job Description

📋 Description

• Own the continuous evolution of the Epistemix synthetic population—a living, statistically faithful mirror of every person, household, workplace, school, and social network in the United States and, soon, the world. You will ingest terabytes of public and proprietary data, fuse them with privacy-preserving techniques, and release datasets that let hospitals, insurers, and governments answer questions they could never ask with raw data.
• Hunt for new empirical datasets (census microdata, GPS traces, credit-card spend, satellite imagery, mobility pings, building footprints, school district shapefiles, etc.) and design ingestion pipelines that enrich the synthetic population with new individual attributes (health status, income volatility, commute mode, vaccine hesitancy) and high-resolution social networks (family ties, co-worker graphs, after-school clubs). Every new column you add unlocks a fresh customer use case and revenue stream.
• Leverage agent-based models (ABMs) and microsimulation to project demographic futures—aging, migration, fertility, mortality, and labor-force shifts—five, ten, and twenty years ahead. Your forecasts become the scaffolding for customer scenario planning in insurance pricing, Medicaid expansion, and pandemic response.
• Elevate geographical realism: ensure that every synthetic home sits on a plausible parcel, every workplace clusters near transit corridors, and every school draws students from realistic catchments. You will conflate road networks, zoning layers, POI databases, and parcel geometries so that downstream models inherit spatial credibility.
• Scale the geographic footprint from the current 50-state U.S. model to a seamless global synthetic population, harmonizing country-specific census schemas, administrative boundaries, and cultural definitions of households. You decide how to stitch France’s communes to Germany’s Gemeinden while preserving statistical coherence.
• Craft beautiful, intuitive visualizations—dot-density maps, time-series dashboards, interactive network graphs—that turn abstract statistical populations into persuasive sales collateral and self-service exploration tools for non-technical users.
• Build open-source-style extension frameworks that let external data owners append proprietary customer lists, loyalty-card segments, or disease registries to the synthetic population without ever exposing PII. Your SDKs and APIs become the connective tissue of an expanding data ecosystem.
• Negotiate and integrate third-party data marketplaces (SafeGraph, Veraset, Carto, Experian, etc.), evaluating coverage, bias, and cost so that Epistemix always offers the richest, most defensible synthetic data on the market.
• Collaborate daily with Customer Success, Professional Services, and Engineering to translate project-specific requirements (for example, "we need 10,000 type-2 diabetics in Cook County with household income under 400% FPL") into parameterized population slices delivered in hours, not weeks.
• Communicate the provenance story behind every synthetic attribute: which raw sources, which imputation models, which validation checks. Your lucid explanations build trust with regulators, academics, and Fortune 500 executives alike.
• Champion a culture of reproducible science: containerized workflows, versioned datasets, automated validation suites, and peer-reviewed white papers that establish Epistemix as the global standard for synthetic populations.

🎯 Requirements

• 3+ years using Python for large-scale data science (pandas, NumPy, scikit-learn, PySpark, or Dask) and shipping production-grade code
• Demonstrated experience with geospatial data (PostGIS, GeoPandas, rasterio, QGIS) and spatial indexing at the 100M+ record scale
• Advanced SQL (PostgreSQL preferred) including query optimization, indexing strategies, and ETL pipeline design
• PhD or Master’s in Data Science, Computer Science, Statistics, Epidemiology, Public Health, or a related quantitative discipline
• Nice-to-have: published research or commercial experience with agent-based modeling, microsimulation, or population projection methods

🏖️ Benefits

• Equity & Incentives – meaningful stock-option participation in a Series-A company poised for global scale
• Flexible Time Off – fully remote, asynchronous culture with autonomy to balance life and work across UTC-8 to UTC+1
• Health & Welfare Program – comprehensive medical, dental, and vision coverage for U.S. employees (comparable stipend for international hires)
• Meaningful Impact – apply your talents to real-world problems in public health, climate resilience, and social equity

Skills & Technologies

Python

PostgreSQL

Remote

Degree Required

About Epistemix, Inc.

Epistemix builds computational models that simulate the spread of disease, behavior, and policy impacts across populations. Its platform combines demographic, health, and mobility data to forecast outcomes for governments, businesses, and health organizations, enabling scenario testing for pandemics, chronic conditions, and social interventions. The company was spun out of the University of Pittsburgh and is headquartered in Pittsburgh, Pennsylvania.
