This job has expired
This position was posted on October 3, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Job Overview
Location: Paris, Oregon, USA
Job Type: Full-time
Category: Software Engineering
Date Posted: October 3, 2025
Full Job Description
📋 Description
- Architect and own the end-to-end lifecycle of petabyte-scale data pipelines that ingest, transform, and load multi-modal datasets (images, text, video) into our cloud data warehouse and ML training clusters. You will design fault-tolerant, idempotent workflows that run on Kubernetes and leverage distributed frameworks such as Ray, Spark, or Dask to process millions of assets per day with sub-second latency guarantees.
- Partner daily with research scientists to translate experimental requirements into production-grade data contracts. You will profile raw corpora, identify coverage gaps, and implement automated quality gates that detect label noise, duplication, and demographic bias before data ever reaches a GPU. Your work directly determines the fidelity and fairness of the next generation of Jasper’s generative models.
- Build reusable, versioned datasets optimized for vision-language pre-training. This includes writing deterministic extract-transform-load (ETL) jobs that apply classical computer-vision filters (edge detection, color-space normalization, object detection) and modern foundation-model-based captioning and tagging. You will maintain a feature store that enables researchers to slice data by domain, resolution, or metadata in seconds rather than hours.
- Continuously optimize I/O throughput and memory footprint for distributed training. You will benchmark serialization formats (Parquet, WebDataset, MDS), tune prefetching and caching layers, and implement dynamic batching strategies that keep A100 clusters at 95%+ utilization. Your profiling dashboards will surface pipeline bottlenecks and guide investment in faster storage tiers or smarter sharding schemes.
- Establish rigorous data governance and reproducibility standards. Every transformation will be codified in declarative DAGs (Airflow, Prefect, or Dagster), tracked with Git-based version control, and documented in an internal data catalog. You will champion unit tests for data schemas, enforce SLAs for freshness, and publish lineage graphs so any experiment can be rerun months later with identical inputs.
- Proactively source and license new multi-modal corpora from public repositories, academic datasets, and strategic partners. You will negotiate data-sharing agreements, ensure GDPR/CCPA compliance, and build ingestion connectors that normalize metadata, de-duplicate near-identical assets, and flag restricted content. Your pipeline will automatically tag assets with provenance and usage rights to keep legal risk near zero.
- Foster a culture of observability and continuous improvement. You will set up real-time alerts on data drift, schema evolution, and pipeline failures; run weekly blameless post-mortems; and iterate on SLIs that balance cost, latency, and accuracy. By instrumenting everything from GPU wait-times to token-level label entropy, you will give stakeholders transparent insight into the health of our data platform.
- Mentor junior engineers and data scientists on best practices for scalable data engineering. You will lead internal workshops on PyTorch DataLoader internals, Delta Lake optimization, and cost-aware cloud resource scheduling. Your code reviews will raise the bar for readability, test coverage, and performance, ensuring that every PR moves the platform closer to exabyte readiness.
Skills & Technologies
About Jasper AI, Inc.
Jasper AI, Inc. provides a generative artificial intelligence platform that helps marketing and content teams create, edit, and optimize written and visual assets at scale. Founded in 2021, the company offers browser extensions, API integrations, and team collaboration tools that use large language models to generate blog posts, emails, ad copy, and social media content while maintaining brand voice consistency. Customers include Fortune 500 enterprises, agencies, and freelance creators seeking to accelerate production workflows and improve conversion performance across channels.