This job has expired

This position was posted on October 3, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Data Engineer

Job Overview

Location

Paris, Oregon, USA

Job Type

Full-time

Category

Software Engineering

Date Posted

October 3, 2025

Full Job Description

📋 Description

  • Architect and own the end-to-end lifecycle of petabyte-scale data pipelines that ingest, transform, and load multi-modal datasets (images, text, video) into our cloud data warehouse and ML training clusters. You will design fault-tolerant, idempotent workflows that run on Kubernetes and leverage distributed frameworks such as Ray, Spark, or Dask to process millions of assets per day with sub-second latency guarantees.
  • Partner daily with research scientists to translate experimental requirements into production-grade data contracts. You will profile raw corpora, identify coverage gaps, and implement automated quality gates that detect label noise, duplication, and demographic bias before data ever reaches a GPU. Your work directly determines the fidelity and fairness of the next generation of Jasper’s generative models.
  • Build reusable, versioned datasets optimized for vision-language pre-training. This includes writing deterministic extract-transform-load (ETL) jobs that apply classical computer-vision filters (edge detection, color-space normalization, object detection) and modern foundation-model–based captioning and tagging. You will maintain a feature store that enables researchers to slice data by domain, resolution, or metadata in seconds rather than hours.
  • Continuously optimize I/O throughput and memory footprint for distributed training. You will benchmark serialization formats (Parquet, WebDataset, MDS), tune prefetching and caching layers, and implement dynamic batching strategies that keep A100 clusters at 95%+ utilization. Your profiling dashboards will surface pipeline bottlenecks and guide investment in faster storage tiers or smarter sharding schemes.
  • Establish rigorous data governance and reproducibility standards. Every transformation will be codified in declarative DAGs (Airflow, Prefect, or Dagster), tracked with Git-based version control, and documented in an internal data catalog. You will champion unit tests for data schemas, enforce SLAs for freshness, and publish lineage graphs so any experiment can be rerun months later with identical inputs.
  • Proactively source and license new multi-modal corpora from public repositories, academic datasets, and strategic partners. You will negotiate data-sharing agreements, ensure GDPR/CCPA compliance, and build ingestion connectors that normalize metadata, de-duplicate near-identical assets, and flag restricted content. Your pipeline will automatically tag assets with provenance and usage rights to keep legal risk near zero.
  • Foster a culture of observability and continuous improvement. You will set up real-time alerts on data drift, schema evolution, and pipeline failures; run weekly blameless post-mortems; and iterate on SLIs that balance cost, latency, and accuracy. By instrumenting everything from GPU wait-times to token-level label entropy, you will give stakeholders transparent insight into the health of our data platform.
  • Mentor junior engineers and data scientists on best practices for scalable data engineering. You will lead internal workshops on PyTorch DataLoader internals, Delta Lake optimization, and cost-aware cloud resource scheduling. Your code reviews will raise the bar for readability, test coverage, and performance, ensuring that every PR moves the platform closer to exabyte readiness.

Skills & Technologies

Remote
Degree Required

About Jasper AI, Inc.

Jasper AI, Inc. provides a generative artificial intelligence platform that helps marketing and content teams create, edit, and optimize written and visual assets at scale. Founded in 2021, the company offers browser extensions, API integrations, and team collaboration tools that use large language models to generate blog posts, emails, ad copy, and social media content while maintaining brand voice consistency. Customers include Fortune 500 enterprises, agencies, and freelance creators seeking to accelerate production workflows and improve conversion performance across channels.
