
Job Overview
Location: Remote
Job Type: Full-time
Category: Software Engineering
Date Posted: June 4, 2026
Full Job Description
📋 Description
- Design, build, and operate ingestion systems that process large volumes of multimodal data (imaging, audio, video, and text) into clean, structured, AI-ready datasets
- Own the end-to-end ingestion path, from data intake through validation, processing, tracking, and downstream delivery
- Develop modality-specific processing steps such as medical imaging processing, audio/video metadata extraction, quality validation, and notes processing
- Build parsers, validators, and normalization logic to handle messy, non-standard, and high-variance source formats across diverse industries
- Turn repetitive, one-off data handling tasks into reusable processing patterns, internal tooling, and scalable platform capabilities
- Build systems for high volume and high throughput, optimizing for reliability, cost, and speed using distributed and parallel compute architectures
- Select appropriate execution models, including batch processing, distributed execution, and inference-heavy compute patterns for unstructured data
- Diagnose and resolve performance bottlenecks across ingestion and processing pipelines as data volume and modality complexity increase
- Implement validation and quality checks to prevent bad, incomplete, or malformed data from propagating downstream
- Handle sensitive and regulated data, including PHI, with strict security protocols and de-identification where required
- Track data provenance, metadata, and usage constraints throughout the ingestion pipeline to ensure compliance and auditability
- Enhance observability, debuggability, and operational reliability across the ingestion layer with robust monitoring and logging
- Partner with product, Data Lab, and partner engineering teams to support new modalities, evolving partner requirements, and non-standard data sources
- Work directly with external partner engineering teams to translate source-system realities into robust ingestion and processing designs
- Surface recurring patterns in data handling and drive standardization into reusable transforms, validators, and internal tools
- Shape how Protege handles new data types as the platform expands into more complex data environments
- Ramp up quickly in the codebase and ship first improvements to existing pipelines within 30 days
- Own a processing pipeline or modality end to end by 60 days, developing depth in handling one or two data types at scale
- Operate independently by 90 days, leading design on new modalities or scaling challenges with minimal hand-holding
- Identify and drive at least one leverage opportunity (a reusable transform, tool, or architectural improvement) worth significant investment
🎯 Requirements
- 5+ years building and operating production backend or data systems with real experience in data processing at scale
- Hands-on experience designing and running large-scale data pipelines
- Strong programming skills in Python
- Experience with distributed data processing
- Strong proficiency with AWS
- Comfort with messy, varied, high-volume data and high ambiguity, with a knack for finding patterns in complex environments
🏖️ Benefits
- Work on a product built around moving and processing large volumes of data at the forefront of AI
- Join a lean, fast-moving, high-trust team obsessed with velocity and impact
- Shape the future of data and AI in a company backed by world-class investors
- Engage with ambitious AI teams and partners across healthcare, media, and other high-value industries
- Opportunity to own critical infrastructure with end-to-end responsibility and autonomy
- Work remotely with a globally distributed team focused on high-impact outcomes
About Protege Inc.
Protege is a career development platform that helps early-career talent connect directly with industry mentors and secure paid apprenticeships. The company partners with employers to create short-term, project-based experiences that give participants real work opportunities while companies evaluate candidates for full-time roles. Its marketplace offers mentorship, skill-building projects, and application tools designed to reduce hiring bias and widen access to competitive industries such as tech, finance, and media. Founded in 2020 and headquartered in New York City, Protege has facilitated thousands of placements and aims to replace traditional campus recruiting with scalable experiential hiring programs.