
Job Overview
Location
Remote US
Job Type
Full-time
Category
Software Engineer
Date Posted
March 28, 2026
Full Job Description
📋 Description
- As Principal Software Engineer, Video Engineering at Twelve Labs Inc., you will own the architecture and implementation of video processing pipelines designed specifically for AI-native video intelligence, not human playback, enabling machines to understand video like humans do through multimodal foundation models.
- You will architect and implement end-to-end video pipelines from byte ingestion through decode, chunking, storage, retrieval, and playback, optimizing for AI model performance across batch and streaming modes while ensuring cost-efficiency and scalability at petabyte scale.
- You will serve as the internal subject matter expert on all video engineering matters, driving technical decisions on codec selection, decode strategies, container formats, and streaming protocols while collaborating with ML, platform, and product teams to align video preprocessing with model consumption needs.
- You will design and implement semantic and heuristic chunking strategies in partnership with ML Research Scientists, moving beyond fixed-interval splitting to scene boundary detection, shot change analysis, and content-aware segmentation that enhances downstream AI model performance.
- You will architect low-latency streaming ingestion pipelines using HLS, DASH, LL-HLS, and WebRTC, enabling near-real-time video processing with incremental chunking and streaming decode for live and real-time ingest workflows.
- You will design video storage tiers and retrieval patterns optimized for AI workloads, balancing hot/warm/cold access, enabling frame-level random access, and implementing cost-effective strategies for petabyte-scale video data.
- You will ensure accurate temporal navigation and playback capabilities to support time-coded references from AI analysis results, allowing users to precisely navigate to moments identified by video understanding models.
- You will be the internal authority on FFmpeg, libav, and related tooling, building and maintaining custom processing pipelines, filters, and integrations while driving decode efficiency through hardware acceleration (NVDEC, VA-API), pipeline parallelism, and intelligent resource allocation.
- You will quantify and optimize cost-per-hour-of-video-processed, establishing video engineering standards, authoring reference implementations, and mentoring engineers across teams on media fundamentals, codec internals, and production-grade media pipeline practices.
- You will work within a mission-driven, globally distributed team backed by $107M in funding from top-tier investors including NVIDIA's NVentures, NEA, Radical Ventures, and Index Ventures, contributing to the advancement of multimodal AI that transforms how humans interact with and analyze media.
🎯 Requirements
- 12+ years of software engineering experience with 7+ years focused on video/media engineering in production systems processing video at scale.
- Deep FFmpeg expertise including libavcodec, libavformat, filter graphs, custom demuxers/decoders, and performance tuning beyond basic CLI usage.
- Strong knowledge of codec internals (H.264/H.265 bitstream structure, AV1 adoption tradeoffs, hardware decode paths) and quality metrics (VMAF, SSIM, PSNR).
- Fluency in streaming protocols (HLS, DASH, LL-HLS, WebRTC) with experience in live/real-time ingest pipelines.
- Systems engineering depth in C/C++, Rust, or Go for performance-critical media code, with ability to reason about memory layout, SIMD, and GPU pipelines; Python for pipeline orchestration.
- Experience designing video storage systems at scale, including object stores, frame-indexed access patterns, and tiered storage strategies.
- Background in content-aware processing such as scene detection, shot boundary analysis, temporal segmentation, or perceptual quality optimization.
- Production instincts including incident response, observability for media pipelines, debugging decode failures at scale, and handling format edge cases gracefully.
- Strongly preferred: AI/ML integration experience working with teams consuming video frames for model training/inference, understanding how preprocessing decisions impact model quality.
🏖️ Benefits
- Open and inclusive culture and work environment that values diverse backgrounds and experiences.
- Opportunity to work closely with a collaborative, mission-driven team on cutting-edge AI technology at the forefront of multimodal foundation models.
- Comprehensive health, dental, and vision benefits.
- Extremely flexible PTO and parental leave policy, with the office closed the week of Christmas and New Year's.
- Visa support provided where applicable for international candidates.
- Opportunity to travel up to 10% of the time annually for conferences, off-site meetings, and business-related events.
About Twelve Labs Inc.
Twelve Labs builds multimodal video understanding AI. Its cloud platform transforms long-form video into vector embeddings that capture visual, audio, speech, and contextual information, enabling semantic search, summarization, chaptering, moderation, and analytics through a single API. Developers upload video, index it, then query with natural language or an image to retrieve exact moments, generate highlights, or detect unwanted content. Models are pretrained on large-scale web video, continually fine-tuned for accuracy and latency, and deployable on dedicated GPU clusters for enterprise security. Founded in 2021, the San Francisco company serves media, ed-tech, safety, and e-commerce customers worldwide.