
Job Overview
Location
San Francisco
Job Type
Full-time
Category
Software Engineering
Date Posted
June 4, 2026
Full Job Description
📋 Description
- • Own the end-to-end product strategy and roadmap for Marengo, Twelve Labs’ multimodal video embedding model, and its associated Search product, determining what features to build, defer, or deprecate.
- • Partner directly with the Marengo research team to define evaluation rubrics, guide training data investments, and determine model release readiness based on empirical quality metrics.
- • Collaborate with GTM teams to plan, execute, and monitor product launches, ensuring alignment between technical capabilities and market demand.
- • Engage directly with customers and field engineers to identify real-world failures in video retrieval, document production edge cases, and anticipate future needs six months ahead.
- • Define and enforce the quality bar for retrieval performance across all deployment environments, including managed SaaS, customer-hosted instances, and AWS Bedrock.
- • Own the architecture and deployment strategy for embeddings and search APIs across multiple infrastructure configurations, ensuring consistency, scalability, and reliability.
- • Analyze competitive landscape developments in multimodal video retrieval and adjust product direction to maintain Twelve Labs’ technological leadership.
- • Balance deep technical collaboration with researchers on retrieval architecture tradeoffs and strategic communication with business teams on product prioritization and go-to-market execution.
- • Translate customer feedback from enterprise and PLG (Product-Led Growth) segments into actionable product requirements, distinguishing between human and agent use cases.
- • Use current production pain points and usage patterns as the foundation for roadmap decisions, not merely as a backlog of feature requests.
- • Ensure product releases are evaluated against rigorous, data-driven criteria for ranking quality, latency, and relevance across diverse video content types.
- • Maintain awareness of operational costs and infrastructure demands of running multimodal models at scale, influencing product decisions around efficiency and deployment.
- • Work in a hybrid model requiring two days per week onsite in San Francisco, with daily collaboration expected with the Seoul research team, necessitating availability until approximately 8pm PT on weekdays (Fridays excluded).
🎯 Requirements
- • Background in research, ML, or engineering with hands-on experience in retrieval, embeddings, vector search, or multimodal models, having moved into product out of a focus on what gets built and why
- • Proven experience as a senior solutions engineer or forward-deployed engineer with deep ML understanding, having acted as de facto product owner on complex customer problems
- • Ability to conduct substantive technical discussions on retrieval architecture with researchers and translate those insights into product decisions for GTM teams
- • Demonstrated track record of shipping products with strong enterprise and PLG (Product-Led Growth) motions
- • Strong, evidence-based opinions on what makes search work in production, grounded in real-world data rather than intuition
- • Experience extrapolating customer needs beyond immediate requests to shape future roadmap priorities
🏖️ Benefits
- • Full health, dental, and vision benefits
- • Extremely flexible PTO and parental leave policy, with the office closed the week of Christmas and New Year's
- • Open and inclusive culture and work environment
- • Opportunity to work closely with a collaborative, mission-driven team on cutting-edge AI technology
About Twelve Labs Inc.
Twelve Labs builds multimodal video understanding AI. Its cloud platform transforms long-form video into vector embeddings that capture visual, audio, speech, and contextual information, enabling semantic search, summarization, chaptering, moderation, and analytics through a single API. Developers upload video, index it, then query with natural language or an image to retrieve exact moments, generate highlights, or detect unwanted content. Models are pretrained on large-scale web video, continually fine-tuned for accuracy and latency, and deployable on dedicated GPU clusters for enterprise security. Founded in 2021, the San Francisco company serves media, ed-tech, safety, and e-commerce customers worldwide.



