This job has expired

This position was posted on March 25, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Machine Learning Engineer, Embedding

Twelve Labs Inc.

Job Overview

Location

Seoul, South Korea

Job Type

Full-time

Full Job Description

📋 Description

• Join Twelve Labs Inc., a pioneering Deep Tech startup at the forefront of video understanding AI, and help us set the global standard for how the world interacts with and extracts value from vast amounts of video data. We are building world-class, video-specific AI models that enable powerful search, analysis, summarization, and insight generation capabilities. Our innovative solutions are already empowering major global clients, from the world's largest sports leagues to integrated control centers and leading broadcasters and studios, to deliver hyper-personalized viewing experiences, respond rapidly to critical situations, and enhance content creation for billions of viewers. As a company recognized by CB Insights for four consecutive years as a top 100 AI startup globally, and having secured over $110 million in funding from prestigious investors like NVIDIA, NEA, and Index Ventures, we offer a unique opportunity to work alongside exceptional talent on groundbreaking technology that is uniquely available through Amazon Bedrock.
• As a Machine Learning Engineer specializing in Embeddings, you will be an integral part of our Multimodal Representation Learning and Production Serving team. Your primary focus will be on training models that unify diverse modalities such as video, audio, and text into a single, cohesive embedding space. You will be responsible for the end-to-end process of transforming research breakthroughs into robust production systems that serve thousands of customers worldwide. This involves conducting experiments within large-scale distributed training environments and ensuring the seamless transition of research outcomes into real-time inference systems. Leveraging access to cutting-edge GPU resources, including NVIDIA B300, you will play a crucial role in minimizing the cycle time from research to production, driving significant technical impact within a fast-paced development environment where research findings are deployed to global customers within months. You will collaborate closely with our Research, Product, and Infrastructure teams to bring our advanced AI models to life and ensure their successful delivery to clients.
• Design and optimize large-scale distributed training pipelines for multimodal embedding models, ensuring scalability, efficiency, and robustness. This involves architecting the infrastructure, selecting appropriate training frameworks, and implementing advanced techniques to handle massive datasets and complex model architectures. You will be responsible for the entire lifecycle of model training, from data preprocessing and augmentation to hyperparameter tuning and performance monitoring, ensuring that our models achieve state-of-the-art accuracy and generalization capabilities.
• Optimize the inference performance of embedding models in production environments, focusing on maximizing throughput, minimizing latency, and ensuring cost-effectiveness. This includes exploring and implementing techniques such as model quantization, batch processing, and GPU memory optimization to achieve maximum efficiency. You will work closely with the infrastructure team to deploy and manage these optimized models, ensuring they can handle the demanding requirements of real-time applications and serve a global user base reliably.
• Design and build vector search systems and embedding serving infrastructure. This involves selecting and integrating appropriate vector databases, developing efficient indexing strategies, and creating scalable serving layers that can handle high query volumes with low latency. You will be responsible for the end-to-end architecture of these systems, ensuring they are performant, reliable, and meet the specific needs of our video understanding applications.
• Improve and automate the end-to-end ML pipeline, covering model development, training, and serving, to accelerate the transition of research results into production. This includes developing robust CI/CD practices for ML, implementing effective experiment tracking and version control, and building automated deployment and monitoring systems. Your goal will be to minimize the time from research ideation to production deployment, enabling rapid iteration and continuous improvement of our AI models.
• Address applied research problems, including data filtering and the design of evaluation metrics, to enhance model quality and user experience. You will collaborate with the research team to define key performance indicators, develop novel evaluation methodologies, and implement data strategies that improve the accuracy and relevance of our embedding models. This role requires a deep understanding of the trade-offs between model performance, data quality, and user satisfaction.
• Explore and experiment with methods to enhance development productivity by actively utilizing AI-powered development tools such as Claude and Gemini. You will investigate how these tools can be integrated into our workflows to streamline coding, debugging, and documentation processes, ultimately leading to faster development cycles and higher quality code.
• Collaborate closely with the Research, Product, and Infrastructure teams, taking ownership of the end-to-end process from model development to actual customer delivery. This cross-functional collaboration is essential for ensuring that our technical solutions align with business objectives and deliver maximum value to our clients. You will be a key liaison between different teams, facilitating communication and ensuring that all stakeholders are aligned on project goals and timelines.
• Learn and grow within a dynamic and challenging environment, contributing to the development of cutting-edge AI technology that is shaping the future of video understanding. You will have the opportunity to work with state-of-the-art hardware and software, tackle complex problems, and make a tangible impact on a global scale. The fast-paced nature of our startup environment encourages continuous learning and professional development, providing ample opportunities to expand your skillset and advance your career.

🎯 Requirements

• Experience in research or development within computer vision, natural language processing, or multimodal learning.
• Proficiency in Python and PyTorch, with experience training models in large-scale distributed environments.
• Understanding of embedding models, contrastive learning, or representation learning.
• Experience deploying and serving ML models reliably in production environments.
• Enjoyment of experimenting, learning, and problem-solving in a rapidly changing environment.

🏖️ Benefits

• Grow with a global team serving global B2B clients.
• Hybrid work model offering both autonomy and collaboration.
• MacBook and up to ₩700,000 in home office equipment support, with equipment upgrades every 3 years.
• Corporate card with a monthly limit of ₩600,000 for flexible use on meals, transportation, etc.
• On-site snack bar offering snacks, coffee, and fresh food.
• Two-week winter break at the end of the year.
• Annual health check-up support.
• English education program support.

Skills & Technologies

Python

Kubernetes

Onsite

About Twelve Labs Inc.

Twelve Labs builds multimodal video understanding AI. Its cloud platform transforms long-form video into vector embeddings that capture visual, audio, speech, and contextual information, enabling semantic search, summarization, chaptering, moderation, and analytics through a single API. Developers upload video, index it, then query it with natural language or an image to retrieve exact moments, generate highlights, or detect unwanted content. Models are pretrained on large-scale web video, continually fine-tuned for accuracy and latency, and deployable on dedicated GPU clusters for enterprise security. Founded in 2021, the San Francisco company serves media, ed-tech, safety, and e-commerce customers worldwide.
