
Job Overview
Location
India (Remote)
Job Type
Contract
Category
Data Scientist
Date Posted
February 25, 2026
Full Job Description
Description
- Lilt Production is at the forefront of revolutionizing how the world communicates by making information accessible to everyone, regardless of the language they speak. We are actively developing a sophisticated and verifiable evaluation suite of Terminal-Bench tasks specifically engineered to push the boundaries of large language models (LLMs) in complex multilingual software challenges. Our overarching objective is to rigorously assess the multilingual robustness of these models, focusing on critical areas such as prompt language effects, the efficacy of non-English data processing, and the handling of intricate locale and encoding edge cases within terminal workflows.
- To achieve this, we are seeking highly skilled and experienced native-speaking software engineers who possess a deep understanding of their respective languages and a strong technical aptitude. As a key member of our team, you will be instrumental in designing, building, and validating these advanced benchmarks. Your primary focus will be on creating high-signal, high-quality tasks that serve as genuine tests of a model's ability to navigate and perform within multilingual environments, critically ensuring these tasks do not rely on English translation crutches for their effectiveness.
- This role involves significant "Task Engineering," where you will be responsible for evaluating the capabilities of coding agents. This means conceptualizing and defining the specific challenges these AI agents will face, ensuring they are relevant to real-world multilingual software development scenarios.
- A core component of your work will be "Asset Creation." You will be tasked with building realistic and challenging task environments. This involves utilizing datasets and files exclusively in your native language. It is paramount that these assets remain untranslated to genuinely measure the AI's proficiency in handling native language inputs and contexts without defaulting to English.
- "Prompting & Translation" will be another crucial area. You will actively seek out and identify failure points where AI models falter when operating in your native language. This involves creative prompting and a keen eye for linguistic nuances that might trip up an LLM.
- Your role will extend to "Implementation & Verification." You will support the development of robust solutions, including creating reference implementations for tasks. Furthermore, you will write highly reliable and deterministic verifier scripts to objectively assess the performance of the AI models. The use of rubric-based judging will be employed only when strictly necessary, emphasizing objective, script-driven verification.
- "Calibration & Execution" involves meticulous analysis of execution logs. You will be responsible for calibrating the difficulty of tasks, ranging from Easy to Very Hard, using standard Terminal-Bench run configurations. This calibration will be performed against various model tiers, including Haiku, Sonnet, and Opus, to understand performance across different LLM capabilities.
- "Quality Assurance" is a non-negotiable aspect of this role. You will participate in a rigorous, multi-layered human quality control process. This includes creation review, human review of generated outputs, calibration review, and final audit. This process, combined with automated LLM-based checks, ensures the fairness, grammatical accuracy, and overall integrity of the benchmarks.
- This is a remote, freelance opportunity, offering flexibility and the chance to contribute to groundbreaking AI research from anywhere in India. You will be working with a global community that thrives on innovation and excellence, contributing to LILT's mission to deliver multilingual AI and human-verified services to a diverse range of clients, including Enterprises, Governments, and AI Developers worldwide.
- By joining LILT, you will have the opportunity to earn money, have fun, and advance human knowledge. You will work on diverse projects, build your professional network, and get paid quickly and fairly, all through a streamlined application process tailored to your unique expertise. We are committed to a fair, inclusive, and transparent hiring process, and while we may use AI tools to assist in evaluation, all final hiring decisions are made by people.
Skills & Technologies
Python
Remote
About Lilt Production
Lilt Production is a full-service video production studio based in Paris, France, creating commercial, corporate, and branded content for agencies and global brands. Services span concept development, live-action filming, motion graphics, post-production, color grading, and localized adaptations. The company operates a bilingual French-English team and works across Europe, the Middle East, and Africa, emphasizing cinematic storytelling and contemporary visual aesthetics for broadcast, digital, and social media distribution.
Similar Opportunities

Shift Technology SAS
Brazil - São Paulo
Full-time
Expires Apr 25, 2026
Data Science
Junior
Remote
11 days ago

Feedzai, Inc.
São Paulo, Brazil
Full-time
Expires Apr 25, 2026
Python
Apache Spark
Onsite
11 days ago

Atlas Computing Inc.
Canada
Full-time
Expires Apr 23, 2026
Python
GitHub
TensorFlow
13 days ago
