AI Benchmark Engineer - Native Language Specialist | Marathi

Job Overview

Location

India (Remote)

Job Type

Contract

Category

Data Scientist

Date Posted

February 25, 2026

Full Job Description

📋 Description

  • Lilt Production is at the forefront of revolutionizing how the world communicates by making information accessible to everyone, regardless of the language they speak. We are developing a verifiable evaluation suite of Terminal-Bench tasks designed to push the boundaries of large language models (LLMs) in complex multilingual software challenges. Our objective is to rigorously assess the multilingual robustness of these models, focusing on prompt language effects, non-English data processing, and locale and encoding edge cases within terminal workflows.
  • To achieve this, we are seeking experienced native-speaking software engineers who combine a deep understanding of their language with strong technical aptitude. As a key member of our team, you will design, build, and validate these benchmarks. Your primary focus will be on creating high-signal, high-quality tasks that genuinely test a model's ability to perform within multilingual environments and that do not rely on English translation crutches.
  • This role involves significant "Task Engineering," where you will be responsible for evaluating the capabilities of coding agents. This means conceptualizing and defining the specific challenges these AI agents will face and ensuring they are relevant to real-world multilingual software development scenarios.
  • "Asset Creation" is a core component of your work. You will build realistic and challenging task environments using datasets and files exclusively in your native language. These assets must remain untranslated so that they genuinely measure the AI's proficiency with native-language inputs and contexts without defaulting to English.
  • "Prompting & Translation" is another crucial area. You will identify failure points where AI models falter when operating in your native language. This involves creative prompting and a keen eye for linguistic nuances that might trip up an LLM.
  • "Implementation & Verification" covers supporting the development of robust solutions, including creating reference implementations for tasks. You will also write reliable, deterministic verifier scripts to objectively assess model performance. Rubric-based judging will be used only when strictly necessary; the emphasis is on objective, script-driven verification.
  • "Calibration & Execution" involves meticulous analysis of execution logs. You will calibrate task difficulty, ranging from Easy to Very Hard, using standard Terminal-Bench run configurations. Calibration is performed against several model tiers, including Haiku, Sonnet, and Opus, to understand performance across different LLM capabilities.
  • "Quality Assurance" is a non-negotiable aspect of this role. You will participate in a rigorous, multi-layered human quality control process that includes creation review, human review of generated outputs, calibration review, and final audit. This process, combined with automated LLM-based checks, ensures the fairness, grammatical accuracy, and overall integrity of the benchmarks.
  • This is a remote, freelance opportunity, offering flexibility and the chance to contribute to groundbreaking AI research from anywhere in India. You will work with a global community that thrives on innovation and excellence, contributing to LILT's mission to deliver multilingual AI and human-verified services to a diverse range of clients, including Enterprises, Governments, and AI Developers worldwide.
  • By joining LILT, you will have the opportunity to earn money, have fun, and advance human knowledge. You will work on diverse projects, build your professional network, and get paid quickly and fairly, all through a streamlined application process tailored to your unique expertise. We are committed to a fair, inclusive, and transparent hiring process, and while we may use AI tools to assist in evaluation, all final hiring decisions are made by people.

Skills & Technologies

Python
Remote

Lilt Production

About Lilt Production

Lilt Production is a full-service video production studio based in Paris, France, creating commercial, corporate, and branded content for agencies and global brands. Services span concept development, live-action filming, motion graphics, post-production, color grading, and localized adaptations. The company operates a bilingual French-English team and works across Europe, the Middle East, and Africa, emphasizing cinematic storytelling and contemporary visual aesthetics for broadcast, digital, and social media distribution.
