
Job Overview
Location
Nigeria (Remote)
Job Type
Contract
Category
Data Scientist
Date Posted
February 25, 2026
Full Job Description
📋 Description
- Lilt is at the forefront of transforming global communication through AI, and we are seeking a highly skilled and experienced AI Benchmark Engineer with native proficiency in Hausa to join our innovative team. This is a unique, remote, freelance opportunity to contribute to the development of a cutting-edge evaluation suite for large language models (LLMs). You will play a pivotal role in designing, building, and validating rigorous, verifiable Terminal-Bench tasks that push the boundaries of LLM capabilities in multilingual software challenges. Our mission is to accurately measure the robustness of LLMs across various linguistic and technical complexities, including prompt language effects, non-English data processing, and intricate locale/encoding edge cases within terminal workflows.
- As a Native Language Specialist for Hausa, your primary responsibility will be to engineer high-signal, high-quality tasks that genuinely assess an LLM's ability to handle multilingual environments without relying on English as a crutch. This involves a deep dive into the nuances of the Hausa language and its application in software and technical contexts. You will be instrumental in identifying failure points where AI models falter when interacting with your native language, ensuring our benchmarks are comprehensive and accurate.
- Your day-to-day activities will encompass a range of critical engineering tasks. You will be responsible for **Task Engineering**, specifically focusing on evaluating the performance of coding agents. This involves understanding how these agents interpret and execute commands and code snippets in a non-English context.
- A significant part of your role will be **Asset Creation**. You will build realistic and challenging task environments using datasets, files, and code examples exclusively in Hausa. It is crucial that these assets remain in the target language to provide a genuine test of the LLM's multilingual handling capabilities. This requires a keen eye for detail and a deep understanding of how language is used in technical documentation and software interfaces.
- You will engage in **Prompting & Translation** activities, not in the traditional sense of translating content, but in identifying subtle and overt failure points where AI models do not perform as expected when given prompts or data in Hausa. This requires a creative and analytical approach to uncover the limitations of current LLM technology.
- Your expertise will be vital in **Implementation & Verification**. You will support the development of robust reference implementations for the tasks you engineer. Furthermore, you will write highly reliable and deterministic verifier scripts to automatically assess the correctness of LLM outputs. Rubric-based judging will be reserved for situations where automated verification is strictly impossible, ensuring objectivity and consistency.
- You will also be involved in **Calibration & Execution**. This involves analyzing execution logs from LLM runs and calibrating task difficulty, ranging from 'Easy' to 'Very Hard', using standard Terminal-Bench configurations. You will test against various model tiers, such as Haiku, Sonnet, and Opus, to understand performance differences and identify areas for improvement.
- Ensuring the integrity of our benchmarks is paramount. You will participate in a rigorous **Quality Assurance** process that includes four layers of human quality control: creation review, human review, calibration review, and audit. This process is complemented by automated LLM-based checks to guarantee fairness, grammatical accuracy, and overall benchmark integrity. Your native Hausa expertise will be invaluable in this multi-layered quality control process.
- This role demands a strong technical foundation in software engineering, coupled with a profound understanding of the intricacies of multilingual text processing. You will leverage your skills in Python, shell scripting, and data processing to build and refine these benchmarks. Familiarity with Terminal/CLI-based development workflows and coding agents is essential. Your deep technical understanding of multilingual text-processing pitfalls, including encoding/decoding robustness, Unicode normalization, locale-dependent conventions, text I/O, toolchain interoperability, and safe string operations, will be critical. For Hausa, this may also extend to understanding specific linguistic nuances relevant to digital environments.
- By joining Lilt, you become part of a global community dedicated to making the world's information accessible to everyone, regardless of the language they speak. You will earn money, have fun, and advance human knowledge by working on diverse, impactful projects from anywhere, anytime. We offer quick and fair payment, and the opportunity to build your professional network within a supportive community, all facilitated by a streamlined application process tailored to your expertise.
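The **Asset Creation** responsibility above can be illustrated with a minimal sketch. The file name `kadarori.csv` and column names are hypothetical, chosen only to show a Hausa-language asset round-tripping through UTF-8:

```python
# Minimal sketch: writing a small Hausa-language CSV asset in UTF-8.
# The file and column names (kadarori.csv, "suna", "girma") are hypothetical.
import csv

# Hausa orthography uses hooked letters (ɓ, ɗ, ƙ) outside ASCII,
# so the encoding must be stated explicitly at every I/O boundary.
rows = [
    {"suna": "fayil_ɗaya.txt", "girma": "120"},
    {"suna": "bayanan_ƙasa.csv", "girma": "2048"},
]

with open("kadarori.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["suna", "girma"])
    writer.writeheader()
    writer.writerows(rows)

# Reading back with the same explicit encoding round-trips the hooked letters.
with open("kadarori.csv", newline="", encoding="utf-8") as f:
    names = [r["suna"] for r in csv.DictReader(f)]
print(names)  # prints ['fayil_ɗaya.txt', 'bayanan_ƙasa.csv']
```

Omitting `encoding="utf-8"` would make the result depend on the system locale, which is exactly the class of edge case these benchmarks probe.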
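For the **Implementation & Verification** step, a deterministic verifier can be as simple as a normalized comparison against a reference file. This is a sketch under assumptions, not Lilt's actual harness; the exit-code convention and file paths are illustrative:

```python
# Hypothetical sketch of a deterministic verifier script: compare an agent's
# output file to a reference after Unicode normalization, and exit 0 or 1
# so a harness can score the run without human judgment.
import sys
import unicodedata

def verify(output_path: str, reference_path: str) -> bool:
    with open(output_path, encoding="utf-8") as f:
        got = f.read()
    with open(reference_path, encoding="utf-8") as f:
        want = f.read()
    # NFC-normalize both sides so composed and decomposed spellings of the
    # same Hausa text (e.g. precomposed "ū" vs "u" + combining macron)
    # compare equal; strip trailing whitespace for determinism.
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", s).strip()
    return norm(got) == norm(want)

if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(0 if verify(sys.argv[1], sys.argv[2]) else 1)
```

Keeping the comparison pure (no network calls, no timestamps, no model-in-the-loop) is what makes the verdict reproducible across runs.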
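Two of the text-processing pitfalls named above (Unicode normalization and safe string operations) can be shown in a few lines; the sample word is Hausa, and the specific examples are illustrative rather than taken from the job description:

```python
# Two classic multilingual-text pitfalls: composed vs decomposed Unicode,
# and byte length diverging from character length for non-ASCII text.
import unicodedata

composed = "ƙofā"           # ƙ (U+0199) + "of" + precomposed ā (U+0101)
decomposed = "ƙofa\u0304"   # same word, with a combining macron instead

# Naive equality fails even though both strings render identically.
assert composed != decomposed

# Normalizing both sides to NFC restores equality.
assert unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)

# Character count and UTF-8 byte count differ, which breaks fixed-width
# assumptions in terminal tools and shell pipelines.
print(len(composed), len(composed.encode("utf-8")))  # prints: 4 6
```

Tasks built on differences like these separate models that genuinely handle Hausa text from models that merely pass ASCII-only checks.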
Skills & Technologies
Python
Remote
About Lilt Production
Lilt Production is a full-service video production studio based in Paris, France, creating commercial, corporate, and branded content for agencies and global brands. Services span concept development, live-action filming, motion graphics, post-production, color grading, and localized adaptations. The company operates a bilingual French-English team and works across Europe, the Middle East, and Africa, emphasizing cinematic storytelling and contemporary visual aesthetics for broadcast, digital, and social media distribution.