
Job Overview
Location: Remote
Job Type: Full-time
Category: Data Engineer
Date Posted: February 17, 2026
Full Job Description
📋 Description
- At ReflectionAI Inc., we are on a mission to build open superintelligence and make it accessible to all. We are developing cutting-edge open-weight models designed for a diverse range of users, from individuals and agents to enterprises and even nation-states. Our team is composed of distinguished AI researchers and seasoned company builders from institutions and companies such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
- In the rapidly evolving landscape of Artificial Intelligence, data has emerged as a cornerstone of innovation. Many of the most significant breakthroughs in recent years have stemmed not from novel architectures, but from the strategic use of superior data. As a member of our Data Team, your primary mission will be to ensure that the data used to train our foundational models meets the highest standards of quality, reliability, and demonstrable downstream impact. Your contributions will directly shape the performance of our models across critical capabilities.
- You will collaborate closely with our researchers on the pre-training teams. Your role will involve transforming abstract concepts of 'good data' into concrete, quantifiable standards that can be scaled across extensive data campaigns. We are seeking engineers who combine robust engineering fundamentals with a deep curiosity about data quality and its direct effect on model performance.
- Working in close partnership with our pre-training teams, you will take ownership of upstream data quality for Large Language Model (LLM) pre-training. This role can be approached as a specialist or a generalist, and spans various languages and modalities. You will translate research and pre-training team requirements into measurable quality signals and provide actionable, constructive feedback to external data vendors.
- Beyond human-in-the-loop processes, you will be responsible for designing, validating, and scaling automated Quality Assurance (QA) methods. These methods will be used to measure data quality reliably across large-scale data campaigns, ensuring consistency and accuracy.
- A key aspect of your role will be to build reusable QA pipelines. These pipelines will ensure the consistent delivery of high-quality data to the pre-training teams.
- You will continuously monitor and report on data quality trends over time. This ongoing analysis will drive iterative improvements in our quality standards, processes, and acceptance criteria.
- This role offers an opportunity to be at the forefront of AI development, directly impacting the capabilities of foundational models. You will work with leading researchers and engineers in a fast-paced, innovative environment.
- The ideal candidate will possess strong engineering fundamentals, with demonstrable experience in building data pipelines, QA systems, or evaluation workflows specifically for pre-training data. A keen eye for detail and an analytical mindset are essential, enabling you to identify subtle inconsistencies, failure modes, and other issues that can compromise data quality.
- A solid understanding of how data quality influences pre-training outcomes is essential. You must be adept at translating abstract quality concerns into concrete, measurable signals, informed decisions, and precise feedback.
- Experience in designing and validating automated quality checks is highly valued. This includes proficiency with rule-based systems, statistical methods, or model-assisted approaches, such as using LLMs for evaluation (LLM-as-a-Judge).
- We are looking for individuals who are comfortable working autonomously, taking full ownership of problems from inception to resolution, and collaborating effectively with researchers, engineers, and operations partners across different teams.
- This position is fully remote, offering flexibility and the opportunity to work with a globally distributed, talent-dense team.
🎯 Requirements
- Proficiency in Python and experience building ML/LLM workflows, including debugging and writing scalable code.
- Experience working with large datasets and implementing automated evaluation or quality-checking systems.
- Solid understanding of how data quality impacts LLM pre-training and the ability to translate quality concerns into actionable signals and feedback.
- Experience designing and validating automated quality checks (e.g., rule-based, statistical, LLM-as-a-Judge).
🏖️ Benefits
- Top-tier compensation, including salary and equity designed to attract and retain top global talent.
- Comprehensive health and wellness benefits: medical, dental, vision, life, and disability insurance.
- Fully paid parental leave for all new parents, with financial support for family planning.
- Generous paid time off (PTO) and relocation support.
About ReflectionAI Inc.
ReflectionAI builds autonomous AI agents for enterprise process automation. The platform lets organizations create, deploy, and manage software agents that observe workflows, make decisions, and act across internal systems. Using reinforcement learning and large language models, agents learn from human guidance and adapt to changing environments. Customers use the technology for customer support triage, IT operations, compliance monitoring, and sales process automation, reducing repetitive manual tasks. The company offers cloud-hosted and on-premise deployments, role-based access controls, audit trails, and integrations with common business applications including Salesforce, ServiceNow, Jira, and Slack.



