This job has expired

This position was posted on February 17, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Member of Technical Staff - Alignment Lead

ReflectionAI Inc.

Job Overview

Location: Remote

Job Type: Full-time

Full Job Description

📋 Description

• As a Member of Technical Staff - Alignment Lead at ReflectionAI Inc., you will be at the forefront of developing open superintelligence, driving the critical alignment stack to ensure our advanced AI models are not only powerful but also safe, factual, and aligned with human intent. This is a unique opportunity to shape the future of AI by contributing to a mission-driven company that believes in making cutting-edge intelligence accessible to everyone.
• You will own and lead the entire alignment process, encompassing instruction tuning, Reinforcement Learning from Human Feedback (RLHF), and Reinforcement Learning from AI Feedback (RLAIF). Your expertise will be crucial in guiding our models toward high factual accuracy and robust instruction-following capabilities, ensuring they perform as intended across a wide range of tasks.
• A core aspect of this role involves pioneering research into next-generation reward models and optimization objectives. You will design and implement novel approaches that significantly enhance our models' performance based on human preferences, pushing the boundaries of what's possible in AI alignment.
• Curating and generating high-quality training data is paramount. You will be responsible for sourcing and developing datasets that address complex reasoning challenges and behavioral gaps in our models. This includes designing and implementing synthetic data pipelines to fill those gaps.
• You will optimize large-scale Reinforcement Learning (RL) pipelines, focusing on stability and efficiency. This optimization is key to enabling rapid iteration cycles, allowing us to quickly test and implement model improvements and innovations.
• Close collaboration is essential. You will work hand-in-hand with our pre-training and evaluation teams to establish tight feedback loops. This integrated approach ensures that insights from alignment research are effectively translated into generalizable gains across our foundational models.
• This role demands a deep technical understanding of alignment methodologies such as Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and rejection sampling. You will apply this knowledge to large-scale models, navigating the complexities of distributed systems and large codebases.
• You will be instrumental in improving model behavior through innovative data strategies, sophisticated reward modeling, and advanced RL techniques. Your contributions will directly impact the quality and reliability of our AI systems.
• We are looking for individuals who can demonstrate a track record of owning ambitious research or engineering agendas that have resulted in measurable model improvements. Your ability to drive projects from conception to successful implementation will be highly valued.
• The ideal candidate thrives in a fast-paced, high-agency startup environment where a bias toward action is essential. You should be passionate about advancing the frontier of artificial intelligence and contributing to a company that is building from the ground up.
• Joining ReflectionAI means being part of a small, talent-dense team dedicated to building open foundational models. You will have a significant influence on our company's direction and the evolution of open AI.
• We are committed to enabling you to do the most impactful work of your career, supported by comprehensive benefits and a culture that values your well-being and that of your loved ones.

🎯 Requirements

• Graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related quantitative discipline.
• Deep technical command of alignment methodologies (e.g., PPO, DPO, rejection sampling) and proven experience scaling these techniques to large language models.
• Strong software engineering skills, with demonstrated experience diving into complex ML codebases and distributed systems.
• Prior experience in improving model behavior through data curation, reward modeling, or reinforcement learning techniques.
• Evidence of owning ambitious research or engineering projects that resulted in measurable model improvements.

🏖️ Benefits

• Top-tier compensation, including salary and equity designed to attract and retain global talent.
• Comprehensive health and wellness benefits: medical, dental, vision, life, and disability insurance.
• Fully paid parental leave for all new parents, including support for adoptive and surrogate journeys, and financial assistance for family planning.
• Generous paid time off (PTO) and relocation support.
• Opportunities for team connection through company-provided daily lunches and dinners, regular off-sites, and team celebrations.

Skills & Technologies

Senior

Remote

Degree Required

About ReflectionAI Inc.

ReflectionAI builds autonomous AI agents for enterprise process automation. The platform lets organizations create, deploy, and manage software agents that observe workflows, make decisions, and act across internal systems. Using reinforcement learning and large language models, agents learn from human guidance and adapt to changing environments. Customers use the technology for customer support triage, IT operations, compliance monitoring, and sales process automation, reducing repetitive manual tasks. The company offers cloud-hosted and on-premise deployments, role-based access controls, audit trails, and integrations with common business applications including Salesforce, ServiceNow, Jira, and Slack.