
Research Engineer/Scientist - Human Alignment, Consumer Devices

Job Overview

Location: San Francisco
Job Type: Full-time
Category: Software Engineering
Date Posted: March 12, 2026

Full Job Description

📋 Description

  • Join OpenAI's Future of Computing Research team, an applied research group within the Consumer Devices division dedicated to pioneering new methods, models, and evaluation frameworks that will define the future of computing.
  • Work at the frontier of multimodal AI, turning nascent model capabilities into intuitive, delightful, and trustworthy product experiences for consumers.
  • Contribute to a new class of AI systems designed for continuous learning, individual adaptation, and seamless integration into daily life.
  • Build long-term memory, user modeling, and personalization systems aligned not only with immediate user satisfaction but with broader personal goals, values, and overall well-being.
  • Collaborate with research, engineering, design, product, and safety teams to establish clear definitions and best practices for AI systems that understand users over time, act appropriately, and provide demonstrably beneficial assistance in a context-aware, respectful manner.
  • This role focuses on advancing Reinforcement Learning from Human Feedback (RLHF) and post-training techniques for personalized, multimodal AI systems.
  • Construct the foundational learning and evaluation mechanisms that let AI models become increasingly context-aware, adaptive, and valuable to users over extended periods.
  • Tackle complex challenges in reward modeling, preference learning, long-horizon evaluation, and policy improvement so that systems make high-quality behavioral decisions in authentic user environments.
  • Your work will be deeply integrated with product development, with success measured not just by benchmark performance but by tangible improvements in real-world model behavior.
  • Move beyond single-turn assistant interactions toward AI systems that evolve through feedback, learn from rich interaction signals, and are trained against meaningful definitions of user value.
  • Internally, this demands meticulous reward design, robust feedback loops, and comprehensive evaluation frameworks that rigorously assess the long-term benefits of AI interventions.
  • Develop and refine RLHF and post-training methodologies tailored to multimodal AI models.
  • Build reward models and preference-learning pipelines that drive adaptive, personalized, and responsive model behavior.
  • Design datasets, rubrics, and evaluation frameworks that accurately capture user preferences, contextual appropriateness, and the enduring value of AI interactions in realistic scenarios.
  • Run rigorous policy-improvement experiments leveraging explicit user feedback, implicit behavioral signals, and model-based grading techniques.
  • Address the challenges of long-horizon evaluation, where model quality is determined by the ability to improve outcomes over time rather than by single-turn performance.
  • Partner closely with safety researchers to ensure that adaptation and personalization remain ethically aligned, transparent, and strictly bounded by predefined constraints.
  • Prototype and rapidly iterate on training recipes, reward formulations, data pipelines, and evaluation suites to optimize for product-relevant behaviors.
  • Help define how OpenAI quantifies success for its personalized AI systems, focusing on metrics such as trust, appropriateness, and demonstrable long-term user benefit.
  • You will thrive in this role if you have a strong foundation in machine learning research, with demonstrable experience in RLHF, reward modeling, preference optimization, or post-training for large-scale models.
  • Experience in one or more of the following is highly desirable: reinforcement learning, ranking algorithms, recommender systems, personalization, memory systems, or human-in-the-loop evaluation.
  • A commitment to rigorous empirical research, with the ability to design clean experiments, reliable evaluations, and decision-making metrics, is essential.
  • You should be energized by the challenge of training AI models against nuanced, multifaceted behavioral objectives.
  • Prior experience building datasets or evaluation pipelines grounded in human preferences, established rubrics, or observed real-world product behavior is a significant asset.
  • Comfort working across the entire technical stack, from data generation and labeling strategies through training runs, reward functions, and analysis, is expected.
  • A keen interest in multimodal AI, and in how models can learn from richer, more diverse interaction signals over time, is crucial.
  • A strong desire to contribute to product-shaping research with exceptionally high stakes for user trust, AI alignment, and long-term user value.
  • Enjoyment of close collaboration with engineers, designers, and safety researchers to translate research into functional, real-world AI systems.
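The reward-modeling and preference-learning work described above typically builds on pairwise human comparisons. As a minimal, illustrative sketch only (the function name and scalar interface are assumptions for this example, not OpenAI's pipeline), a Bradley-Terry reward objective penalizes the model whenever it fails to score the human-preferred response above the rejected one:

```python
import math


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair.

    r_chosen and r_rejected are scalar reward-model scores for the
    response a labeler preferred and the one they rejected. The loss is
    -log(sigmoid(r_chosen - r_rejected)): it equals log(2) when the
    model is indifferent and approaches 0 as the preferred margin grows.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In practice the scalar scores come from a learned reward-model head, and this loss is minimized over large batches of labeled comparisons; the resulting model then supplies the reward signal for RLHF policy improvement.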

Work Arrangement

Hybrid


About OpenAI, Inc.

OpenAI is a San Francisco-based artificial intelligence research and deployment company founded in 2015. It develops large-scale AI models such as GPT, DALL-E, and Codex, providing cloud APIs and consumer applications like ChatGPT. Originally established as a non-profit, it later created a capped-profit subsidiary to attract capital while maintaining its mission to ensure artificial general intelligence benefits all of humanity.
