Anthropic Fellows Program — Reinforcement Learning

Anthropic, PBC

Job Overview

Location

London, UK; Ontario, CAN; Remote-Friendly, United States; San Francisco, CA

Job Type

Full-time

Full Job Description

📋 Description

• The fellowship is a 4-month, full-time research program designed to foster AI research and engineering talent. Fellows work on empirical projects aligned with Anthropic’s research priorities in reinforcement learning, with the aim of producing public outputs such as paper submissions.
• Fellows will work on projects such as building model-based tools to understand AI training data, creating RL environments to improve Claude models, conducting research on RL algorithms, and building RL environments for safety-related tasks. Projects are directly mentored by Anthropic researchers, including Ruhua Jiang, Kaidi Cao, Sunny Duan, and others.
• The program is part of Anthropic’s broader mission to create reliable, interpretable, and steerable AI systems, a mission that brings together researchers, engineers, policy experts, and business leaders. Fellows have access to shared workspaces in Berkeley or London, or can work remotely from the UK, US, or Canada.
• Fellows will receive a weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD, funding for compute (~$15k/month), and funding for other research expenses. They will develop skills in empirical AI research, collaboration across disciplines, and large-scale distributed systems; strong performance may lead to full-time offers at Anthropic or other AI safety organizations.

🎯 Requirements

• Fluent in Python programming
• Available to work full-time on the Fellows program
• Authorized to work in the US, UK, or Canada, and located in that country during the program
• Strong technical background in computer science, mathematics, or physics
• Experience with training, fine-tuning, or evaluating large language models

🏖️ Benefits

• Weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD + benefits (varies by country)
• Funding for compute (~$15k/month) and other research expenses
• Access to shared workspace in Berkeley, California or London, UK (remote-friendly options available)
• Direct mentorship from Anthropic researchers
• Connection to the broader AI safety and security research community
• Opportunity to produce public research output (e.g., paper submission)

Skills & Technologies

Python

Remote

Degree Required

About Anthropic, PBC

Anthropic is a public benefit corporation founded in 2021 by former OpenAI researchers to develop large-scale AI systems that are safe, interpretable and aligned with human values. The company produces Claude, a family of conversational and reasoning models based on constitutional AI and reinforcement learning from human feedback. Headquartered in San Francisco, Anthropic combines frontier research with applied engineering, publishing scholarly papers on alignment, interpretability and robustness while offering API access and commercial products built on its models.
