Anthropic Fellows Program — Reinforcement Learning

Job Overview

Location: London, UK; Ontario, CAN; Remote-Friendly, United States; San Francisco, CA

Job Type: Full-time

Category: Data Science

Date Posted: May 6, 2026

Full Job Description

📋 Description

  • The Anthropic Fellows Program — Reinforcement Learning is a 4-month, full-time research fellowship designed to foster AI research and engineering talent. Fellows work on empirical projects aligned with Anthropic’s research priorities in reinforcement learning, with the aim of producing public outputs such as paper submissions.
  • Fellows will work on projects such as building model-based tools to understand AI training data, creating RL environments to improve Claude models, conducting research on RL algorithms, and building RL environments for safety-related tasks. Projects are carried out under direct mentorship from Anthropic researchers, including Ruhua Jiang, Kaidi Cao, Sunny Duan, and others.
  • The program is part of Anthropic’s broader mission to create reliable, interpretable, and steerable AI systems, a mission that brings together researchers, engineers, policy experts, and business leaders. Fellows have access to shared workspaces in Berkeley or London, or can work remotely from the UK, US, or Canada.
  • Fellows will receive a weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD, funding for compute (~$15k/month), and funding for other research expenses. They will develop skills in empirical AI research, collaboration across disciplines, and large-scale distributed systems. Strong performance may lead to full-time offers at Anthropic or other AI safety organizations.

🎯 Requirements

  • Fluent in Python programming
  • Available to work full-time on the Fellows program
  • Have work authorization in the US, UK, or Canada and be located in that country during the program
  • Strong technical background in computer science, mathematics, or physics
  • Experience with training, fine-tuning, or evaluating large language models

🏖️ Benefits

  • Weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD, plus benefits (which vary by country)
  • Funding for compute (~$15k/month) and other research expenses
  • Access to shared workspace in Berkeley, California or London, UK (remote-friendly options available)
  • Direct mentorship from Anthropic researchers
  • Connection to the broader AI safety and security research community
  • Opportunity to produce public research output (e.g., paper submission)

Skills & Technologies

Python
Go
Remote
Degree Required

About Anthropic, PBC

Anthropic is a public benefit corporation founded in 2021 by former OpenAI researchers to develop large-scale AI systems that are safe, interpretable and aligned with human values. The company produces Claude, a family of conversational and reasoning models based on constitutional AI and reinforcement learning from human feedback. Headquartered in San Francisco, Anthropic combines frontier research with applied engineering, publishing scholarly papers on alignment, interpretability and robustness while offering API access and commercial products built on its models.
