This job has expired

This position was posted on March 27, 2026 and is likely no longer accepting applications.

Research Intern — Applied Reinforcement Learning

Centific Global Technologies Pte. Ltd.

Job Overview

Location

Remote Work (USA)

Job Type

Full-time

Full Job Description

📋 Description

• As a PhD Research Intern in Applied Reinforcement Learning at Centific Global Technologies Pte. Ltd., you will design and evaluate RL pipelines for agentic AI systems, translating current research into scalable enterprise solutions for GenAI deployment.
• You will work at the frontier of AI innovation, contributing to Centific’s mission of bridging AI creators and industry leaders through safe, scalable, and cost-efficient GenAI technologies that reduce deployment costs by up to 80% and accelerate time-to-market by 50%.
• Day-to-day responsibilities include designing end-to-end reinforcement learning pipelines for agentic systems, spanning simulation, training, and evaluation phases, with a focus on aligning LLM-based agents using RLHF, DPO, PPO, and emerging alignment techniques.
• You will develop reward models, verifiers, and evaluation frameworks to measure reasoning, task success, and policy safety in AI agents, ensuring robust and trustworthy behavior in enterprise workflows.
• A core part of your role involves building simulation environments (digital twins) that mirror real-world enterprise processes, enabling safe and scalable training of RL-based agents for complex, multi-step decision-making tasks.
• You will implement scalable training and inference pipelines using PyTorch and GPU infrastructure, leveraging tools like RLlib, Stable Baselines, and TRL to optimize performance and reproducibility.
• Example projects include constructing custom RL environments simulating enterprise workflows, training agents via PPO or GRPO, developing reward modeling pipelines from human feedback, and prototyping agentic systems with tool use and multi-step reasoning integrated with RL training.
• You will document experiments, ablations, and findings rigorously to support both research publication and productionization of AI systems.
• Centific’s team comprises over 150 PhDs and data scientists, alongside 4,000+ AI practitioners and engineers, backed by a global network of 1.8 million vertical domain experts across 230 markets, enabling deep contextual and multilingual AI solutions.
• The company’s zero-distance innovation™ approach integrates industry-leading partnerships and proprietary technology platforms to deliver pre-trained datasets, fine-tuned LLMs, and RAG pipelines powered by vector databases.
• This role offers direct mentorship from senior researchers and engineers, access to modern GPU infrastructure, and opportunities to publish and present findings at top-tier ML conferences such as NeurIPS, ICML, ICLR, or ACL.
• You will gain hands-on experience in translating theoretical RL research into practical, enterprise-grade AI systems, building expertise in agentic AI, reward modeling, and scalable RL pipelines.
• The internship lasts 3–6 months and is remote within the USA, with preferred locations in Palo Alto, CA, and Redmond, WA, so you can work flexibly while collaborating closely with the AI research team.
• By joining Centific, you contribute to reducing GenAI costs and accelerating deployment for Fortune 500 and enterprise clients, helping them maintain a competitive edge through responsible, scalable AI innovation.

🎯 Requirements

• PhD candidate in Computer Science, Machine Learning, or a related field with active research focus in reinforcement learning or agentic AI.
• Strong proficiency in Python and PyTorch, including hands-on experience with GPU-based training and deep learning workflows.
• Solid understanding of core reinforcement learning fundamentals, including MDPs, policy gradients, value-based methods, and exploration-exploitation trade-offs.
• Practical experience with LLMs and post-training techniques such as RLHF, DPO, PPO, or related alignment methodologies.
• Demonstrated commitment to rigorous experimentation practices, including ablation studies, reproducibility, and clear scientific reporting.

🏖️ Benefits

• Competitive stipend of $35 to $45 per hour.
• Mentorship from leading researchers and engineers in Centific’s AI Research team, fostering professional and technical growth.
• Access to state-of-the-art GPU infrastructure for large-scale experimentation and model training.
• Opportunities to publish research findings and present at top-tier machine learning conferences.
• Real-world impact through projects that directly influence enterprise GenAI deployment strategies and cost-efficiency initiatives.
• Flexible remote work arrangement (USA-based), with preferred hubs in Palo Alto, CA, and Redmond, WA.

Skills & Technologies

• Skills: Python, FastAPI, gRPC, PyTorch
• Tags: Junior, Remote, Degree Required

About Centific Global Technologies Pte. Ltd.

Centific is a data-centric AI services company providing data collection, annotation, and model validation solutions to enterprises and technology vendors. It operates a global crowd platform that combines human intelligence with automation to prepare, curate, and test datasets for computer vision, NLP, and generative AI applications. The company supports full AI lifecycle needs, from training data to reinforcement learning and model safety, serving industries including retail, automotive, healthcare, and technology. Headquartered in Singapore, Centific maintains delivery centers across Asia, Europe, and North America.
