This job has expired

This position was posted on March 27, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Research Intern — Applied Reinforcement Learning

Centific Global Technologies Pte. Ltd.

Job Overview

Location

Remote Work (USA)

Job Type

Full-time

Full Job Description

📋 Description

• Centific Global Technologies Pte. Ltd. is seeking a PhD Research Intern in Applied Reinforcement Learning to contribute to cutting-edge agentic AI systems that bridge research and real-world enterprise deployment. This role is critical to advancing Centific’s mission of enabling safe, scalable GenAI adoption through innovative RL-driven alignment and training pipelines for large language models.
• The intern will design and evaluate end-to-end reinforcement learning systems for LLM-based agents, translating theoretical research into practical tools that improve reasoning, safety, and task performance in enterprise workflows. This work directly supports Centific’s zero-distance innovation™ solutions that reduce GenAI costs by up to 80% and accelerate time-to-market by 50%.
• Day-to-day responsibilities include developing RL environments simulating real-world enterprise processes (e.g., digital twins), designing reward functions and verifiers using RLHF, DPO, and PPO methodologies, and building scalable training and inference pipelines for agentic systems using PyTorch and GPU-accelerated infrastructure.
• The intern will create evaluation frameworks to measure agent reasoning, task success, and policy safety; prototype agentic systems with tool use and multi-step reasoning; and document experiments, ablations, and findings to support both research publication and productionization of models.
• Collaboration with Centific’s team of over 150 PhDs, 4,000 AI practitioners, and 1.8 million vertical domain experts across 230 markets will provide exposure to industry-leading partnerships and multilingual, contextual datasets used to fine-tune industry-specific LLMs and RAG pipelines.
• The role offers hands-on experience with modern RL libraries (Gymnasium, RLlib, Stable Baselines), LLM post-training tools (TRL, custom RLHF pipelines), experiment tracking (Weights & Biases), and deployment technologies (FastAPI, gRPC, ONNX, TensorRT) in a high-impact research environment.
• Centific’s AI Research division fosters a culture of innovation and inclusivity, offering mentorship from senior researchers and engineers, access to state-of-the-art GPU infrastructure, and opportunities to publish findings at top-tier ML conferences such as NeurIPS, ICML, ICLR, or ACL.
• This internship is ideal for a PhD candidate seeking to apply deep RL expertise to agentic AI challenges in enterprise settings, with the potential to influence scalable AI solutions used by Fortune 500 clients and the Magnificent Seven.

🎯 Requirements

• PhD candidate in Computer Science, Machine Learning, or a related field with active research in reinforcement learning or agentic AI.
• Strong proficiency in Python and PyTorch, including hands-on experience with GPU-based training and distributed computing.
• Solid understanding of core RL fundamentals such as MDPs, policy gradients, and value iteration, as well as LLM post-training methods such as PPO, DPO, and RLHF.
• Familiarity with LLMs and post-training techniques, including reward modeling from human feedback and alignment strategies.
• Demonstrated ability to conduct rigorous experimentation with emphasis on ablation studies, reproducibility, and clear technical reporting.

🏖️ Benefits

• Competitive hourly stipend ranging from $35 to $45, reflecting the value of advanced research contributions.
• Mentorship from experienced AI researchers and engineers at Centific, supporting professional and technical growth.
• Access to modern GPU infrastructure for large-scale experimentation and model training.
• Opportunities to publish research findings and present at leading machine learning conferences.
• Flexible work arrangement with remote eligibility, plus preferred locations in Palo Alto, CA and Redmond, WA.
• Equal Opportunity Employer committed to diversity, inclusion, and fair consideration for all qualified applicants regardless of background.

Skills & Technologies

Python

FastAPI

gRPC

PyTorch

Junior

Remote

Degree Required

About Centific Global Technologies Pte. Ltd.

Centific is a data-centric AI services company providing data collection, annotation, and model validation solutions to enterprises and technology vendors. It operates a global crowd platform that combines human intelligence with automation to prepare, curate, and test datasets for computer vision, NLP, and generative AI applications. The company supports full AI lifecycle needs, from training data to reinforcement learning and model safety, serving industries including retail, automotive, healthcare, and technology. Headquartered in Singapore, Centific maintains delivery centers across Asia, Europe, and North America.
