Staff Research Scientist - Reinforcement Learning

Centific Global Technologies Pte. Ltd.

Job Overview

Location

Remote Work (USA)

Job Type

Full-time

Full Job Description

Description

• Design simulation environments and digital twins to model enterprise workflows for AI agent training and evaluation.
• Post-train large language models using reinforcement learning and preference-optimization methods, including RLHF, DPO, GRPO, and PPO, to align agent behavior with human preferences and task outcomes.
• Build end-to-end pipelines that convert human-labeled traces and verifiable signals into structured, high-quality training data for reinforcement learning.
• Architect multi-turn, tool-using LLM agents with closed-loop learning systems that iteratively improve through feedback and environmental interaction.
• Design reward functions and verifiers that are robust to reward hacking and accurately reflect real-world task success metrics.
• Set and uphold the technical bar across the research team through architecture reviews, code standards, and engineering best practices.
• Mentor junior researchers and engineers, driving technical direction through influence rather than authority.
• Translate cutting-edge research into production-grade systems deployed in enterprise environments such as healthcare, finance, and compliance.
• Contribute to peer-reviewed publications and present findings at top-tier AI conferences.
• Collaborate with industry leaders including NVIDIA and Microsoft to advance the state of the art in generative AI and agent systems.
• Develop and optimize reinforcement learning systems using Gymnasium-based environments with attention to sparse and dense reward structures.
• Implement and maintain modern RL post-training and rollout-serving libraries such as TRL, veRL, OpenRLHF, and SkyRL in production contexts.
• Ensure all AI systems are governed, compliant, and trustworthy for enterprise adoption in safety-critical domains.
• Lead the development of scalable, distributed training pipelines on GPU clusters to support large-scale RL experiments.
• Apply advanced RL techniques, including MDP formulations, policy gradient and actor-critic methods (PPO, SAC), and temporal difference learning, to solve complex enterprise problems.
• Engineer LLM agents capable of tool use, multi-turn reasoning, and trajectory evaluation with high reliability and interpretability.
• Maintain strong software engineering discipline: build production pipelines, not just research notebooks, using Python and modern ML tooling.

Skills & Technologies

Python

Senior

Remote

$200k-$250k

Degree Required

About Centific Global Technologies Pte. Ltd.

Centific is a data-centric AI services company providing data collection, annotation, and model validation solutions to enterprises and technology vendors. It operates a global crowd platform that combines human intelligence with automation to prepare, curate, and test datasets for computer vision, NLP, and generative AI applications. The company supports full AI lifecycle needs, from training data to reinforcement learning and model safety, serving industries including retail, automotive, healthcare, and technology. Headquartered in Singapore, Centific maintains delivery centers across Asia, Europe, and North America.