Centific Global Technologies Pte. Ltd.

Staff Research Scientist - Reinforcement Learning

Job Overview

Location

Remote Work (USA)

Job Type

Full-time

Category

Machine Learning Engineer

Date Posted

June 14, 2026

Full Job Description

📋 Description

  • Design simulation environments and digital twins to model enterprise workflows for AI agent training and evaluation.
  • Post-train large language models using reinforcement learning methods including RLHF, DPO, GRPO, and PPO to align agent behavior with human preferences and task outcomes.
  • Build end-to-end pipelines that convert human-labeled traces and verifiable signals into structured, high-quality training data for reinforcement learning.
  • Architect multi-turn, tool-using LLM agents with closed-loop learning systems that iteratively improve through feedback and environmental interaction.
  • Design reward functions and verifiers that are robust to reward hacking and accurately reflect real-world task success metrics.
  • Set and uphold the technical bar across the research team through architecture reviews, code standards, and engineering best practices.
  • Mentor junior researchers and engineers, driving technical direction through influence rather than authority.
  • Translate cutting-edge research into production-grade systems deployed in enterprise environments such as healthcare, finance, and compliance.
  • Contribute to peer-reviewed publications and present findings at top-tier AI conferences.
  • Collaborate with industry leaders including NVIDIA and Microsoft to advance the state of the art in generative AI and agent systems.
  • Develop and optimize reinforcement learning systems using Gymnasium-based environments with attention to sparse and dense reward structures.
  • Implement and maintain modern RL post-training and rollout-serving libraries such as TRL, veRL, OpenRLHF, and SkyRL in production contexts.
  • Ensure all AI systems are governed, compliant, and trustworthy for enterprise adoption in safety-critical domains.
  • Lead the development of scalable, distributed training pipelines on GPU clusters to support large-scale RL experiments.
  • Apply core RL methods, including MDP formulations, policy gradient and actor-critic algorithms (PPO, SAC), and temporal difference learning, to solve complex enterprise problems.
  • Engineer LLM agents capable of tool use, multi-turn reasoning, and trajectory evaluation with high reliability and interpretability.
  • Maintain strong software engineering discipline: build production pipelines, not just research notebooks, using Python and modern ML tooling.

Skills & Technologies

Python
Senior
Remote
$200k-250k
Degree Required

About Centific Global Technologies Pte. Ltd.

Centific is a data-centric AI services company providing data collection, annotation, and model validation solutions to enterprises and technology vendors. It operates a global crowd platform that combines human intelligence with automation to prepare, curate, and test datasets for computer vision, NLP, and generative AI applications. The company supports full AI lifecycle needs, from training data to reinforcement learning and model safety, serving industries including retail, automotive, healthcare, and technology. Headquartered in Singapore, Centific maintains delivery centers across Asia, Europe, and North America.
