Centific Global Technologies Pte. Ltd.

Applied Reinforcement Learning Engineer

Job Overview

Location

Remote Work (USA)

Job Type

Full-time

Category

Machine Learning Engineer

Date Posted

March 27, 2026

Full Job Description

📋 Description

  • As an Applied Reinforcement Learning Engineer at Centific Global Technologies Pte. Ltd., you will design and build custom reinforcement learning environments that simulate complex enterprise workflows such as document processing, compliance, onboarding, and support automation, enabling safe and scalable AI agent training for high-stakes operations.
  • You will post-train large language model (LLM)-based agents using advanced techniques including PPO, GRPO, DPO, and RLHF on domain-specific tasks, translating human-labeled traces into structured RL training data to improve agent performance in real-world enterprise settings.
  • You will architect multi-step reasoning agents with tool-calling capabilities and closed learning loops, design reward functions and verifiers for outcome validation, and construct end-to-end pipelines that bridge human feedback with automated RL training for pre-deployment testing and continuous improvement.
  • You will translate cutting-edge RL research into production-grade systems, contribute to internal innovation and potential publications, and collaborate with a team of over 150 PhDs and 4,000 AI practitioners to advance governed, compliant AI solutions for Fortune 500 and enterprise clients.
  • You will work within Centific’s AI Research team, which focuses on reinforcement learning, alignment, and human-centered intelligence to transform data and signals into next-generation intelligent systems, leveraging partnerships with NVIDIA, Microsoft, and a global network of 1.8 million vertical domain experts across 230 markets.
  • You will have the opportunity to shape a new discipline at the intersection of RL, simulation, and enterprise AI, seeing your research deployed in healthcare, finance, logistics, and safety-critical applications while reducing GenAI costs by up to 80% and accelerating time-to-market by 50% through zero-distance innovation™ solutions.
  • You will grow your expertise in offline RL (CQL, BCQ, IQL), model-based RL (world models, Dreamer, MuZero), hierarchical RL, imitation learning, and exploration strategies, applying these techniques to build trustworthy, scalable AI agent workflows that meet enterprise compliance and safety standards.

🎯 Requirements

  • Deep RL expertise: 3+ years of hands-on experience with environment design, reward engineering, and policy optimization using classical and modern RL methods including MDPs, Q-learning, policy gradients, PPO, TRPO, SAC, and TD learning.
  • LLM post-training experience: Proven ability to fine-tune large language models using RLHF, DPO, PPO, or similar preference optimization techniques, with an understanding of reward modeling, preference learning, and human feedback integration.
  • Production software engineering skills: Strong proficiency in Python and experience building scalable pipelines and training infrastructure using Gymnasium, RLlib, Stable Baselines, PyTorch, JAX, or TensorFlow, with emphasis on software engineering beyond research prototypes.
  • Agentic AI experience: Hands-on work with LLM-based agents, tool use, multi-step reasoning, and closed-loop learning systems that integrate external tools and APIs for autonomous task execution.
  • Advanced degree or equivalent: MS or PhD in Computer Science, Machine Learning, or a related technical field, or equivalent professional experience demonstrating mastery of RL and ML concepts.

🏖️ Benefits

  • Competitive salary range of $150K–$160K annually, reflecting the specialized expertise required for applied RL engineering in enterprise AI.
  • Opportunity to collaborate with industry leaders including NVIDIA and Microsoft, and contribute to real-world AI systems deployed across healthcare, finance, logistics, and safety-critical enterprise environments.
  • Access to a cutting-edge research environment with over 150 PhDs and 4,000 AI practitioners, enabling continuous learning and innovation at the frontier of RL, simulation, and LLM alignment.
  • Ability to publish and share research, shape product direction, and see scientific contributions translated into production systems that power scalable, trustworthy AI for global enterprises.
  • Remote/hybrid work flexibility, with options to work from Palo Alto, CA; Seattle, WA; or fully remote, supporting work-life balance while engaging with a distributed, world-class AI team.
  • Exposure to advanced techniques including world models, synthetic data generation, distributed training, and open-source RL frameworks (CleanRL, TRL, veRL), fostering technical growth and professional impact.

Skills & Technologies

Python
TensorFlow
PyTorch
Remote
$150k-160k
Degree Required

About Centific Global Technologies Pte. Ltd.

Centific is a data-centric AI services company providing data collection, annotation, and model validation solutions to enterprises and technology vendors. It operates a global crowd platform that combines human intelligence with automation to prepare, curate, and test datasets for computer vision, NLP, and generative AI applications. The company supports full AI lifecycle needs, from training data to reinforcement learning and model safety, serving industries including retail, automotive, healthcare, and technology. Headquartered in Singapore, Centific maintains delivery centers across Asia, Europe, and North America.
