Machine Learning Evaluation Specialist (Remote)

Job Overview

Location

Remote

Job Type

Contract

Category

Data Science

Date Posted

April 3, 2026

Full Job Description

📋 Description

  • As a Machine Learning Evaluation Specialist at G2i Inc., you will play a pivotal role in advancing the frontiers of artificial intelligence by designing evaluation tasks that expose the limitations of current AI systems in highly specialized scientific and technical domains. This is not an engineering or model-building role; instead, you will leverage your deep domain expertise to craft research-grade problems that challenge even the most advanced AI, ensuring that evaluation benchmarks reflect real-world complexity and nuance.
  • Your day-to-day responsibilities will include proposing and framing original, research-level machine learning problems grounded in your area of expertise; designing evaluation tasks that require specialized knowledge far beyond standard ML pipelines; critically assessing AI-generated solutions for correctness, creativity, and methodological rigor, with detailed explanations of where and why they fail; documenting problem difficulty, required domain knowledge, and expected failure modes to inform future benchmark development; collaborating asynchronously with a global team of experts to refine and validate evaluation frameworks; and contributing to the creation of benchmarks that push the boundaries of what AI can understand and reason about in complex domains.
  • You will join a remote-first, mission-driven team at G2i Inc., a company dedicated to connecting top technical talent with impactful opportunities while advancing the state of AI through rigorous, domain-specific evaluation. The team values intellectual curiosity, independence, and deep expertise, fostering an environment where specialists can focus on meaningful research without the constraints of traditional engineering workflows.
  • In this role, you will sharpen your ability to translate complex domain knowledge into precise, evaluable challenges for AI systems; gain experience in shaping the future of AI benchmarking; contribute to publicly relevant research that highlights both the strengths and shortcomings of modern machine learning; and establish yourself as a key contributor to the growing field of AI evaluation and AI safety through high-impact, intellectually demanding work.

🎯 Requirements

  • Graduate-level expertise (MS or PhD preferred) in a scientific or technical domain that intersects with machine learning, such as computational biology, genomics, physics, climate modeling, healthcare, neuroscience, materials science, finance, robotics, advanced NLP, or applied mathematics/statistics.
  • Strong working knowledge of core machine learning concepts including model selection, feature engineering, and evaluation metrics, sufficient to understand how ML methods are applied and where they may fall short in complex domains.
  • Deep familiarity with active, cutting-edge research problems in your field, enabling you to identify where general ML knowledge fails and where specialized insight is required.
  • Excellent written communication skills, with the ability to articulate highly complex, technical problems with clarity, precision, and rigor. This is essential for crafting evaluation tasks that are both challenging and unambiguous.
  • Self-motivation and comfort working independently on intellectually demanding, open-ended tasks that require sustained focus and critical thinking.

🏖️ Benefits

  • Fully remote work arrangement: collaborate from anywhere in the world, with flexibility to design your own schedule within the 10–40 hours per week range.
  • Competitive hourly compensation ranging from $200 to $400 per hour, based on domain expertise and seniority, reflecting the high value placed on specialized knowledge.
  • Opportunity to engage in meaningful, research-driven work that directly contributes to advancing AI evaluation and understanding AI limitations in real-world scientific contexts.
  • Paid assessment process: if selected to proceed, you will be compensated for the time spent on the required evaluation task.
  • Freedom to pursue other professional engagements simultaneously, as this is a project-based, freelance role with no guaranteed hours, allowing for portfolio-style work.

Skills & Technologies

Remote
$200-400/hr
Degree Required

About G2i Inc.

G2i is a technical talent marketplace that pre-vets React, React Native, and Node.js engineers for U.S. companies. Founded by developers to solve hiring pain, it runs extensive code reviews, pair-programming interviews, and background checks before matching engineers for contract or full-time remote roles. G2i emphasizes mental health, offering a monthly wellness stipend and a zero-burnout policy. The company also provides direct-hire services and manages payroll, compliance, and ongoing support, enabling startups and enterprises to scale engineering teams quickly while maintaining code quality and developer well-being.
