This job has expired

This position was posted on April 3, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Machine Learning Evaluation Specialist (Remote)

G2i Inc.

Job Overview

Location

Remote

Job Type

Contract

Full Job Description

📋 Description

• As a Machine Learning Evaluation Specialist at G2i Inc., you will play a pivotal role in advancing the frontiers of artificial intelligence by designing evaluation tasks that expose the limitations of current AI systems in highly specialized scientific and technical domains. This is not an engineering or model-building role; instead, you will leverage your deep domain expertise to craft research-grade problems that challenge even the most advanced AI, ensuring that evaluation benchmarks reflect real-world complexity and nuance.
• Your day-to-day responsibilities will include proposing and framing original, research-level machine learning problems grounded in your area of expertise; designing evaluation tasks that require specialized knowledge far beyond standard ML pipelines; and critically assessing AI-generated solutions for correctness, creativity, and methodological rigor, with detailed explanations of where and why they fail. You will also document problem difficulty, required domain knowledge, and expected failure modes to inform future benchmark development; collaborate asynchronously with a global team of experts to refine and validate evaluation frameworks; and contribute to benchmarks that push the boundaries of what AI can understand and reason about in complex domains.
• You will join a remote-first, mission-driven team at G2i Inc., a company dedicated to connecting top technical talent with impactful opportunities while advancing the state of AI through rigorous, domain-specific evaluation. The team values intellectual curiosity, independence, and deep expertise, fostering an environment where specialists can focus on meaningful research without the constraints of traditional engineering workflows.
• In this role, you will sharpen your ability to translate complex domain knowledge into precise, evaluable challenges for AI systems; gain experience in shaping the future of AI benchmarking; contribute to publicly relevant research that highlights both the strengths and shortcomings of modern machine learning; and establish yourself as a key contributor to the growing field of AI evaluation and AI safety through high-impact, intellectually demanding work.

🎯 Requirements

• Graduate-level expertise (MS or PhD preferred) in a scientific or technical domain that intersects with machine learning, such as computational biology, genomics, physics, climate modeling, healthcare, neuroscience, materials science, finance, robotics, advanced NLP, or applied mathematics/statistics.
• Strong working knowledge of core machine learning concepts including model selection, feature engineering, and evaluation metrics, sufficient to understand how ML methods are applied and where they may fall short in complex domains.
• Deep familiarity with active, cutting-edge research problems in your field, enabling you to identify where general ML knowledge fails and where specialized insight is required.
• Excellent written communication skills, with the ability to articulate highly complex, technical problems with clarity, precision, and rigor — this is essential for crafting evaluation tasks that are both challenging and unambiguous.
• Self-motivation and comfort working independently on intellectually demanding, open-ended tasks that require sustained focus and critical thinking.

🏖️ Benefits

• Fully remote work arrangement: collaborate from anywhere in the world, with the flexibility to set your own schedule within a range of 10 to 40 hours per week.
• Competitive hourly compensation ranging from $200 to $400 per hour, based on domain expertise and seniority, reflecting the high value placed on specialized knowledge.
• Opportunity to engage in meaningful, research-driven work that directly contributes to advancing AI evaluation and understanding AI limitations in real-world scientific contexts.
• Paid assessment process — if selected to proceed, you will be compensated for the time spent on the required evaluation task.
• Freedom to pursue other professional engagements simultaneously, as this is a project-based, freelance role with no guaranteed hours, allowing for portfolio-style work.

Skills & Technologies

Remote

$200-400/hr

Degree Required

G2i Inc.

About G2i Inc.

G2i is a technical talent marketplace that pre-vets React, React Native, and Node.js engineers for U.S. companies. Founded by developers to solve hiring pain, it runs extensive code reviews, pair-programming interviews, and background checks before matching engineers for contract or full-time remote roles. G2i emphasizes mental health, offering a monthly wellness stipend and a zero-burnout policy. The company also provides direct-hire services and manages payroll, compliance, and ongoing support, enabling startups and enterprises to scale engineering teams quickly while maintaining code quality and developer well-being.
