
Job Overview
Location: Miami
Job Type: Contract
Category: Software Engineer
Date Posted: May 16, 2026
Full Job Description
📋 Description
- Design and build coding benchmarks that evaluate frontier AI models on real-world software engineering tasks, including reasoning, debugging, and production-quality code generation
- Develop and maintain scalable data pipelines to support automated evaluation workflows for AI-generated code
- Analyze model-generated code for correctness, reliability, edge-case failures, and adherence to software engineering best practices
- Construct structured evaluation scenarios that span large, multi-repository codebases and multi-language programming environments
- Provide detailed technical feedback on AI model performance, identifying patterns of success and failure to inform iterative benchmark improvements
- Contribute to the development of evaluation frameworks that define industry standards for measuring coding ability in AI systems
- Ensure benchmarks effectively distinguish between high-performing and weak AI models by creating tasks grounded in real software engineering work
- Implement evaluation harnesses that automate task execution, result collection, and failure analysis across diverse coding challenges
- Collaborate with engineering and research teams to refine evaluation methodologies based on empirical model behavior and performance trends
- Maintain version-controlled, well-documented, and tested code for all benchmark components and evaluation infrastructure
- Optimize evaluation pipelines for speed, reproducibility, and resource efficiency while handling large volumes of model outputs
- Work within modern development workflows using Git, code reviews, and automated testing to ensure high-quality, production-grade evaluation systems
- Translate abstract model capabilities into concrete, measurable evaluation criteria that align with real-world software development needs
- Iterate on benchmark design based on feedback from model performance data to continuously improve discriminative power and relevance
- Document evaluation protocols, scoring rubrics, and failure analysis methodologies for internal and external consumption
- Support the creation of datasets used to train and evaluate next-generation AI coding models through rigorous, repeatable testing procedures
- Operate independently to manage end-to-end evaluation pipeline development, from initial design to deployment and ongoing maintenance
🎯 Requirements
- 4+ years of professional software engineering experience (non-negotiable)
- Expert Python: clean, performant, well-tested code
- Hands-on experience working in large, complex codebases
- Proven experience designing and implementing LLM coding benchmarks and evaluation data pipelines
- Strong command of Git and modern development workflows
- Track record at a high-growth tech company or top-tier software organization
🏖️ Benefits
- $80–$100/hr compensation based on location and seniority
- Fully remote work, eligible from accepted countries only
- Weekly payment via PayPal or Stripe
- 3-month contract with potential for extension
- Full-time availability preferred, though hours vary week to week
- Independent contractor (1099) engagement with no visa sponsorship or W-2 employment
About G2i Inc.
G2i is a technical talent marketplace that pre-vets React, React Native, and Node.js engineers for U.S. companies. Founded by developers to solve hiring pain, it runs extensive code reviews, pair-programming interviews, and background checks before matching engineers for contract or full-time remote roles. G2i emphasizes mental health, offering a monthly wellness stipend and a zero-burnout policy. The company also provides direct-hire services and manages payroll, compliance, and ongoing support, enabling startups and enterprises to scale engineering teams quickly while maintaining code quality and developer well-being.