This job has expired

This position was posted on February 12, 2026 and is likely no longer accepting applications.

Applied AI Researcher, Benchmarking

Distyl Inc.

Job Overview

Location: Remote

Job Type: Full-time

Full Job Description

📋 Description

• Distyl AI is at the forefront of developing production-grade AI systems that are revolutionizing core operational workflows for Fortune 500 companies. Through a strategic partnership with OpenAI, proprietary software accelerators, and deep enterprise AI expertise, we are committed to delivering tangible AI solutions with rapid time-to-value, often within a single quarter. Our innovative products have already made a significant impact across diverse industries, including insurance, consumer packaged goods (CPG), and non-profit sectors, empowering these organizations to identify, build, and realize substantial value from their Generative AI investments, frequently for the first time.
• As an Applied AI Researcher focused on Benchmarking, you will play a pivotal role in defining and advancing how progress in AI is measured and understood within the enterprise context. This is a unique opportunity to move beyond incremental improvements on existing benchmarks or simple process optimizations. Instead, you will be challenged to creatively redefine how intelligent systems are evaluated and how their real-world impact is quantified.
• Your primary responsibility will be to design, develop, and implement sophisticated evaluation frameworks. These frameworks will go beyond traditional metrics to capture critical aspects of AI performance, such as reasoning depth, the quality of human-AI interaction, system reliability under various conditions, and ultimately, the operational and business impact these systems deliver. You will be instrumental in constructing benchmarks that accurately reflect the complexities and nuances of real-world enterprise scenarios, ensuring that the evaluations are not just theoretical but practically relevant.
• The benchmarks and evaluation systems you create will set the standard by which new AI architectures, cutting-edge techniques, and model releases are judged. Your work will directly influence both Distyl's internal research priorities and contribute to shaping industry-wide standards for AI evaluation. This involves exploring novel paradigms for assessing intelligent systems, including advanced techniques like adversarial robustness testing to probe system vulnerabilities, longitudinal performance tracking to understand long-term behavior and degradation, and human-in-the-loop assessment methodologies to gauge user experience and effectiveness.
• A key aspect of this role involves investigating the intricate relationship between evaluation metrics and model behavior. You will delve into how the metrics we choose can inadvertently shape or bias the emergent capabilities of AI models, establishing rigorous, data-driven methodologies to quantify these emergent properties. Your insights will be crucial in guiding the development of more robust, reliable, and impactful AI solutions for our clients.
• We are seeking researchers who operate in an AI-native way and possess a strong research track record, irrespective of their specific academic background. The ideal candidate is someone who would find traditional research organizational structures limiting and is eager to push the boundaries of AI application. You should be comfortable working with and building compound AI systems, exploring concepts like agentic collaboration, and leveraging associated techniques such as ensembling, ReAct patterns, and Graph-of-Thoughts.
• This role emphasizes a bias towards action and demonstrable results ('Showing vs Telling'). Our enterprise clients are looking for tangible proof of AI's power today, not just discussions about elegant, long-term theoretical concepts. Therefore, you must be adept at building prototypes of your ideas and conducting experiments that clearly demonstrate their effectiveness to a Fortune 500 Head of AI.
• You will be expected to use AI tools daily to accelerate your own workflow, much like you will help revolutionize our clients' workflows. Familiarity with tools like ChatGPT, Cursor, and Perplexity is essential, as it demonstrates an understanding of how AI can enhance productivity and innovation.
• Strong programming and data analysis skills are vital. While you may not identify strictly as a software engineer, you need the capability to translate your research ideas into functional prototypes and to perform rigorous experimental analysis to validate their effectiveness.
• This position offers a unique opportunity to work on high-impact projects that directly address critical business challenges for leading enterprises, utilizing state-of-the-art AI models and generous access to modern AI tools.

🎯 Requirements

• Proven experience in designing, implementing, and running evaluations, benchmarks, or experimental frameworks to measure the performance of AI models or complex systems.
• Demonstrated statistical and analytical rigor, with the ability to design fair, reproducible experiments and extract meaningful signal from empirical data, even when it's noisy.
• Experience building and integrating AI systems using pre-trained models (e.g., LLMs) rather than solely focusing on training or fine-tuning models. Expertise in compound AI systems, agentic collaboration, and related techniques (e.g., ensembling, ReAct, Graph-of-Thoughts) is highly desirable.
• A strong track record of research achievements, evidenced through publications in top-tier journals, significant open-source contributions, impactful blog posts, or other demonstrable outputs showcasing innovative AI work.
• Daily, hands-on experience using AI tools (e.g., ChatGPT, Cursor, Perplexity) to enhance personal productivity and workflow.
• Strong programming and data analysis skills, with the ability to develop prototypes and conduct empirical validation of research hypotheses.

🏖️ Benefits

• Competitive base salary range of $130K - $250K, commensurate with experience, location, and level.
• Meaningful equity in a rapidly growing, venture-backed company.
• Comprehensive benefits package, including 100% covered medical, dental, and vision insurance for employees and their dependents.
• 401(k) plan with additional perks, such as commuter benefits and in-office lunch provisions.
• Access to state-of-the-art AI models and generous usage of modern AI tools.
• Opportunity to work on high-impact projects solving real-world business problems for top-tier enterprises.
• A mission-driven, fast-paced culture that values curiosity, pragmatism, and excellence.



About Distyl Inc.

Distyl is a cloud-native platform designed to simplify and accelerate the development and deployment of machine learning (ML) models. It provides a unified environment for data preparation, model training, versioning, and deployment, enabling data scientists and ML engineers to move from experimentation to production faster. The platform offers features such as automated data pipelines, managed training infrastructure, and scalable model serving. Distyl aims to reduce the complexity and operational overhead associated with MLOps, allowing organizations to focus on building and deploying impactful ML solutions. It supports various ML frameworks and integrates with existing cloud infrastructure.
