
Job Overview
Location
San Francisco
Job Type
Full-time
Category
Software Engineering
Date Posted
April 24, 2026
Full Job Description
📋 Description
- As an AI Evaluation Engineer at Distyl Inc., you will design and implement evaluation systems that drive AI system improvement through Evaluation-Driven Development, ensuring AI behavior is measured, trusted, and iterated upon in production environments for mission-critical customer applications.
- Your day-to-day responsibilities include building and maintaining golden test cases and regression suites in Python, developing offline and online evaluation pipelines, defining quality metrics aligned with business objectives, calibrating LLM-based graders against human judgment, and collaborating with cross-functional teams including Forward Deployed AI Engineers and domain experts to guide system design and deployment.
- Distyl is an applied AI technology company partnering with global enterprises in telecom, healthcare, insurance, manufacturing, and more to rearchitect critical operations using frontier AI. Backed by top-tier investors, the company achieves 100% production deployment success and operates profitably by tightly coupling measurement with AI development.
- In this role, you will deepen your expertise in AI system evaluation, gain hands-on experience with LLM-based grading and prompt iteration, influence real-world AI deployments at scale, and grow as a systems-oriented engineer who bridges measurement, model behavior, and production reliability.
About Distyl Inc.
Distyl is a cloud-native platform designed to simplify and accelerate the development and deployment of machine learning (ML) models. It provides a unified environment for data preparation, model training, versioning, and deployment, enabling data scientists and ML engineers to move from experimentation to production faster. The platform offers features such as automated data pipelines, managed training infrastructure, and scalable model serving. Distyl aims to reduce the complexity and operational overhead associated with MLOps, allowing organizations to focus on building and deploying impactful ML solutions. It supports various ML frameworks and integrates with existing cloud infrastructure.


