This job has expired

This position was posted on May 12, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Member of Technical Staff - Evals

P1 AI Inc.

Job Overview

Location

United States

Job Type

Full-time

Full Job Description

📋 Description

• As a Member of Technical Staff - Evals at P1 AI Inc., you will play a critical role in ensuring the reliability, performance, and continuous improvement of Archie, the company’s AI engineer designed for engineering AGI. Your work will directly impact the validation of Archie’s capabilities against real-world engineering benchmarks, helping to ensure it learns and retains the skills needed to perform complex engineering tasks across industrial domains.
• Day to day, you will implement and operate systems for organizing, transforming, running, grading, and reporting on eval benchmarks; design and execute processes for developing and QA’ing evals with input from engineering experts and industrial partners; ensure evals integrate effectively within CI/CD pipelines for continuous benchmarking; create methods to detect AI-specific quality issues like hallucinations, stochasticity, and regressions; and serve as a technical leader in standardizing automated testing practices across the technology stack.
• You will join a small, high-performing team of top talent in deep learning, model-based engineering, and industrial applications, working closely with founding team members from OpenAI, DeepMind, and other leaders in AI and engineering. The company is mission-driven, backed by $23M in seed funding from Radical Ventures, and focused on deploying Archie across engineering teams in industrial companies worldwide.
• In this role, you will deepen your expertise in AI evaluation, test system design, and CI/CD integration while contributing to a pioneering effort in engineering AGI. You’ll gain experience collaborating with multidisciplinary stakeholders, shaping evaluation frameworks for next-gen AI systems, and operating in a fast-paced startup environment that values ownership, intellectual excellence, and shipping discipline.

🎯 Requirements

• Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others
• Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations
• Proficiency in Python, including complex modules, and in modern software development tools and practices (Git, CI/CD, etc.)
• Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers)
• Ability to thrive in a fast-paced, dynamic startup environment

🏖️ Benefits

• Healthcare, dental, and vision insurance
• 401(k) with employer matching
• Unlimited PTO
• Significant equity component as part of compensation
• Opportunity to work remotely (US or Canada) with periodic in-person co-working sessions in San Mateo

Skills & Technologies

Python

Git

Senior

Remote

$170k-200k

About P1 AI Inc.

P1 AI is a software company that builds an AI-powered platform for enterprise revenue teams. Its product integrates data ingestion, predictive analytics, and workflow automation to help sales and marketing organizations prioritize leads, forecast revenue, and personalize outreach at scale. The system continuously learns from CRM, email, and engagement data to surface next-best actions and optimize pipeline performance for B2B companies.
