
Job Overview
Location: Philippines
Job Type: Full-time
Category: Data Scientist
Date Posted: March 16, 2026
Full Job Description
📋 Description
• As the Lead AI Test Automation Specialist at DevRev, you will be at the forefront of ensuring the quality and reliability of our groundbreaking AI-powered platform, Computer. This pivotal role involves architecting and implementing sophisticated testing strategies, evaluation frameworks, and quality metrics specifically tailored for Large Language Model (LLM) applications and generative AI features. You will be instrumental in shaping how we approach quality for AI-driven products, ensuring they are not only functional but also accurate, safe, and trustworthy for our enterprise users.
• Your primary responsibility will be to design and execute comprehensive testing strategies for a variety of GenAI features. This includes, but is not limited to, conversational AI interfaces, complex agentic systems that perform automated tasks, and intricate LLM-powered workflows that streamline business processes. You will need to anticipate potential issues and develop robust testing methodologies to address the unique challenges presented by AI systems.
• A key aspect of this role is the development of automated test suites. You will focus on creating sophisticated prompt testing mechanisms, including regression tests that are crucial for detecting subtle yet significant unintended changes in model behavior over time. This ensures that as our AI models evolve, their core functionalities and responses remain consistent and predictable (an illustrative sketch of such a regression test appears after this list).
• You will be tasked with creating innovative evaluation frameworks designed to measure the quality of our GenAI outputs across multiple critical dimensions. These dimensions include accuracy (how correct the information is), relevance (how pertinent the response is to the query), safety (ensuring no harmful or biased content is generated), consistency (maintaining a coherent persona and logic), and latency (the speed of response). Developing quantifiable metrics for these often subjective areas will be a significant part of your work (see the scorecard sketch after this list).
• Building and maintaining high-quality test datasets and golden examples is essential. These datasets will need to represent a wide spectrum of user scenarios, including common use cases and challenging edge cases, to thoroughly validate the AI's performance under diverse conditions.
• To ensure continuous quality assurance, you will implement robust monitoring and alerting systems. These systems will be designed to detect and notify the team of any quality degradation in production GenAI features, allowing for rapid intervention and resolution (a monitoring sketch follows this list).
• A critical component of AI testing is adversarial testing. You will perform rigorous tests to proactively identify potential failures, such as AI hallucinations (generating false information), inherent biases in the models, or security vulnerabilities within the AI systems. This proactive approach is vital for building user trust.
• You will collaborate closely with software engineers to define clear acceptance criteria and establish quality gates for all AI feature releases. This ensures that features meet predefined quality standards before deployment.
• Furthermore, you will develop and champion tools and frameworks that empower engineers to easily test their GenAI implementations. This includes creating reusable components and best practices that streamline the testing process across development teams.
• Conducting user acceptance testing (UAT) and actively gathering feedback on AI feature performance from internal users will provide invaluable real-world insights to refine the product.
• Comprehensive documentation of testing procedures, identified issues, and key quality metrics is required. This documentation should be clear, concise, and easily accessible to all stakeholders.
• You will partner effectively with Product and Design teams to ensure that all AI features not only function correctly but also meet the highest standards of user experience, aligning with the overall product vision.
• Staying abreast of the latest advancements in GenAI testing methodologies, emerging tools, and industry best practices will be crucial for maintaining the cutting edge of our quality assurance efforts.
• Ultimately, you will play a key role in defining and embedding quality practices for GenAI applications at DevRev, influencing the entire AI product development lifecycle from initial design through to final release, and helping to shape quality standards that will impact millions of enterprise users.
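For illustration, here is a minimal sketch of the kind of prompt regression test described above, written with pytest. The generate() client and the golden_examples.json file are hypothetical placeholders, not part of DevRev's actual stack; the assertions simply pin required and forbidden phrases for each golden prompt.

    import json
    import pytest

    from my_llm_client import generate  # hypothetical wrapper around the model under test

    # Golden examples: one record per prompt, with phrases the response must
    # and must not contain. The file name and schema are assumptions.
    with open("golden_examples.json") as f:
        GOLDEN = json.load(f)

    @pytest.mark.parametrize("case", GOLDEN, ids=[c["prompt"][:40] for c in GOLDEN])
    def test_prompt_regression(case):
        # Deterministic settings so the same prompt yields comparable output across runs.
        response = generate(case["prompt"], temperature=0)
        for required in case.get("must_contain", []):
            assert required.lower() in response.lower(), f"missing expected phrase: {required}"
        for forbidden in case.get("must_not_contain", []):
            assert forbidden.lower() not in response.lower(), f"unexpected phrase: {forbidden}"

Exact-phrase checks are only a first line of defense; teams typically layer model-graded or embedding-based comparisons on top when the wording of correct answers is not stable.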
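The multi-dimensional evaluation described above can be made concrete with a simple scorecard. The sketch below assumes caller-supplied scorer functions (for example, model-graded rubrics, embedding similarity, or a safety classifier); all names are illustrative rather than an existing framework.

    import time
    from dataclasses import dataclass

    @dataclass
    class QualityScore:
        accuracy: float      # 0-1, correctness against a reference answer
        relevance: float     # 0-1, pertinence to the user's query
        safety: float        # 0-1, absence of harmful or biased content
        consistency: float   # 0-1, agreement across repeated generations
        latency_s: float     # wall-clock response time in seconds

    def evaluate(generate, prompt, reference, scorers):
        # Time a single generation, then let pluggable scorers grade it on each dimension.
        start = time.monotonic()
        response = generate(prompt)
        latency = time.monotonic() - start
        return QualityScore(
            accuracy=scorers["accuracy"](response, reference),
            relevance=scorers["relevance"](response, prompt),
            safety=scorers["safety"](response),
            consistency=scorers["consistency"](generate, prompt),
            latency_s=latency,
        )

Aggregating such scores over a golden dataset is one way to obtain the quantifiable, per-dimension metrics the role calls for.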
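Finally, a sketch of the kind of production quality monitoring mentioned above: a rolling window of evaluation scores is compared against an agreed baseline, and an alert fires when the window mean degrades. The baseline, tolerance, and notify() hook are assumptions for illustration only.

    from collections import deque

    class QualityMonitor:
        def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 100):
            self.baseline = baseline      # expected mean quality score (0-1)
            self.tolerance = tolerance    # allowed drop before alerting
            self.scores = deque(maxlen=window)

        def record(self, score: float, notify=print) -> None:
            # Append the latest production score and alert once the window is full
            # and its mean falls below baseline minus tolerance.
            self.scores.append(score)
            if len(self.scores) == self.scores.maxlen:
                mean = sum(self.scores) / len(self.scores)
                if mean < self.baseline - self.tolerance:
                    notify(f"GenAI quality degraded: rolling mean {mean:.3f} below baseline {self.baseline:.3f}")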
About DevRev Inc.
DevRev provides an AI-native platform that unifies product development and customer support workflows. The cloud software links engineering teams, product managers, and support agents on one data layer, replacing separate CRM, ticketing, and project tools. Live telemetry, knowledge graphs, and generative AI surface insights, automate responses, and prioritize backlogs. Founded in 2020 by former Nutanix CEO Dheeraj Pandey and Manoj Agarwal, the company targets enterprises seeking faster release cycles and improved customer experience. Headquartered in Palo Alto, it operates globally with a remote workforce and has raised over $100 million in seed and Series A funding.