QA Engineer, AI Products

MD Aware, LLC

Job Overview

Location

Remote

Job Type

Full-time

Full Job Description

📋 Description

• Design and execute test strategies for LLM-powered clinical features, focusing on non-deterministic outputs, prompt regression, output evaluation, and hallucination detection in AI-driven decision tools.
• Build and maintain automated evaluation pipelines using golden datasets, LLM-as-judge frameworks, and rubric-based scoring to identify quality regressions in AI responses across web and mobile platforms.
• Perform black-box and exploratory testing of MDCalc’s AI features, prioritizing clinical accuracy, safety, edge cases, and real-world usage scenarios encountered by physicians and healthcare professionals.
• Define and quantify quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, and cost) and establish measurable thresholds for release readiness and compliance with clinical standards.
• Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to co-define what constitutes an acceptable, trustworthy AI response for point-of-care clinical use.
• Investigate and triage AI failure modes to determine root causes, distinguishing between model behavior, prompt design flaws, retrieval system errors, and integration bugs.
• Develop and refine QA strategies to scale testing capacity as the AI product surface expands, ensuring coverage grows in alignment with new features, specialties, and patient conditions.
• Utilize prompt engineering principles and LLM tooling (e.g., Promptfoo, LangSmith, DeepEval, Ragas, OpenAI Evals) to evaluate, audit, and improve AI-generated content for clinical reliability.
• Implement automated evaluation methods such as semantic similarity checks, token usage profiling, and latency monitoring as indicators of system performance and quality.
• Leverage SQL for data validation, test data generation, and verification of data integrity across backend systems supporting AI features.
• Work with Playwright to automate UI-level testing workflows for web and mobile interfaces, ensuring seamless interaction between AI outputs and user-facing components.
• Monitor and analyze token consumption, API costs, and response times as key quality signals tied to operational efficiency and user experience.
• Actively participate in product and engineering discussions to advocate for testability, risk mitigation, prompt guardrails, and clinical safety in AI feature design.
• Clearly communicate ambiguous, probabilistic failures and blockers to technical and non-technical stakeholders with precision, context, and actionable insights.
• Proactively identify gaps in current QA processes and propose scalable solutions to improve automation, coverage, and clinical trust in MDCalc’s AI-powered tools.
• Maintain awareness of evolving clinical guidelines and real-world physician workflows to align AI testing with actual point-of-care decision-making needs.
• Support the expansion of MDCalc’s AI product team by contributing to the development of new QA frameworks and documentation that standardize evaluation practices across the organization.

🎯 Requirements

• 5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
• Strong understanding of QA principles, test case creation/documentation, and best practices for deterministic and non-deterministic systems
• Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
• Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
• Proficiency with test automation tools, with a focus on Playwright
• Strong SQL skills for data validation, test data creation, and verifying data integrity across systems

🏖️ Benefits

• Medical, dental, and vision coverage, with the option to extend to dependents
• Company-sponsored short-term insurance
• Fully paid 8-week parental leave after 6 months of employment
• Company-sponsored 401(k) after 3 months of employment
• Unlimited vacation for salaried roles
• Monthly work-from-home stipend

About MD Aware, LLC

MD Aware, LLC operates MDCalc, a free online clinical decision support platform that aggregates evidence-based calculators, risk scores, and algorithms for physicians and healthcare professionals. Founded in 2005 by two emergency physicians, the service provides rapid access to more than 550 validated tools covering specialties such as cardiology, emergency medicine, and oncology. Each calculator is accompanied by concise summaries of the underlying research, relevant society guidelines, and point-of-care guidance. The company generates revenue through targeted pharmaceutical advertising and institutional licensing while maintaining editorial independence. MDCalc reports over one million active users monthly and is integrated into several electronic health record systems.
