
LLM Ops Evaluations

Job Overview

Location

Remote

Job Type

Full-time

Category

Data Science

Date Posted

February 12, 2026

Full Job Description

📋 Description

Are you passionate about the intricate world of Large Language Models (LLMs) and driven to ensure their real-world performance meets and exceeds expectations? Sintra is seeking a highly skilled and motivated LLM Ops Specialist to take ownership of the quality and effectiveness of our AI "teammates." In a landscape where 50,000 small business owners rely on Sintra daily, our AI helpers are not just tools; they are integral components that reply to customers, manage social media, analyze data, and perform a myriad of other crucial tasks. For many of our users, these AI assistants represent the first tangible and effective help they've ever had in running their businesses. We've built something that resonates deeply with our users, and now we're focused on elevating it to new heights of excellence.

This is a foundational role where you will be instrumental in shaping the future of AI quality at Sintra. You will be responsible for designing and implementing our comprehensive evaluation framework, acting as the primary technical liaison with leading LLM providers, and driving the continuous improvement of LLM outputs across our entire application. Your expertise will directly impact the reliability, accuracy, and overall helpfulness of our AI helpers, ensuring they continue to feel like true teammates to our users.

Your core mission will be to build and refine the systems that rigorously assess and guarantee the performance of our AI helpers. This involves a deep dive into prompt engineering, meticulous evaluation processes, the curation of robust datasets, and strategic model selection. You will be at the forefront of defining what "quality" means for AI-powered products at a rapidly scaling company, translating complex technical capabilities into tangible benefits for small business owners.

Key responsibilities will include:

  • Owning the end-to-end quality assurance of all AI-generated outputs across the Sintra platform. This encompasses everything from initial prompt design to final output validation, ensuring consistency, accuracy, and adherence to brand voice and user expectations.
  • Designing, building, and iterating on our comprehensive evaluation framework. This will involve developing automated testing suites to catch regressions, establishing effective human review loops for nuanced assessments, and creating robust quality scoring mechanisms to quantify performance.
  • Creating, versioning, and meticulously optimizing prompts for a wide array of use cases. You will be responsible for ensuring that each AI helper is equipped with the most effective instructions to perform its specific tasks with precision and efficiency.
  • Building and maintaining high-quality, representative test datasets. These datasets are critical for identifying performance degradation and preventing regressions before they impact our users, acting as a vital safety net for our AI functionalities.
  • Collaborating closely with engineering teams to integrate evaluation systems and automate quality checks, ensuring that quality is a built-in aspect of our development lifecycle, not an afterthought.
  • Serving as the primary technical point of contact with leading AI labs and LLM providers. This involves understanding their roadmaps, providing feedback on model performance, and advocating for features that enhance our ability to deliver high-quality AI experiences.
  • Developing strategies for handling the complexity of hundreds of potential use cases and user customization options, ensuring our evaluation framework is scalable and adaptable.
  • As Sintra scales, hiring and leading a team of talented prompt engineers and evaluation specialists, fostering a culture of quality and continuous improvement within the AI team.
  • Staying abreast of the latest advancements in LLM technology, evaluation methodologies, and best practices in MLOps to continuously enhance our AI quality standards.

This role is ideal for an individual who has a proven track record of shipping LLM-powered products at scale, possesses a strong systems-thinking mindset, and is eager to define the standards for AI quality in a fast-growing organization. If you are excited by the prospect of shaping the intelligence behind a product that genuinely empowers small businesses, this is your opportunity.
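To make the evaluation-framework responsibilities above concrete, here is a minimal sketch of what an automated regression suite with quality scoring might look like. All names (`EvalCase`, `score_output`, `run_regression_suite`) are hypothetical illustrations, not Sintra's actual system; a real harness would call an LLM API and use richer scoring (model-graded rubrics, human review loops) instead of keyword matching.

```python
# Hypothetical sketch of an LLM output evaluation harness with
# quality scoring and regression detection. Names are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str                    # input sent to the model
    expected_keywords: list[str]   # simple correctness proxy

def score_output(output: str, case: EvalCase) -> float:
    """Score 0..1 as the fraction of expected keywords present."""
    if not case.expected_keywords:
        return 1.0
    hits = sum(kw.lower() in output.lower() for kw in case.expected_keywords)
    return hits / len(case.expected_keywords)

def run_regression_suite(model: Callable[[str], str],
                         cases: list[EvalCase],
                         threshold: float = 0.8) -> dict:
    """Run every case; flag prompts whose score falls below the threshold."""
    scores = {c.prompt: score_output(model(c.prompt), c) for c in cases}
    failures = [prompt for prompt, s in scores.items() if s < threshold]
    return {"mean": sum(scores.values()) / len(scores), "failures": failures}

# Example with a stub "model" that returns a canned support reply.
cases = [EvalCase("What is your refund policy?", ["refund", "30 days"])]
result = run_regression_suite(
    lambda p: "Refunds are accepted within 30 days of purchase.", cases)
print(result)  # mean score 1.0, no failing prompts
```

In practice a suite like this runs in CI against a versioned test dataset, so a prompt or model change that degrades output quality fails the build before it reaches users.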

Skills & Technologies

Remote


About Sintra

Sintra is a company focused on providing innovative solutions in the technology sector. They specialize in [specific area, e.g., artificial intelligence, cloud computing, software development] to help businesses optimize their operations and achieve their strategic goals. Sintra's approach emphasizes [key value proposition, e.g., data-driven insights, user-centric design, scalable architectures]. The company is committed to delivering high-quality products and services, fostering strong client relationships, and driving digital transformation. Their team of experts works collaboratively to address complex challenges and create tangible value for their partners. Sintra aims to be a leader in its field by continuously exploring new technologies and methodologies to stay ahead of market trends.
