
Job Overview
Location: UK
Job Type: Full-time
Category: Machine Learning Engineer
Date Posted: March 4, 2026
Full Job Description
📋 Description
- Ruby Labs is a dynamic and innovative tech company dedicated to creating cutting-edge consumer products across the health, education, and entertainment sectors. We are on a mission to shape the future of consumer-led products through our passionate and forward-thinking teams. As we continue to expand, we are seeking a highly skilled and motivated Senior AI Engineer to join our ranks and play a pivotal role in advancing our AI capabilities.
- In this crucial role, you will be instrumental in shaping our AI infrastructure and spearheading the development of production-ready Large Language Model (LLM) experiences. You will operate within a modern technology stack, making critical, data-driven decisions that impact model performance, system reliability, and operational costs. This is an opportunity to take ownership of significant AI features, guiding them from initial experimentation through to successful live production deployment.
- A core aspect of your responsibility will involve mastering advanced prompt engineering techniques. This includes designing intricate, dynamic prompt templates that incorporate conditional logic and efficiently manage context and information reuse to achieve the highest quality of AI-generated content and reasoning. You will also implement robust structured output mechanisms, using response formats such as JSON mode, function calling, and schema validation (e.g., Zod/JSON schemas) to ensure AI outputs are predictable, accurate, and easy to integrate into our application logic.
- Building and maintaining effective evaluation pipelines is paramount. You will use tools like Langfuse to collect feedback and score the quality of AI responses in real time, enabling continuous improvement. You will also debug complex LLM chains in depth, using Langfuse traces to pinpoint performance bottlenecks and optimize for cost, latency, and efficient context window utilization.
- A significant part of this role involves conducting systematic AI A/B testing. You will experiment with different LLMs, potentially comparing leading options like Claude 3.5 Sonnet against GPT-4o, using AI gateways such as OpenRouter. The analysis of these experiments will be driven by quantitative metrics, ensuring that deployment decisions for new prompts or models are grounded in empirical data rather than subjective assessments.
- You will be responsible for developing sophisticated scoring systems designed to analyze the entire “Problem → Solution” chain of AI interactions. This analysis will be crucial for identifying the root causes of issues such as hallucinations or logical errors, with insights derived from Langfuse analytics. You will also continuously re-evaluate model performance, stay abreast of emerging architectures, and fine-tune models when necessary to meet specific domain requirements and performance benchmarks.
- The ideal candidate will possess a deep understanding of the Node.js and Next.js stack, enabling the construction of reliable services capable of handling complex, AI-generated data. Proven experience in building dynamic prompts, where the generated content depends heavily on input variables and context injection, is essential. Familiarity with OpenRouter, including managing rate limits and selecting cost-effective models, is highly desirable. A strong grasp of LLM observability principles, including setting up tracing, creating test datasets, and integrating scoring systems, preferably with Langfuse or a similar tool, is expected.
- An analytical mindset, with the ability to translate raw generation logs into actionable business metrics and technical insights, is crucial. An iterative mindset, focused on continuous product improvement through constant feedback loops, will be key to success in this role. While not strictly required, practical experience in fine-tuning models for specific domain tasks or JSON compliance, a solid understanding of RAG architecture, and basic Python knowledge for data science scripts or evaluation libraries would be considered significant advantages.
- This role offers a unique opportunity to work remotely from a time zone within approximately ±4 hours of CET (Central European Time), ensuring optimal collaboration. You will be part of a fast-growing team, contributing to impactful projects and experiencing significant personal and professional growth.
About Ruby Labs Ltd.
Ruby Labs Ltd. is a London-based product studio that builds and scales consumer subscription mobile and web applications. The company focuses on health, wellness, and productivity verticals, developing apps such as Hint, Able, and the award-winning fitness platform FitCoach. Using data-driven growth and proprietary technology, Ruby Labs rapidly prototypes, launches, and iterates products to serve millions of global users. The team combines engineering, product design, and performance marketing expertise to create sustainable digital businesses. Founded in 2018, Ruby Labs operates a portfolio of self-funded apps, emphasizing user privacy, scientific validation, and long-term customer value.



