
Job Overview
Location
Portugal, Remote
Job Type
Full-time
Category
Machine Learning Engineer
Date Posted
February 24, 2026
Full Job Description
📋 Description
- • Join Datadog's cutting-edge MCP Services team as a Staff AI Engineer, a pivotal role shaping the future of how intelligent agents interact with our comprehensive observability platform.
- • You will be instrumental in scaling and evolving the MCP Services interface, which serves as the bridge enabling external agents like Claude and Cursor, alongside internal Datadog AI Agents, to seamlessly access and leverage Datadog's vast data landscape.
- • This position offers a unique opportunity to drive the development of agentic workflows that span across critical areas such as metrics analysis, log investigation, and incident response, empowering users with unprecedented AI-driven insights.
- • As a Staff Engineer, you will spearhead the design and implementation of advanced evaluation frameworks specifically for agentic systems, ensuring robust performance and reliability.
- • You will contribute directly to defining and building the next generation of agent-tool interaction models, pushing the boundaries of what's possible in applied AI.
- • This is a dynamic and rapidly evolving field characterized by significant ambiguity and the potential for high impact, making it an ideal environment for individuals eager to influence Datadog's applied AI strategy.
- • Lead significant efforts to enhance Datadog’s public-facing MCP server, focusing on enabling intelligent agents to discover, understand, and interact with our extensive suite of services with greater efficiency and accuracy.
- • Architect and implement sophisticated agentic tool surfaces, meticulously designed for both rigorous evaluation and seamless production deployment across a diverse spectrum of AI agents.
- • Develop and maintain state-of-the-art evaluation pipelines, crucial for accurately measuring the performance of AI agents on complex Datadog workflows, including, but not limited to, in-depth investigations, rapid incident triage, and intricate metric queries.
- • Proactively investigate and resolve complex failure cases by conducting thorough analyses of tool outputs, optimizing query parsing mechanisms, and enhancing the feedback loops between agents and the tools they utilize.
- • Foster close collaboration with teams across Applied AI and various internal departments to establish and uphold shared standards for tool integration, data access protocols, and overall system interoperability.
- • Drive innovation in how AI agents can effectively utilize Datadog's platform to solve real-world problems, from identifying performance bottlenecks to automating complex operational tasks.
- • You will be a key contributor to ensuring the security, scalability, and reliability of the MCP Services, which handle critical data and interactions.
- • Mentor junior engineers and contribute to the technical vision of the MCP Services team, fostering a culture of learning and continuous improvement.
- • Stay abreast of the latest advancements in AI, agentic systems, and LLM technologies, bringing innovative ideas and solutions to Datadog.
- • Translate complex technical requirements into actionable engineering plans and deliver high-quality, production-ready code.
- • Champion best practices in software development, testing, and deployment within the AI engineering domain.
- • Contribute to the strategic roadmap of Datadog's AI initiatives, providing technical leadership and insights.
- • You will work with a talented and passionate team dedicated to building the future of AI-powered observability.
- • This role requires a blend of deep technical expertise, strategic thinking, and a passion for solving challenging problems in a fast-paced environment.
- • This role offers the opportunity to make a tangible impact on Datadog's product and its customers by enabling more intelligent and automated workflows.
- • You will be at the forefront of applying AI to solve complex challenges in cloud infrastructure monitoring and management.
- • Embrace the hybrid work model at Datadog, balancing in-office collaboration with the flexibility of remote work to foster creativity and work-life harmony.
🎯 Requirements
- • Proven experience as a Staff-level engineer with a strong foundation in applied AI, agentic programming, LLM-powered automation pipelines, LLM orchestration frameworks (e.g., LangChain, LangGraph, CrewAI), and/or agent orchestration and tool-use systems.
- • Demonstrated ability to thrive in environments with high ambiguity and rapid change, with a capacity to autonomously define and prioritize strategic direction.
- • Expertise in designing, building, and implementing robust evaluation frameworks for LLM agents or AI systems, including proficiency in metrics design and data instrumentation.
- • Familiarity with the MCP standard or experience contributing to agent-compatible tooling surfaces, and a strong understanding of building and evaluating ReAct agentic loops (nice-to-have).
- • Exceptional systems thinking capabilities, with the ability to reason effectively across multiple agents, tools, and diverse user scenarios.
- • A deep passion for advancing agent-augmented software and a strong desire to actively shape evolving interface paradigms.
🏖️ Benefits
- • Generous and competitive benefits package designed to support your overall well-being.
- • New hire stock equity (RSUs) and an employee stock purchase plan to foster shared ownership and long-term financial growth.
- • Continuous career development and clear pathing opportunities to support your professional growth and advancement.
- • Employee-focused, best-in-class onboarding experience to ensure a smooth and successful integration into Datadog.
- • Access to an internal mentor and a cross-departmental buddy program to facilitate networking and knowledge sharing.
- • A friendly, inclusive, and collaborative workplace culture that values diversity and encourages innovation.
About Datadog, Inc.
Datadog, Inc. provides a cloud-scale monitoring and analytics platform that unifies infrastructure metrics, application performance traces, and log data. It offers real-time dashboards, machine-learning alerts, distributed tracing, synthetic monitoring, and security analytics for cloud, on-premises, and hybrid environments. The company serves technology, financial services, retail, and government organizations worldwide, helping them reduce downtime, optimize resource usage, and improve software performance across dynamic, containerized architectures.