
Job Overview
Location
United Kingdom
Job Type
Full-time
Category
Software Engineering
Date Posted
June 13, 2026
Full Job Description
📋 Description
- • Design, build, and operate the infrastructure layer supporting AI agent workflows in production for both internal tools and external-facing products
- • Ensure reliability, scalability, and observability of agentic systems across Kraken’s crypto trading and financial infrastructure
- • Design and develop platform services, APIs, and SDKs that enable engineering, AI, and data teams to consume AI infrastructure as a self-service platform
- • Manage and maintain compute, orchestration, and model-serving infrastructure powering LLM-based agent execution and inference
- • Implement robust monitoring, alerting, and incident response procedures specifically tailored to AI/ML workloads and agent-based systems
- • Utilize Infrastructure as Code (IaC) tools, primarily Terraform, to provision and manage AWS cloud infrastructure components
- • Build and maintain CI/CD pipelines for rapid, reliable deployment of AI services and agent workflows
- • Define and implement guardrails, failure handling, and recovery patterns for agentic and LLM-powered systems
- • Collaborate with AI and Data Engineering teams to transition experimental agent prototypes into hardened, production-grade systems
- • Manage containerized workloads using Kubernetes to ensure efficient deployment, scaling, and orchestration of AI services
- • Implement access controls and security best practices across all AI infrastructure environments
- • Document architecture, runbooks, and operational best practices to support knowledge sharing and team scalability
- • Operate as a platform engineering team focused on developer experience, platform adoption, and long-term scalability of AI infrastructure
- • Work closely with Data Engineering, ML, and product-facing teams to harden agent infrastructure to meet institutional-grade reliability standards
- • Participate in on-call rotations to respond to production incidents affecting AI agent systems
🎯 Requirements
- • 5+ years of experience as a Site Reliability Engineer, Infrastructure Engineer, Platform Engineer, or similar role in a production environment
- • Hands-on experience supporting ML infrastructure, model serving, or MLOps workflows in production
- • Experience building developer platforms, internal tooling, APIs, or SDKs consumed by engineering teams at scale
- • Strong understanding of platform engineering principles, including developer experience, self-service infrastructure, and API-driven platform design
- • Proficiency with Infrastructure as Code tools, particularly Terraform
- • Experience with containerization and orchestration, particularly Kubernetes and Docker
🏖️ Benefits
- • Opportunity to work at the intersection of data infrastructure and applied AI in a fast-moving, high-stakes production environment
- • Collaborative culture with cross-functional teams across Data Engineering, ML, and product engineering
- • Exposure to cutting-edge AI agent systems and LLM-powered infrastructure at scale
- • Employment with a globally trusted crypto platform serving over 10 million users
- • Consideration of qualified applicants with criminal histories consistent with the San Francisco Fair Chance Ordinance
- • Equal opportunity employer that values diversity in background, perspective, and experience
About Kraken
Kraken is a global cryptocurrency exchange established in 2011, offering spot and futures trading for Bitcoin, Ethereum, and 200+ other digital assets. Headquartered in San Francisco with entities worldwide, it serves retail and institutional clients and provides custody, staking, an NFT marketplace, and an OTC desk. The platform emphasizes security, regulatory compliance, and educational resources.