
Job Overview
Location
United States
Job Type
Full-time
Category
Software Engineering
Date Posted
June 13, 2026
Full Job Description
📋 Description
- Design, build, and operate the infrastructure layer supporting AI agent workflows in production across internal tools and external-facing products
- Ensure reliability, scalability, and observability of agentic systems, including model inference and agent execution pipelines
- Design and develop platform services, APIs, and SDKs that enable engineering teams to consume AI infrastructure as a self-service platform
- Manage and maintain compute, orchestration, and serving infrastructure powering AI agents using Kubernetes and Docker
- Implement Infrastructure as Code (IaC) using Terraform to provision and manage AWS cloud infrastructure components
- Build and maintain CI/CD pipelines tailored for rapid, reliable deployment of AI services and agent workflows
- Define and implement guardrails, failure handling, and recovery patterns specific to agentic and LLM-powered systems
- Establish robust monitoring, alerting, and incident response procedures optimized for ML and AI workloads
- Collaborate with AI and Data Engineering teams to transition experimental agent prototypes into hardened, production-ready systems
- Implement access controls and security best practices across AI infrastructure environments to protect sensitive model and data assets
- Document architecture, runbooks, and operational best practices to enable knowledge sharing and reduce reliance on tribal knowledge across teams
- Participate in on-call rotations to respond to incidents affecting AI agent infrastructure, with a focus on rapid resolution and post-mortem analysis
- Partner with Data Engineering, ML, and product teams to align platform capabilities with evolving product and research needs
- Prioritize developer experience in platform design, ensuring internal tools and APIs are intuitive, well-documented, and adopted at scale
- Operate in a fast-moving environment where platform engineering decisions directly impact the reliability of AI products used by millions of users
🎯 Requirements
- 5+ years of experience as a Site Reliability Engineer, Infrastructure Engineer, Platform Engineer, or similar role in a production environment
- Hands-on experience supporting ML infrastructure, model serving, or MLOps workflows in production
- Proficiency with Infrastructure as Code tools, particularly Terraform
- Experience with containerization and orchestration, particularly Kubernetes and Docker
- Solid understanding of cloud infrastructure, preferably AWS
- Strong scripting skills (Bash/shell) and proficiency in at least one programming language (Python preferred)
🏖️ Benefits
- Opportunity to work on cutting-edge AI agent infrastructure at a leading crypto platform trusted by over 10 million users
- Collaborative environment working across AI, Data Engineering, and product teams to shape the future of open finance
- Culture that values diverse perspectives and encourages you to apply even if you do not meet every requirement
- Equal opportunity employer with no tolerance for discrimination or harassment based on protected characteristics
- Qualified applicants with criminal histories are considered in accordance with the San Francisco Fair Chance Ordinance
- Option to redact personal information such as age, date of birth, or graduation dates from your resume when applying
About Kraken
Kraken is a global cryptocurrency exchange established in 2011, offering spot and futures trading for Bitcoin, Ethereum, and 200+ other digital assets. Headquartered in San Francisco with entities worldwide, it serves retail and institutional clients, providing custody, staking, an NFT marketplace, and an OTC desk. The platform emphasizes security, regulatory compliance, and educational resources.
Similar Opportunities

Red Gate Software Limited
3 months ago

Kit Global Inc.
3 months ago

Montu UK Limited
3 months ago

Red Gate Software Limited
3 months ago