Senior AI Compute Infrastructure Engineer

Job Overview

Location: United States

Job Type: Full-time

Category: Software Engineering

Date Posted: May 8, 2026

Full Job Description

📋 Description

  • Senior AI Compute Infrastructure Engineer role focused on building and operating GPU and accelerator infrastructure to power AI workloads at Kraken, a mission-driven crypto company dedicated to accelerating global adoption of blockchain technology.
  • Day-to-day responsibilities include owning and operating GPU clusters for training and inference, designing infrastructure for on-prem model execution, optimizing scheduling and orchestration across heterogeneous accelerators, improving inference pipelines using frameworks like vLLM and TensorRT, partnering with ML teams to remove bottlenecks, building observability for GPU utilization and performance, driving reliability and incident response, evaluating new hardware and software stacks, creating tooling for internal teams to consume GPU resources easily, and contributing to long-term architecture decisions balancing performance, cost, and scalability.
  • The role sits within a small, senior, high-impact AI Compute and Infrastructure team under engineering leadership, collaborating directly with AI/ML researchers, platform engineers, security, and product teams to enable Kraken’s in-house AI ambitions with control, speed, reliability, and cost discipline.
  • In this role, you will gain deep expertise in large-scale GPU infrastructure, ML serving systems, distributed systems optimization, and cost-efficient compute at scale, while shaping the foundation of Kraken’s internal AI capabilities and contributing to high-stakes, always-on systems that impact product innovation, privacy, latency, and operational excellence.

🎯 Requirements

  • 5+ years of infrastructure engineering experience with significant focus on GPU compute, ML infrastructure, distributed systems, HPC, or large-scale production platforms
  • Hands-on experience operating GPU clusters or accelerator-backed infrastructure in production, including scheduling, orchestration, utilization monitoring, and cost optimization
  • Strong systems engineering fundamentals across Linux, networking, storage, containers, Kubernetes, distributed runtimes, and production debugging
  • Experience with ML serving frameworks such as vLLM, Triton Inference Server, TensorRT, TorchServe, KServe, Ray Serve, or equivalent systems
  • Proficiency in Python for infrastructure automation, tooling, debugging, integration, and operational workflows
  • Track record of optimizing compute costs while maintaining performance, reliability, and availability

šŸ–ļø Benefits

  • Fully remote work with colleagues in 70+ countries speaking over 50 languages
  • Opportunity to work on mission-driven crypto projects focused on financial freedom and inclusion
  • Access to industry-leading security practices and crypto education resources
  • Chance to build and shape cutting-edge AI infrastructure from the ground up in a high-impact, senior team
  • Exposure to diverse global talent and inclusive culture that values merit and diverse perspectives
  • Support for professional growth through work-style assessments and continuous learning in AI and blockchain

Skills & Technologies

Python
Rust
Node.js
AWS
Kubernetes
DevOps
Senior
Remote

About Kraken

Kraken is a global cryptocurrency exchange established in 2011, offering spot and futures trading for Bitcoin, Ethereum and 200+ digital assets. Headquartered in San Francisco with entities worldwide, it serves retail and institutional clients, providing custody, staking, an NFT marketplace and OTC desk. The platform emphasizes security, regulatory compliance and educational resources.
