Senior ML Engineer - Kimchi (LLM Inference Optimization)

Job Overview

Location: Spain
Job Type: Full-time
Category: Software Engineering
Date Posted: May 25, 2026

Full Job Description

📋 Description

  • Optimize LLM inference performance by directly improving throughput, latency, and KV cache utilization across production systems, with measurable impact on customer p99 latency and company P&L.
  • Lead the technical direction of inference optimization at Kimchi, owning the roadmap for model serving efficiency rather than executing someone else’s plan.
  • Tune inference engines including vLLM, SGLang, and TensorRT-LLM at the kernel level to maximize GPU utilization and push performance ceilings for each GPU SKU.
  • Attack time to first token (TTFT) and time per output token (TPOT) independently by profiling real bottlenecks (compute, memory bandwidth, scheduling, or networking) and implementing targeted fixes based on empirical data.
  • Enhance KV cache efficiency through paged attention, prefix caching, eviction policy optimization, and quantized KV cache implementations to unlock unrealized throughput.
  • Design and implement quantization strategies (INT8, INT4, FP8) for weights, activations, and KV cache while empirically measuring and minimizing quality regression on real-world workloads, not just benchmark metrics.
  • Reduce cold start times and memory footprint through faster model initialization, smarter weight loading, and precise memory accounting to enable scalable inference deployment.
  • Design and deploy distributed inference topologies across multi-node, multi-GPU clusters, optimizing for network-aware placement, checkpointing, and interconnect efficiency.
  • Instrument and measure all optimizations rigorously, distinguishing real performance gains from benchmark artifacts, and ensuring every change is data-driven and reproducible.
  • Collaborate with a global team of cloud infrastructure experts to integrate inference optimization into Kubernetes-based autonomous infrastructure platforms used by over 2,100 companies.
  • Define benchmarking priorities, evaluate new tools and frameworks, and document technical decisions through clear writeups and reproducible experiments to align the team.
  • Maintain and extend the production stack including Python, PyTorch, CUDA-adjacent tooling, Kubernetes, gRPC, ClickHouse, PostgreSQL, GCP Pub/Sub, AWS/GCP/Azure, GitLab CI, ArgoCD, Prometheus, Grafana, Loki, and Tempo.
  • Balance autonomy with ownership: operate with a wide mandate to innovate, experiment, and drive technical decisions in a fast-paced environment where feature cycles are 1–4 weeks.

🎯 Requirements

  • 5+ years building real ML systems with depth in inference or training infrastructure (not just model training notebooks)
  • Strong Python skills for production services, not just scripting
  • Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM and a working mental model of inference engine behavior on GPU hardware
  • Fluency with quantization tradeoffs, including measured quality regressions on real workloads
  • Comfort with distributed systems: collective communication, sharding strategies, and failure modes in multi-GPU and multi-node setups
  • Bias toward measurement: instrument before optimizing, and distinguish real wins from benchmark artifacts

🏖️ Benefits

  • Competitive salary based on experience
  • Flexible, remote-first global environment
  • Equity options
  • 10% work time allocated to personal projects or self-improvement
  • Learning budget for professional development, including access to international conferences and courses
  • Annual hackathon and team-building budget for company events

Skills & Technologies

Python
Node.js
PostgreSQL
AWS
Azure
Data Science
Senior
Remote

About Castai Group Inc.

Castai Group Inc. provides specialized investment and strategic advisory services to middle-market companies across North America. The firm focuses on mergers and acquisitions, private placements, restructuring, and growth capital transactions for businesses in manufacturing, consumer goods, and business services sectors. Headquartered in New York, Castai operates a lean, senior-led model that emphasizes direct principal involvement, rigorous financial analysis, and long-term client partnerships. Its principals have executed more than 200 transactions totaling over $10 billion in aggregate value since inception.
