Senior ML Engineer - Kimchi (LLM Inference Optimization) UK

Job Overview

Location: United Kingdom
Job Type: Full-time
Category: Software Engineering
Date Posted: May 25, 2026

Full Job Description

📋 Description

  • Optimize LLM inference performance by directly improving throughput, latency, and KV cache utilization across production-grade systems, with measurable impact on customer p99 latency and company P&L.
  • Lead the technical direction of inference optimization at Kimchi, owning the roadmap for model serving efficiency rather than executing predefined tasks.
  • Tune inference engines including vLLM, SGLang, and TensorRT-LLM at the kernel level to maximize GPU utilization and raise the performance ceiling for each GPU SKU.
  • Reduce latency by profiling and resolving the actual bottlenecks in time to first token (TTFT) and time per output token (TPOT), distinguishing between compute, memory bandwidth, scheduling, and networking constraints.
  • Maximize KV cache efficiency through paged attention, prefix caching, eviction policies, cache reuse across requests, and quantized KV storage.
  • Implement quantization strategies (INT8, INT4, FP8) across weights, activations, and KV cache, empirically measuring and preventing quality regressions on real-world workloads, not just perplexity benchmarks.
  • Reduce cold start times and memory footprint through faster model initialization, smarter weight loading, and precise memory accounting to enable scalable deployment.
  • Design and implement distributed inference topologies, including network-aware placement, sharding strategies, and checkpointing systems that avoid storage or interconnect bottlenecks.
  • Instrument and measure all optimizations rigorously, ensuring improvements are real and reproducible, not artifacts of benchmarking conditions.
  • Collaborate with a global team of cloud and Kubernetes experts to integrate inference optimization into autonomous infrastructure automation platforms.
  • Define which benchmarks to run, which technologies to adopt, and what to build in-house, supported by detailed writeups and reproducible experiments.
  • Work with a stack including Python, PyTorch, CUDA-adjacent tooling, Kubernetes, gRPC, ClickHouse, PostgreSQL, GCP Pub/Sub, AWS/GCP/Azure, GitLab CI, ArgoCD, Prometheus, Grafana, Loki, and Tempo.
  • Operate in a high-autonomy environment where technical decisions are driven by data, experimentation, and ownership of outcomes.
  • Deliver feature projects in rapid iteration cycles, typically completed within 1 to 4 weeks.
  • Dedicate 10% of work time to personal projects or skill development aligned with professional growth.
  • Participate in annual company hackathons to innovate and strengthen team collaboration.
  • Work in a remote-first, globally distributed team with asynchronous communication and flexible scheduling.

🎯 Requirements

  • 5+ years building real ML systems, with demonstrated depth in inference or training infrastructure (not just model training notebooks)
  • Strong Python proficiency in production services, not just scripting or prototyping
  • Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, and a working understanding of inference engine behavior on GPU hardware
  • Fluency in quantization tradeoffs, including empirical measurement of quality regressions, not just compression ratios
  • Comfort with distributed systems: collective communication, sharding, and failure modes in multi-GPU and multi-node setups
  • Bias toward measurement: instrument before optimizing, and distinguish real gains from benchmark artifacts

🏖️ Benefits

  • Competitive salary based on experience
  • Flexible, remote-first global work environment
  • Equity options
  • Learning budget for professional development, including access to international conferences and courses
  • 10% of work time allocated to personal projects or self-improvement
  • Annual hackathon and team-building budget for company events

Skills & Technologies

Python
Node.js
PostgreSQL
AWS
Azure
Data Science
Senior
Remote

About Castai Group Inc.

Castai Group Inc. provides specialized investment and strategic advisory services to middle-market companies across North America. The firm focuses on mergers and acquisitions, private placements, restructuring, and growth capital transactions for businesses in manufacturing, consumer goods, and business services sectors. Headquartered in New York, Castai operates a lean, senior-led model that emphasizes direct principal involvement, rigorous financial analysis, and long-term client partnerships. Its principals have executed more than 200 transactions totaling over $10 billion in aggregate value since inception.
