
Job Overview
Location: United Kingdom
Job Type: Full-time
Category: Software Engineering
Date Posted: May 25, 2026
Full Job Description
📋 Description
- Optimize LLM inference performance by directly improving throughput, latency, and KV cache utilization across production-grade systems, with measurable impact on customer p99 and company P&L.
- Lead the technical direction of inference optimization at Kimchi, owning the roadmap for model serving efficiency rather than executing predefined tasks.
- Tune inference engines including vLLM, SGLang, and TensorRT-LLM at the kernel level to maximize GPU utilization and raise performance ceilings for each GPU SKU.
- Reduce latency by profiling and resolving actual bottlenecks in TTFT and TPOT, distinguishing between compute, memory bandwidth, scheduling, and networking constraints.
- Maximize KV cache efficiency through paged attention, prefix caching, eviction policies, cache reuse across requests, and quantized KV storage.
- Implement quantization strategies (INT8, INT4, FP8) across weights, activations, and KV cache while empirically measuring and preventing quality regressions on real-world workloads, not just perplexity benchmarks.
- Reduce cold start times and memory footprint through faster model initialization, smarter weight loading, and precise memory accounting to enable scalable deployment.
- Design and implement distributed inference topologies, including network-aware placement, sharding strategies, and checkpointing systems that avoid storage or interconnect bottlenecks.
- Instrument and measure all optimizations rigorously, ensuring improvements are real and reproducible, not artifacts of benchmarking conditions.
- Collaborate with a global team of cloud and Kubernetes experts to integrate inference optimization into autonomous infrastructure automation platforms.
- Define what benchmarks to run, what technologies to adopt, and what to build in-house, supported by detailed writeups and reproducible experiments.
- Work with a stack including Python, PyTorch, CUDA-adjacent tooling, Kubernetes, gRPC, ClickHouse, PostgreSQL, GCP Pub/Sub, AWS/GCP/Azure, GitLab CI, ArgoCD, Prometheus, Grafana, Loki, and Tempo.
- Operate in a high-autonomy environment where technical decisions are driven by data, experimentation, and ownership of outcomes.
- Deliver feature projects with rapid iteration cycles, typically completed within 1 to 4 weeks.
- Contribute 10% of work time to personal projects or skill development aligned with professional growth.
- Participate in annual company hackathons to innovate and strengthen team collaboration.
- Maintain a remote-first, globally distributed workflow with asynchronous communication and flexible scheduling.
🎯 Requirements
- 5+ years building real ML systems with demonstrated depth in inference or training infrastructure (not just model training notebooks)
- Strong Python proficiency in production services, not just scripting or prototyping
- Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM and a working understanding of inference engine behavior on GPU hardware
- Fluency in quantization tradeoffs, including empirical measurement of quality regressions beyond compression ratios
- Comfort with distributed systems: collective communication, sharding, and failure modes in multi-GPU and multi-node setups
- Bias toward measurement: instrument before optimizing, distinguish real gains from benchmark artifacts
🏖️ Benefits
- Competitive salary based on experience
- Flexible, remote-first global work environment
- Equity options
- Learning budget for professional development, including access to international conferences and courses
- 10% work time allocated to personal projects or self-improvement
- Annual hackathon and team-building budget for company events
About Castai Group Inc.
Castai Group Inc. provides specialized investment and strategic advisory services to middle-market companies across North America. The firm focuses on mergers and acquisitions, private placements, restructuring, and growth capital transactions for businesses in manufacturing, consumer goods, and business services sectors. Headquartered in New York, Castai operates a lean, senior-led model that emphasizes direct principal involvement, rigorous financial analysis, and long-term client partnerships. Its principals have executed more than 200 transactions totaling over $10 billion in aggregate value since inception.