
Senior Machine Learning Engineer, Distributed vLLM (llm-d)

Job Overview

Location: Boston
Job Type: Full-time
Category: Machine Learning Engineer
Date Posted: March 1, 2026

Full Job Description

📋 Description

  • Join Red Hat's AI Inference Engineering team as a Senior Machine Learning Engineer focused on distributed vLLM infrastructure within the llm-d project. The role is central to accelerating AI for enterprises and simplifying generative AI deployments by building a stable, scalable platform for LLM serving.
  • You will work at the forefront of AI innovation, tackling complex challenges in scalable inference systems and Kubernetes-native deployments. Your expertise in machine learning, distributed systems, high-performance computing, and cloud infrastructure will directly shape how AI is deployed and adopted.
  • Contribute to the design, development, and rigorous testing of new features and advanced solutions for Red Hat AI Inference.
  • Drive innovation in the inference domain by participating in and contributing to upstream open-source communities.
  • Architect, develop, and maintain robust distributed inference infrastructure, leveraging Kubernetes APIs, custom operators, and the Gateway API Inference Extension to enable scalable, efficient Large Language Model (LLM) deployments.
  • Develop and maintain critical system components in Go and/or Rust, ensuring seamless integration with the vLLM project and effective management of distributed inference workloads.
  • Design, implement, and maintain KV cache-aware routing and scoring algorithms that optimize memory utilization and request distribution in large-scale, high-demand inference environments.
  • Improve the performance, resource utilization, fault tolerance, and stability of the inference stack.
  • Develop and rigorously test inference optimization algorithms that push the boundaries of AI model performance.
  • Participate in technical design discussions, offering insight and proposing solutions to complex, high-impact project challenges.
  • Foster a culture of continuous improvement by sharing recommendations, best practices, and technical knowledge with team members.
  • Collaborate closely with product management, other engineering teams, and cross-functional stakeholders to analyze, clarify, and translate business requirements into actionable technical specifications.
  • Communicate project progress, technical details, and key insights to stakeholders and team members, ensuring visibility and alignment on development efforts.
  • Mentor and coach a distributed team of engineers, fostering their professional development and technical expertise.
  • Conduct timely, constructive code reviews, upholding high standards of code quality, maintainability, and performance.
  • Represent Red Hat AI (RHAI) in external engagements, including industry events, customer meetings, and open-source community forums, acting as a key technical ambassador.
  • Work with technologies at the intersection of deep learning, distributed systems, and cloud-native infrastructure, contributing to the open-source ecosystem.
  • Contribute to advanced techniques for model quantization and sparsification, further optimizing LLM performance and efficiency.
  • Ensure the scalability and reliability of AI inference solutions, meeting the demanding needs of enterprise clients.
  • Engage with the open-source community to influence the direction of key projects such as vLLM and contribute to the broader AI ecosystem.
  • Develop and refine strategies for efficient resource management in distributed AI systems, with a particular focus on GPU utilization.
  • Troubleshoot and resolve complex technical issues in production and development environments, minimizing disruption and maximizing uptime.
  • Stay abreast of advances in machine learning, distributed systems, and cloud-native technologies, applying them to enhance Red Hat's offerings.
  • Document technical designs, implementation details, and operational procedures to support knowledge transfer and maintainability.
  • Contribute to benchmarks and performance-analysis tools that measure and improve inference efficiency.
  • Collaborate on integrating new hardware accelerators and optimization techniques into the inference platform.
  • Drive adoption of best practices in software development, testing, and deployment within the AI engineering team.
  • Help define the technical roadmap for Red Hat's AI inference solutions, aligned with market trends and customer needs.
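To give a flavor of the KV cache-aware routing and scoring work described above: the core idea is to route a request to the replica whose KV cache already holds the longest matching token prefix, so less prefill has to be recomputed, while avoiding overloaded replicas. The sketch below is purely illustrative, not llm-d's or vLLM's actual scorer; every name (`Replica`, `prefix_overlap`, `score`, `route`) and the load-penalty weight are hypothetical.

```python
# Illustrative sketch of KV cache-aware routing: prefer the replica whose
# cached prefixes overlap most with the request's tokens, penalized by load.
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    cached_prefixes: list   # token-id tuples assumed resident in the KV cache
    active_requests: int = 0


def prefix_overlap(request_tokens, cached):
    """Length of the longest common prefix between a request and a cached entry."""
    n = 0
    for a, b in zip(request_tokens, cached):
        if a != b:
            break
        n += 1
    return n


def score(replica, request_tokens):
    # Reward cache hits (fewer prefill tokens to recompute); penalize load.
    # The 0.5 weight is an arbitrary illustration, not a tuned value.
    best = max((prefix_overlap(request_tokens, c) for c in replica.cached_prefixes),
               default=0)
    return best - 0.5 * replica.active_requests


def route(replicas, request_tokens):
    """Pick the highest-scoring replica for the incoming request."""
    return max(replicas, key=lambda r: score(r, request_tokens))
```

In this toy model a replica with a long cached prefix can still lose to a colder but idle replica once its queue depth outweighs the cache benefit, which is the basic trade-off such scorers balance.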

Skills & Technologies

Python, Go, Rust, Kubernetes, Linux

Tags: Senior, Remote, Degree Required



About Red Hat, Inc.

Red Hat, Inc. is an American software company that provides enterprise open-source solutions, including its flagship Red Hat Enterprise Linux operating system, hybrid cloud platforms, container and Kubernetes technologies, middleware, storage, and automation tools. Founded in 1993 and headquartered in Raleigh, North Carolina, it became a subsidiary of IBM in 2019. The company supports organizations in modernizing and managing IT infrastructure through subscription-based support, training, and certification services, emphasizing security, scalability, and interoperability across hybrid and multicloud environments.
