Staff Infrastructure Engineer - Virtualization

Job Overview

Location

Las Vegas, Nevada

Job Type

Full-time

Category

DevOps

Date Posted

May 16, 2026

Full Job Description

📋 Description

  • TensorWave is on a mission to deliver seamless, secure, reliable, and resilient AI compute at scale. We are building a versatile cloud platform designed to eliminate infrastructure barriers, allowing innovators to concentrate on their groundbreaking work rather than wrestling with underlying technology. Our goal is to ensure that breakthrough AI development moves at the speed of ideas, unhindered by infrastructure limitations.
  • As a Staff Infrastructure Engineer specializing in Virtualization, you will play a pivotal role in architecting and advancing our large-scale, high-performance infrastructure. This platform is engineered to support next-generation AI workloads, operating across multiple data centers and accommodating GPU-intensive environments with stringent demands for performance, isolation, and scalability. Your primary responsibility will be to lead the design and ongoing evolution of our virtualization platform, guiding its transition from traditional virtualization solutions to a more adaptable, cloud service provider (CSP)-aligned architecture. This will involve a deep dive into KVM/QEMU and modern Linux primitives, ensuring our infrastructure is robust and future-proof.
  • Your day-to-day responsibilities will be highly technical and hands-on, focusing on solving complex systems challenges at scale. You will be instrumental in designing and implementing a robust, scalable virtualization platform engineered to support high-density compute and GPU workloads. A key aspect of this role involves leading the strategic evolution of our existing platforms, such as Proxmox, towards a more advanced KVM/QEMU-based architecture. This transition will require you to define comprehensive standards for virtual machine (VM) lifecycle management, encompassing provisioning, scheduling, and migration processes. Furthermore, you will establish clear guidelines for performance isolation and resource allocation, as well as define strategies for failure domains and overall resilience.
  • A significant part of your work will focus on optimizing virtualization for high-performance workloads. This includes NUMA alignment, CPU pinning and scheduling, PCIe topology awareness, and GPU passthrough and device assignment. You will collaborate closely with our networking and storage teams to integrate high-throughput networking solutions, such as SR-IOV and RDMA, alongside distributed and local storage systems. A critical component of your role will be to build and continuously improve automation for hypervisor deployment and configuration, image pipelines, and cluster scaling and lifecycle operations. You will also troubleshoot deep system-level performance issues that span compute, memory, storage, and network layers. Your contributions will extend to shaping the long-term platform architecture and overall infrastructure strategy for TensorWave.
  • The team at TensorWave is dedicated to pushing the boundaries of AI infrastructure. We foster a collaborative environment where engineers are empowered to tackle challenging problems and drive innovation. You will be joining a group of passionate individuals committed to building a world-class AI compute platform.
  • In this role, you will have the unique opportunity to gain deep expertise in cutting-edge virtualization technologies and large-scale distributed systems. You will be at the forefront of designing and implementing infrastructure that powers the future of AI, developing advanced skills in KVM/QEMU, high-performance networking, and GPU acceleration. This position offers significant potential for professional growth and the chance to make a substantial impact on a rapidly evolving technology landscape.

🎯 Requirements

  • 7+ years of experience in infrastructure, systems engineering, or platform engineering.
  • Deep experience with Linux-based virtualization, including KVM/QEMU and libvirt or similar tooling.
  • Strong understanding of CPU scheduling and NUMA architectures; memory management and performance tuning; and storage I/O paths and performance characteristics.
  • Experience designing and operating virtualization platforms at scale (hundreds+ hosts).
  • Solid networking fundamentals, including Linux networking (bridges, bonding, VLANs) and high-performance networking concepts.
  • Experience with infrastructure automation tools such as Ansible, Terraform, or similar.
  • Strong troubleshooting skills across distributed systems.

🏖️ Benefits

  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance for Employees
  • Company Health Savings Account Contributions
  • 100% paid Short Term and Long Term Disability Insurance for Employees
  • Flexible PTO
  • Paid Holidays

Skills & Technologies

Kubernetes
Terraform
Linux
DevOps
Senior
Onsite

About TensorWave, Inc.

TensorWave develops and operates an AI accelerator cloud built on AMD Instinct GPUs, targeting large-scale model training and inference. The platform offers on-demand and reserved compute with high-bandwidth memory, InfiniBand networking, and container orchestration, delivered through a web console and API. Designed for generative AI, LLM fine-tuning, and HPC workloads, the service emphasizes AMD performance at competitive pricing, supported by 24/7 operations teams. Based in Las Vegas, Nevada, the company serves hyperscalers, research labs, and enterprises that need GPU capacity outside the NVIDIA ecosystem.
