
Job Overview
Location: Las Vegas, Nevada
Job Type: Full-time
Category: Product Management
Date Posted: May 22, 2026
Full Job Description
📋 Description
- • Own end-to-end program management for data center operations across multiple sites, covering hardware lifecycle, capacity planning, change management, incident response, and operational readiness
- • Serve as the primary coordination point across facilities, networking, hardware, and software teams to drive accountability to operational SLAs, program schedules, and customer commitments
- • Define and track program milestones, critical path dependencies, and resource requirements across concurrent multi-site operational programs
- • Translate operational status and risk into clear reporting for engineering, product, and executive leadership audiences
- • Identify, escalate, and mitigate risks to site reliability, capacity availability, and customer-facing uptime before they become incidents
- • Coordinate hardware deployment, node lifecycle, and maintenance sequencing across sites in alignment with capacity and customer commitments
- • Partner with network, power, facilities, and infrastructure engineering teams to drive operational readiness for high-density GPU compute clusters
- • Own post-incident retrospectives and corrective action tracking, driving lessons learned into durable process improvements
- • Maintain program documentation including operational runbooks, risk registers, vendor trackers, capacity plans, and change management records
- • Manage operational programs for AMD-powered AI clusters to ensure delivery on customer commitments at scale with zero tolerance for preventable downtime
- • Act as the operational spine connecting facilities engineers, network teams, hardware teams, DevOps, and SRE teams to ensure seamless execution in a high-growth environment
- • Ensure operational discipline and structured execution under pressure in fast-moving, complex infrastructure environments
- • Translate technical operational needs into clear communication for both field teams and executive stakeholders
- • Utilize Jira or equivalent program management tooling for milestone tracking, incident management, and cross-team coordination
- • Support the scaling of next-generation AI infrastructure by aligning data center operations with customer demands and infrastructure growth
- • Maintain compliance with operational standards and drive continuous reliability improvement across all data center sites
- • Coordinate with vendor teams and track infrastructure dependencies including power distribution, cooling, and structured cabling
- • Ensure operational readiness for high-density GPU compute deployments through proactive planning and cross-functional alignment
- • Track and report on capacity availability and hardware deployment timelines to support customer-facing uptime commitments
- • Lead incident response coordination and post-mortem analysis to institutionalize process improvements and prevent recurrence
- • Maintain accurate and up-to-date operational documentation including runbooks, risk registers, and change management records
- • Drive accountability for operational SLAs across multiple teams and sites in a high-visibility, high-impact role
🎯 Requirements
- • Bachelor’s degree in Computer Science, Electrical/Mechanical Engineering, Information Technology, or a related technical field, or equivalent practical experience
- • 5+ years of technical program management experience, with at least 3 years directly managing data center operations, infrastructure programs, or critical facilities at scale
- • Demonstrated experience coordinating across facilities, network, and hardware engineering teams in a live production environment
- • Familiarity with data center operational systems and infrastructure: power distribution, cooling, structured cabling, and physical layer dependencies
- • Proven track record managing complex, multi-site operational programs on compressed timelines in a high-growth environment
- • Strong technical communication skills: able to translate operational status and risk to both field teams and executive stakeholders
- • Experience with Jira or equivalent program management tooling for milestone tracking, incident management, and cross-team coordination
🏖️ Benefits
- • Stock Options
- • 100% paid Medical, Dental, and Vision insurance for Employees
- • Company Health Savings Account Contributions
- • 100% paid Short Term and Long Term Disability Insurance for Employees
- • Life and Voluntary Supplemental Insurance Options
- • Other Insurance Options, such as Pet & Legal Insurance
- • Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
- • Flexible Spending Account
- • 401(k)
- • Employee Assistance Program
- • Flexible PTO
- • Paid Holidays
- • Parental Leave
- • Other In-Office Perks
About TensorWave, Inc.
TensorWave develops and operates an AI accelerator cloud built on AMD Instinct GPUs, targeting large-scale model training and inference. The platform offers on-demand and reserved compute with high-bandwidth memory, InfiniBand networking, and container orchestration, delivered through a web console and API. Designed for generative AI, LLM fine-tuning, and HPC workloads, the service emphasizes AMD performance at competitive pricing, supported by 24/7 operations teams. Based in Las Vegas, Nevada, the company serves hyperscalers, research labs, and enterprises that need GPU capacity outside the NVIDIA ecosystem.

