
Job Overview
Location
Remote - United States
Job Type
Full-time
Category
DevOps & SysAdmin
Date Posted
April 14, 2026
Full Job Description
📋 Description
- As a Site Reliability Engineer in the Hardware Infrastructure team at Nebius Group N.V., you will play a critical role in ensuring the fault-tolerance, scalability, and uninterrupted operation of services that power next-generation AI infrastructure. This role is essential to maintaining the reliability of data center systems that support global AI workloads, directly enabling customers to innovate without managing complex infrastructure.
- Day to day, you will design, develop, and support systems across the data center lifecycle, including monitoring engineering equipment (power, cooling), tracking IT assets (servers, racks, network devices), managing hardware repair tasks, and overseeing server production. You will implement and improve CI/CD processes, troubleshoot complex hardware, software, and networking issues, and use cutting-edge technology to solve infrastructure challenges while ensuring high availability and performance.
- Nebius is a global leader in AI-focused cloud computing, headquartered in Amsterdam and listed on Nasdaq, with R&D hubs across Europe, North America, and Israel. The company employs over 1,400 people, including more than 400 skilled engineers, and combines deep hardware and software expertise with an in-house AI R&D team to deliver scalable, cost-effective cloud solutions for the AI economy.
- In this role, you will gain hands-on experience with large-scale distributed systems, advance your skills in Linux automation using Python and Bash, and contribute to mission-critical infrastructure that supports AI innovation. You will collaborate with globally distributed teams, solve real-world engineering problems, and grow professionally within a fast-moving, innovative organization at the forefront of AI infrastructure.
About Nebius Group N.V.
Nebius Group N.V. is a Netherlands-based technology company that operates a full-stack cloud platform designed for AI and machine learning workloads. It provides scalable GPU and CPU infrastructure, managed Kubernetes, object storage, and specialized AI services to enterprises and research organizations worldwide. The company was formed from the restructuring of Yandex N.V. and continues to serve global markets with data centers across Europe and North America.
