Staff Software Engineer

Job Overview

Location

Remote

Job Type

Full-time

Category

Software Engineering

Date Posted

January 24, 2026

Full Job Description

📋 Description

  • As a Staff Software Engineer on the AI Compute team at DataRobot, you will be at the forefront of building and operating the foundational computing infrastructure that powers our cutting-edge AI products and the most demanding customer workloads. This is a critical role where you will act as a technical expert and a significant force multiplier, driving innovation and excellence within our engineering organization.
  • Your primary mission will be to develop and maintain a robust, scalable, and efficient computing backbone. You will work backwards from the needs of data scientists, ML engineers, and application developers, ensuring they have the raw power and sophisticated orchestration necessary to train, deploy, and manage agentic AI at any scale. Think of yourself as the internal equivalent of a hyperscale cloud provider's core compute service, with an unwavering obsession for performance, efficiency, and enabling the future of AI.
  • In this role, you will not only be a hands-on technical contributor, solving complex problems and shaping architectural decisions, but also a mentor, guiding and developing the careers of fellow engineers. You will influence cross-team roadmaps, champion pragmatic engineering practices, and lead by example in how we build, test, and operate our infrastructure software.
  • A key responsibility will be to build and enhance systems that ensure our microservices are secure, performant, and reliable, enabling them to transition from an idea to production within an hour. This involves designing and implementing automated quality platforms that will elevate our release cadence from quarterly to weekly, daily, and eventually hourly, without compromising on performance, security, or reliability.
  • You will architect and develop systems that continuously provide recommendations for right-sizing Kubernetes computing resources. This initiative is crucial for optimizing cloud spending for both DataRobot and our customers, ensuring maximum efficiency and cost-effectiveness.
  • Collaboration will be central to your success. You will work closely with Product, Legal, and Security teams to guarantee that the continuous delivery processes you build are fully compliant and secure. You will also partner with architects and platform engineers across the R&D department to establish stringent continuous delivery and performance requirements for all production services.
  • Engaging with internal product managers to define roadmaps, set milestones, and deliver innovative, user-friendly solutions to the many teams relying on our platform will be a regular part of your work. You will be instrumental in addressing continuous delivery and platform engineering challenges.
  • You will play a vital role in ensuring our pipelines have clear, actionable playbooks and can operate seamlessly 24/7, minimizing the need for constant intervention. This requires a deep understanding of operational excellence and a commitment to defining and improving Service Level Agreements (SLAs) by working backward from the customer experience.
  • This position involves participation in an on-call rotation, reflecting our belief in shared ownership of our platform and our commitment to building resilient, observable systems that require minimal manual intervention.
  • You will leverage your expertise in Kubernetes architecture and operations, including resource management, scheduling, auto-scaling, Gateway API, Ingress, Prometheus, and OpenTelemetry, or demonstrate equivalent experience with other orchestrators like Nomad or Slurm.
  • Experience with GPU clusters, either as a user or administrator, and a strong understanding of multi-node AI/ML environments are highly valued, as you'll be working with infrastructure that supports some of the most demanding AI workloads.
  • A passion for developing products that empower fellow internal developers is essential. You should be driven by the desire to create tools and platforms that enhance the productivity and success of your colleagues.
  • You will be expected to set technical direction, make critical architectural decisions, and effectively drive consensus among multiple teams and stakeholders, influencing others across the organization even without direct authority.
  • A proven track record of successfully leading large-scale projects that impact dozens of teams and pods is a must. Your ability to deliver complex initiatives from inception to completion will be a key measure of success.
  • Mentoring senior engineers, fostering a positive and collaborative team culture, and promoting continuous learning and improvement are integral aspects of this role. You will help shape the growth and development of our engineering talent.
  • Your operational excellence will be crucial in continuously defining and improving SLAs, always working backward from the customer experience for all software components managed by the team.
  • This role offers a unique opportunity to shape the future of AI infrastructure at DataRobot, working with a talented team dedicated to pushing the boundaries of what's possible in AI development and deployment.

Skills & Technologies

Python
Go
R
Node.js
AWS
Senior
Remote

About DataRobot, Inc.

DataRobot provides an enterprise AI platform that automates the end-to-end process for building, deploying, and managing machine-learning models at scale. Founded in 2012, the company serves industries including finance, healthcare, retail, and manufacturing, enabling data scientists and business analysts to create predictive analytics without deep coding expertise. The platform integrates with existing data infrastructure, offers governance and monitoring tools, and supports on-premise, cloud, and hybrid deployments. DataRobot emphasizes responsible AI practices and model interpretability to help organizations operationalize artificial intelligence while maintaining compliance and transparency.