Fundamental logo

Model Serving Engineer

Job Overview

Location

Europe

Job Type

Full-time

Category

Backend Engineer

Date Posted

April 3, 2026

Full Job Description

📋 Description

  • As a Model Serving Engineer at Fundamental, you will own the production inference layer for NEXUS, the company’s Large Tabular Model (LTM), ensuring that cutting-edge AI research translates into reliable, high-performance model serving for Fortune 100 enterprises. This role is critical to enterprise decision-making, enabling scalable, low-latency inference at production scale.
  • You will design, build, and maintain the core infrastructure that serves NEXUS models using Triton Inference Server, directly impacting how quickly and accurately global enterprises can act on AI-driven insights.
  • Day to day, you will implement and optimize inference pipelines, including custom Python backends, dynamic batching strategies, and model ensemble configurations in Triton, ensuring efficient resource utilization under varying workloads.
  • You will profile and tune Python inference code for performance, focusing on GIL contention, multi-threading, and concurrency patterns to maximize throughput and minimize latency in GPU-accelerated environments.
  • You will collaborate closely with research engineers to understand new model architectures at a computational level, translating innovations in dynamic shapes, memory access patterns, and batching behavior into production-ready serving solutions.
  • You will own observability and control loops for production inference: instrumenting GPU memory, CPU utilization, batch queue depth, and latency metrics, then tuning instance groups, concurrency limits, and batching policies in response to real-time telemetry.
  • You will evaluate and integrate emerging inference frameworks (e.g., vLLM, TorchServe, ONNX Runtime) as the model ecosystem evolves, keeping Fundamental’s serving stack at the forefront of efficiency and flexibility.
  • You will drive GPU utilization and resource-efficiency improvements across the serving fleet, scaling cost-effectively without compromising performance or reliability.
  • You will work within a mission-driven, low-ego culture that values technical excellence, ownership, and a bias toward action, collaborating with DeepMind alumni and world-class engineers to define the future of enterprise AI.
  • In this role, you will deepen your expertise in ML infrastructure at the intersection of systems engineering and applied ML, gaining rare experience serving large-scale tabular models in production and shaping the architecture of next-generation AI infrastructure.
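To give a concrete flavor of the Triton work described above, here is an illustrative `config.pbtxt` sketch for a Python-backend model with dynamic batching and instance-group tuning. All names, shapes, and numbers are hypothetical, not Fundamental's actual NEXUS configuration:

```protobuf
# Hypothetical Triton model configuration (config.pbtxt)
name: "nexus_ltm"
backend: "python"
max_batch_size: 64

input [
  {
    name: "features"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

# Dynamic batching: coalesce incoming requests toward preferred
# batch sizes, waiting at most 500 us to fill a batch
dynamic_batching {
  preferred_batch_size: [ 16, 32, 64 ]
  max_queue_delay_microseconds: 500
}

# Two model instances per GPU so pre/post-processing on one
# instance can overlap with compute on the other
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```

Tuning the `preferred_batch_size` / `max_queue_delay_microseconds` tradeoff against observed queue depth and latency is exactly the kind of telemetry-driven control loop the role describes.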

🎯 Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
  • 5+ years of experience in model serving, ML infrastructure, or a closely related backend engineering role
  • Deep, production-level experience with Triton Inference Server, including custom Python backends, batching configuration, and model repository management
  • Expert-level Python skills with a thorough understanding of the GIL, multi-threading, multiprocessing, and async concurrency patterns
  • Strong understanding of neural network inference mechanics: forward passes, batching strategies, memory management, and numerical precision tradeoffs
  • Hands-on experience with other inference frameworks (TorchServe, TensorFlow Serving, ONNX Runtime, vLLM, etc.) and the ability to evaluate tradeoffs between them
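As a minimal illustration of the GIL-and-concurrency understanding the requirements call for: threads improve throughput only while the GIL is released. In the sketch below (purely illustrative; `io_bound_call` is a hypothetical stand-in for an inference call whose underlying C/CUDA code releases the GIL), eight simulated calls overlap under threads but run back-to-back serially:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_bound_call(duration: float) -> float:
    # time.sleep releases the GIL, so threads can overlap this wait,
    # just as C extensions or GPU kernels release it during compute
    time.sleep(duration)
    return duration


def serial(n: int, duration: float) -> float:
    """Run n calls one after another; total time ~ n * duration."""
    start = time.perf_counter()
    for _ in range(n):
        io_bound_call(duration)
    return time.perf_counter() - start


def threaded(n: int, duration: float) -> float:
    """Run n calls on n threads; total time ~ duration, since the
    GIL is released while each call waits."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(io_bound_call, [duration] * n))
    return time.perf_counter() - start


if __name__ == "__main__":
    # 8 simulated 50 ms calls: roughly 0.4 s serial vs 0.05 s threaded
    print(f"serial:   {serial(8, 0.05):.2f}s")
    print(f"threaded: {threaded(8, 0.05):.2f}s")
```

If the hot path were pure-Python CPU work holding the GIL, threads would not help; that is when multiprocessing or moving work into GIL-releasing native code becomes the right tool.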

🏖️ Benefits

  • Competitive compensation with salary and equity
  • Comprehensive health coverage, including medical, dental, and vision, plus a 401(k)
  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
  • Relocation support for employees moving to join the team in one of our office locations
  • A mission-driven, low-ego culture that values diversity of thought, ownership, and a bias toward action

Skills & Technologies

Python
Kubernetes
TensorFlow
Prometheus
Grafana
Onsite
Degree Required



About Fundamental

Fundamental is a company focused on providing innovative solutions and services. They aim to empower businesses by leveraging cutting-edge technology and expert insights. Their offerings span various sectors, addressing complex challenges with tailored approaches. The company is committed to driving growth and efficiency for its clients through a blend of strategic planning and practical execution. With a strong emphasis on research and development, Fundamental continuously seeks to advance its capabilities and deliver value-added services. Their client-centric model ensures that solutions are aligned with specific business objectives, fostering long-term partnerships and mutual success. Fundamental strives to be a reliable partner in navigating the evolving business landscape.

