This job has expired

This position was posted on May 16, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Principal AI/ML System Software Engineer

d-Matrix Corporation

Job Overview

Location

Santa Clara

Job Type

Full-time

Full Job Description

📋 Description

• Lead the development, enhancement, and maintenance of next-generation AI deployment software for d-Matrix’s proprietary AI compute engine, ensuring seamless hardware-software co-design optimization.
• Architect and scale full-stack toolchain components that enable efficient deployment of large language models (LLMs), vision-language models (VLMs), and other deep learning workloads across distributed systems.
• Collaborate closely with system software, machine learning, compiler, and hardware engineering teams to integrate and optimize AI inference pipelines from model definition to production deployment.
• Design and implement high-performance, low-latency software infrastructure using C/C++ and Python in Linux environments, with deep attention to system-level efficiency and resource utilization.
• Build and maintain distributed inference serving systems leveraging industry-standard frameworks such as TensorRT-LLM, vLLM, SGLang, ONNX Runtime, and TensorRT.
• Implement and optimize distributed communication primitives using collective communication libraries such as NCCL and Open MPI to maximize throughput and minimize latency in multi-node AI training and inference deployments.
• Integrate and extend MLOps tooling such as Kubernetes, Ray, or similar platforms to automate model lifecycle management, scaling, monitoring, and resource orchestration.
• Define and enforce software testing methodologies for ML workloads, ensuring reliability, reproducibility, and performance stability across diverse hardware configurations.
• Drive end-to-end ownership of software deliverables in fast-paced, agile development cycles with tight deadlines, balancing innovation with production-grade robustness.
• Mentor and guide junior software engineers and cross-functional team members, fostering a culture of technical excellence, direct communication, and collaborative problem-solving.
• Contribute to the productization of the AI software stack by translating complex hardware capabilities into intuitive, scalable, and maintainable software interfaces for internal and external users.
• Analyze and resolve performance bottlenecks across the full stack — from neural network layers to kernel optimizations and system-level I/O — to achieve maximal AI compute utilization.
• Stay at the forefront of AI infrastructure trends, evaluating emerging frameworks and tools to continuously improve d-Matrix’s software platform and maintain competitive advantage.
• Communicate technical trade-offs clearly to cross-functional teams and stakeholders, ensuring alignment on architecture decisions that impact performance, scalability, and time-to-market.
• Participate in technical design reviews, code walkthroughs, and system-level debugging sessions to uphold code quality and system integrity across the AI software stack.
• Document architecture decisions, APIs, and operational procedures to enable knowledge transfer and ensure long-term maintainability of deployed systems.
• Champion a culture of humility, ownership, and execution-driven innovation within the software team, aligning with d-Matrix’s values of direct communication and inclusive collaboration.

🎯 Requirements

• BS in Computer Science, Engineering, Math, Physics, or related field with 12+ years of industry software development experience, or MS (preferred) with 6+ years
• Strong grasp of system software, data structures, computer architecture, and machine learning fundamentals
• Proficient in C/C++/Python development in Linux environments using standard development tools
• Experience with distributed, high-performance software design and implementation
• Self-motivated team player with a strong sense of ownership and leadership
• Experience deploying ML workloads (LLMs, VLMs, NLP, etc.) on distributed systems

🏖️ Benefits

• Hybrid work model with 3 days per week onsite at Santa Clara, CA headquarters
• Equal opportunity workplace with inclusive, empowering culture
• Opportunity to work at the forefront of generative AI hardware-software co-design
• Collaborative environment with direct communication and mutual respect
• Exposure to cutting-edge AI compute technology and next-generation deployment systems
• No external agency submissions — direct hiring only for consistent candidate evaluation

Skills & Technologies

Python

Kubernetes

Linux

TensorFlow

PyTorch

Senior

Hybrid

Degree Required

About d-Matrix Corporation

d-Matrix designs silicon for high-efficiency AI inference at scale. Its Corsair compute platform combines in-memory computing with a digital approach to slash latency and energy use in transformer and generative workloads. Targeting hyperscale data centers and edge deployments, the company offers hardware and software stacks that integrate into existing AI pipelines. Founded in 2019 and headquartered in Santa Clara, California, d-Matrix serves cloud and enterprise customers seeking cost-effective alternatives to GPUs for large language model serving.
