This job has expired

This position was posted on May 16, 2026 and is likely no longer accepting applications.

Staff Software Engineer - Kernels

d-Matrix Corporation

Job Overview

Location

Santa Clara

Job Type

Full-time

Full Job Description

📋 Description

• Develop, enhance, and maintain software kernels for next-generation AI compute hardware, ensuring optimal performance and integration with proprietary AI architectures.
• Map computational graphs from AI frameworks (e.g., TensorFlow, PyTorch) to the underlying hardware, translating high-level operations into efficient low-level implementations.
• Collaborate with compiler experts to design and build compiler infrastructure that bridges high-level ML models with custom hardware instruction sets.
• Implement and optimize core machine learning operators including GEMMs, convolutions, BLAS, softmax, layer normalization, pooling, and SIMD-based operations for specialized accelerators.
• Leverage C/C++ and Python in Linux environments to develop high-performance, low-latency software components tailored for embedded SIMD vector processors like Tensilica.
• Work closely with hardware teams (mixed signal, DSP, CPU) to co-design hardware-software solutions, balancing trade-offs in power, throughput, memory bandwidth, and latency.
• Optimize software deliverables within tight development timelines, ensuring scalability and reliability across diverse AI workloads.
• Integrate and validate kernel implementations using industry-standard tools and debugging methodologies in embedded and accelerated computing environments.
• Participate in full-stack toolchain development, from algorithm design to deployment, ensuring end-to-end functionality across ML frameworks, compilers, and hardware targets.
• Contribute to the productization of the software stack by refining APIs, documentation, and testing protocols for internal and external adoption.
• Translate research-grade algorithms into production-ready code, addressing real-world constraints such as memory footprint, data movement, and hardware-specific quirks.
• Drive technical ownership of key software modules, mentoring junior engineers and leading technical design reviews with cross-functional teams.
• Maintain deep awareness of advancements in AI hardware, ML compilers, and accelerator technologies to continuously improve kernel efficiency and feature sets.
• Engage in daily collaboration with ML engineers, systems engineers, and hardware designers to align software capabilities with evolving hardware specifications and performance goals.

🎯 Requirements

• MS in computer engineering, math, physics, or related field with 5+ years of industry experience OR PhD in computer engineering, math, physics, or related field with 1+ years of industry experience
• Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals
• Proficient in C/C++ and Python development in Linux environments using standard development tools
• Experience implementing algorithms for specialized hardware such as FPGAs, DSPs, GPUs, and AI accelerators, using platforms and libraries such as CUDA
• Experience implementing ML operators (GEMMs, convolutions, BLAS, SIMD ops like softmax, layer norm, pooling)
• Experience with embedded SIMD vector processors such as Tensilica

🏖️ Benefits

• Hybrid work model with 3+ days per week onsite at Santa Clara, CA headquarters
• Equal opportunity workplace with inclusive, collaborative culture
• Opportunity to work at the forefront of generative AI hardware and software innovation
• Direct communication environment valuing humility and team-driven execution

Skills & Technologies

Python

Linux

TensorFlow

PyTorch

Senior

Hybrid

Degree Required

About d-Matrix Corporation

d-Matrix designs silicon for high-efficiency AI inference at scale. Its Corsair compute platform combines in-memory computing with a digital approach to slash latency and energy use in transformer and generative workloads. Targeting hyperscale data centers and edge deployments, the company offers hardware and software stacks that integrate into existing AI pipelines. Founded in 2019 and headquartered in Santa Clara, California, d-Matrix serves cloud and enterprise customers seeking cost-effective alternatives to GPUs for large language model serving.
