This job has expired

This position was posted on May 16, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Software Engineer, Staff - SIMD Kernels

d-Matrix Corporation

Job Overview

Location: Santa Clara

Job Type: Full-time

Full Job Description

📋 Description

• Develop, enhance, and maintain high-performance software kernels for machine learning operators such as softmax, layer normalization, activation functions, GEMMs, convolutions, and pooling on next-generation AI compute hardware.
• Productize the software stack for d-Matrix’s AI compute engine by translating ML framework computational graphs into optimized hardware-specific implementations.
• Design and implement SIMD-based kernels tailored for specialized AI accelerators, ensuring maximum throughput and efficiency on proprietary hardware architectures.
• Collaborate closely with hardware teams to navigate hardware-software co-design trade-offs, optimizing algorithm mapping and resource utilization across compute pipelines.
• Build and integrate development tools and APIs that make the SDK intuitive, discoverable, and performant for external and internal ML developers.
• Analyze and benchmark kernel performance across different workloads and hardware configurations to identify bottlenecks and drive iterative improvements.
• Work across the full software toolchain—from low-level C/C++ kernel implementation to Python-based frontends and performance analysis tools—in a Linux development environment.
• Translate ML framework operations (from PyTorch or TensorFlow) into optimized, hardware-accelerated equivalents using CUDA or similar parallel computing platforms.
• Maintain and scale software quality in a fast-paced, agile development environment with high expectations for reliability, correctness, and performance.
• Lead technical initiatives within the SIMD Kernels team, providing mentorship and driving best practices in code structure, testing, and documentation.
• Contribute to the evolution of ML compiler integrations by aligning kernel design with compiler IRs such as MLIR, LLVM, TVM, or Glow.
• Optimize memory access patterns, data layout transformations, and vectorization strategies specifically for embedded SIMD vector processors and AI-focused architectures.
• Partner with ML researchers and application teams to understand real-world model requirements and adapt kernel implementations to support CV, NLP, and recommendation use cases.
• Ensure software solutions are portable, maintainable, and scalable across d-Matrix’s regional office locations and remote setups.
• Participate in technical reviews, code design discussions, and architecture planning sessions to align kernel development with overall product roadmaps.
• Document kernel behavior, performance characteristics, and usage guidelines for internal teams and external developers consuming the SDK.
• Proactively identify opportunities to improve developer experience, reduce latency, and increase throughput across the AI inference and training stack.
• Work independently with strong ownership, delivering high-quality code under tight deadlines while maintaining alignment with team and organizational goals.

🎯 Requirements

• MS or PhD in computer engineering, math, physics, or a related field, with 5+ years of industry experience
• Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals
• Proficient in C/C++ and Python development in a Linux environment using standard development tools
• Experience implementing algorithms in C/C++ and Python for specialized hardware such as FPGAs, DSPs, GPUs, or AI accelerators, using platforms such as CUDA
• Experience implementing ML operators including GEMMs, convolutions, softmax, layer normalization, and pooling
• Self-motivated team player with a strong sense of ownership and leadership

🏖️ Benefits

• Remote work possible alongside Santa Clara HQ or regional office options
• Equal opportunity workplace with an inclusive, collaborative culture
• Opportunity to work on cutting-edge AI hardware and software at the forefront of generative AI innovation
• Exposure to ML compilers (MLIR, LLVM, TVM) and modern AI frameworks (PyTorch, TensorFlow)
• Work in a startup environment with high impact and rapid iteration
• Access to technical mentors and cross-functional teams with deep hardware-software expertise

Skills & Technologies

Python

Linux

TensorFlow

PyTorch

Senior

Remote

Degree Required

About d-Matrix Corporation

d-Matrix designs silicon for high-efficiency AI inference at scale. Its Corsair compute platform combines in-memory computing with a digital approach to slash latency and energy use in transformer and generative workloads. Targeting hyperscale data centers and edge deployments, the company offers hardware and software stacks that integrate into existing AI pipelines. Founded in 2019 and headquartered in Santa Clara, California, d-Matrix serves cloud and enterprise customers seeking cost-effective alternatives to GPUs for large language model serving.
