Principal Software Engineer - Kernels

d-Matrix Corporation

Job Overview

Location: Santa Clara

Job Type: Full-time

Full Job Description

📋 Description

• Lead the development, enhancement, and maintenance of software kernels for d-Matrix’s next-generation AI compute engine, ensuring optimal performance on specialized hardware architectures.
• Collaborate with compiler experts to design and build scalable compiler infrastructure that translates computational graphs from AI frameworks into efficient hardware instructions.
• Map machine learning operators (GEMMs, convolutions, BLAS routines, softmax, layer normalization, and pooling) precisely onto the underlying hardware, drawing on deep knowledge of SIMD, DSP, and AI accelerator architectures.
• Optimize software deliverables within tight development cycles by balancing hardware-software co-design trade-offs across the full-stack toolchain, from low-level kernels to high-level framework integration.
• Work directly with hardware teams (mixed signal, DSP, CPU) to align software capabilities with physical hardware constraints and performance targets.
• Implement high-performance operators in C/C++ and Python in Linux environments, drawing on experience with CUDA and similar acceleration libraries.
• Develop and maintain kernels for embedded SIMD vector processors such as Tensilica, ensuring compatibility and efficiency across diverse AI workloads.
• Translate computational graphs generated by ML frameworks like TensorFlow and PyTorch into optimized hardware-executable code, minimizing latency and maximizing throughput.
• Drive ownership of software components from concept through production deployment, demonstrating leadership in a fast-paced, collaborative engineering environment.
• Contribute to the productization of the AI software stack, ensuring reliability, scalability, and maintainability for commercial deployment.
• Engage in cross-functional technical discussions with ML engineers, systems engineers, and hardware designers to align software strategy with overall product goals.
• Maintain rigorous code quality standards and participate in code reviews, documentation, and testing processes to ensure robustness of kernel-level software.
• Stay current with advancements in AI hardware, ML compilers, and low-level optimization techniques to continuously improve the software stack’s performance and capabilities.
• Mentor junior engineers and foster a culture of technical excellence, direct communication, and collaborative problem-solving within the software team.

🎯 Requirements

• MS in computer engineering, math, physics, or related field with 12+ years of industry experience OR PhD in computer engineering, math, physics, or related field with 7+ years of industry experience
• Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals
• Proficient in C/C++ and Python development in Linux environments using standard development tools
• Experience implementing ML operators (GEMMs, convolutions, BLAS, SIMD operations) on specialized hardware such as FPGAs, DSPs, GPUs, and AI accelerators
• Experience with embedded SIMD vector processors such as Tensilica
• Self-motivated team player with strong sense of ownership and leadership

🏖️ Benefits

• Opportunity to work at the forefront of generative AI hardware and software innovation
• Collaborative, inclusive team culture emphasizing humility, direct communication, and mutual respect
• Hybrid work model with 3–5 days per week onsite at Santa Clara, CA headquarters
• Equal opportunity workplace with an affirmative action commitment to diversity and inclusion
• Exposure to cutting-edge AI compute technology and next-generation hardware architectures
• Direct impact on productizing a commercial AI compute platform

Skills & Technologies

Python

Linux

TensorFlow

PyTorch

Senior

Hybrid

Degree Required

About d-Matrix Corporation

d-Matrix designs silicon for high-efficiency AI inference at scale. Its Corsair compute platform combines in-memory computing with a digital approach to slash latency and energy use in transformer and generative workloads. Targeting hyperscale data centers and edge deployments, the company offers hardware and software stacks that integrate into existing AI pipelines. Founded in 2019 and headquartered in Santa Clara, California, d-Matrix serves cloud and enterprise customers seeking cost-effective alternatives to GPUs for large language model serving.
