This job has expired

This position was posted on February 24, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Senior AI/ML Specialist Solutions Architect (AI Infra & Cloud)

Lavendo Inc.

Job Overview

Location

San Francisco

Job Type

Full-time

Full Job Description

📋 Description

• Join a pioneering, publicly traded company at the vanguard of the AI revolution, offering a cutting-edge, AI-centric cloud platform that is fundamentally transforming the artificial intelligence landscape. Our client provides unparalleled infrastructure, encompassing vast-scale GPU clusters, sophisticated cloud platforms, and a comprehensive suite of tools and services meticulously designed for developers. This enables them to cater to the explosive growth of the global AI industry, serving Fortune 1000 enterprises, leading innovative startups, and distinguished AI researchers.
• The company's mission is to democratize access to essential AI infrastructure, empowering organizations of all sizes to create, optimize, and deploy advanced AI solutions at any scale. It is committed to demystifying the inherent complexities of AI development by delivering a robust full-stack AI platform that integrates powerful hardware with intuitive, user-friendly tools and services. This role presents an exceptional opportunity to architect and implement highly scalable AI solutions for a discerning clientele, leveraging state-of-the-art technologies and contributing directly to one of the most powerful commercially available supercomputers.
• As a Senior AI/ML Specialist Solutions Architect, you will be instrumental in designing and optimizing distributed training and inference systems tailored for large-scale AI models. This involves architecting robust, high-performance solutions that can handle the immense computational demands of modern AI workloads, ensuring efficiency and scalability from the ground up.
• You will be responsible for designing and delivering customer-focused solutions that not only meet technical requirements but also maximize tangible business value. This requires a deep understanding of customer needs, business objectives, and the capabilities of our client's platform to translate complex technical concepts into actionable, value-driven outcomes.
• A key aspect of this role involves leading the critical transition of Machine Learning (ML) pipelines from initial Proof of Concept (POC) stages to fully scalable, production-ready systems. This entails establishing best practices, implementing robust monitoring and management strategies, and ensuring the reliability and performance of ML models in live environments.
• You will cultivate and nurture long-term customer relationships, acting as a trusted technical advisor. Ensuring customer satisfaction and maintaining strategic alignment with their evolving goals will be paramount, fostering partnerships built on trust and mutual success.
• To disseminate knowledge and best practices, you will create insightful whitepapers, deliver compelling technical presentations at industry events, and host informative webinars. This thought leadership contributes to the broader AI community and positions our client as an authority in the field.
• You will provide technical leadership and mentorship to internal teams, guiding them on best practices for AI infrastructure, deployment strategies, and optimization techniques. Sharing your expertise will elevate the team's capabilities and foster a culture of continuous learning and improvement.
• You will act as a bridge between customer needs and internal development efforts, collaborating closely with engineering and product teams and channeling customer feedback to influence and prioritize product roadmaps, ensuring the platform's offerings remain at the forefront of industry innovation.
• This role demands a proactive approach to problem-solving, a passion for cutting-edge AI technologies, and the ability to thrive in a dynamic, fast-paced environment. You will be at the forefront of shaping the future of AI infrastructure and its application across diverse industries.

🎯 Requirements

• Minimum of 5 years of progressive experience in cloud technologies and infrastructure, with a strong preference for senior MLOps or Solutions Architect roles focused on AI/ML workloads.
• Proven expertise in scaling and optimizing AI workloads across multi-node and multi-GPU environments, demonstrating a deep understanding of distributed computing principles and hardware acceleration.
• Demonstrated success in delivering end-to-end ML products, with a clear track record of scaling solutions from Proof of Concept (POC) to robust production systems.
• Deep, hands-on knowledge of major ML frameworks such as PyTorch and JAX, including their architecture, performance characteristics, and deployment considerations.
• Strong background and practical experience within the NVIDIA HPC ecosystem, including proficiency with CUDA, NCCL, and InfiniBand technologies for high-performance computing.
• Exceptional communication and interpersonal skills, with the ability to effectively engage and influence both highly technical engineering teams and non-technical business stakeholders at all levels.
• Legal authorization to work in the United States on a full-time basis without requiring future sponsorship.

🏖️ Benefits

• Competitive annual compensation ranging from $225,000 to $315,000, with flexibility based on candidate experience and geographic location.
• Comprehensive medical benefits package, featuring 100% company-paid coverage for medical, dental, and vision insurance for employees and their dependents.
• Generous 401(k) retirement savings plan, including a 4% company match program to support long-term financial security.
• Attractive stock options plan, offering employees an opportunity to share in the company's success and growth.
• Flexible remote work environment, providing autonomy and work-life balance for employees across the U.S.
• Company-paid insurance coverage, including short-term disability, long-term disability, and life insurance, offering peace of mind and financial protection.
• Substantial paid parental leave: 20 weeks for primary caregivers and 12 weeks for secondary caregivers, supporting new parents during this significant life event.
• Monthly stipend of up to $85 to cover mobile and internet expenses, ensuring seamless connectivity for remote work.
• Opportunity to work with state-of-the-art AI and cloud technologies, including access to the latest NVIDIA GPUs and advanced infrastructure.
• Be an integral part of a team operating one of the most powerful commercially available supercomputers, contributing to groundbreaking AI advancements.
• Engage with a company committed to sustainable AI infrastructure, utilizing energy-efficient data centers that implement innovative waste heat recovery systems to warm nearby residential buildings.

Skills & Technologies

Python

Java

Node.js

Docker

Kubernetes

Senior

Remote

About Lavendo Inc.

Lavendo is a sales-specialist recruiting firm that connects startups and scaling companies with U.S.-based sales talent. It provides sourcing, pre-vetting, candidate matching, and hiring support across roles from SDR to Head of Sales to accelerate go-to-market hiring.
