This job has expired

This position was posted on May 16, 2026 and is likely no longer accepting applications. We've kept it here for historical reference.

Software Engineer, Data Infrastructure

Cohere Inc.

Job Overview

Location

New York

Job Type

Full-time

Full Job Description

📋 Description

• Build and maintain petabyte-scale data storage infrastructure supporting AI training and evaluation workloads at the forefront of machine learning research.
• Design and implement high-performance data pipelines that transform unstructured data into optimized datasets across diverse storage backends including S3, GCS, and POSIX.
• Collaborate directly with top-tier researchers and engineers to solve complex networking and I/O performance challenges inherent in large-scale AI training environments.
• Develop and operate storage-related components on Kubernetes, including Persistent Volumes and CSI drivers, to ensure reliable, scalable data access for training jobs.
• Integrate and manage distributed data processing frameworks such as Apache Beam, Spark, or Flink to enable efficient data ingestion, preprocessing, and distribution across training clusters.
• Optimize data layer architecture to reduce latency, increase throughput, and ensure durability under extreme workloads generated by frontier AI model training.
• Work at the edge of current technical knowledge, creating novel solutions rather than optimizing existing systems to meet unprecedented demands in AI infrastructure.
• Maintain and enhance data infrastructure that serves as the backbone for semantic search, RAG, and agent-based AI systems powering customer-facing AI applications.
• Partner with modeling teams to understand their data requirements and translate them into scalable, production-grade infrastructure solutions.
• Monitor, troubleshoot, and improve data infrastructure reliability, observability, and efficiency in real-time production environments.
• Contribute to the design of data workflows that support iterative model evaluation, experimentation, and rapid feedback cycles for AI research teams.
• Ensure data infrastructure meets security, compliance, and operational standards while enabling maximum developer velocity and research independence.
• Stay deeply engaged with advancements in AI research and apply emerging insights to improve data infrastructure design and capabilities.
• Participate in on-call rotations and incident response for critical data systems to maintain uptime and performance for mission-critical training workloads.
• Document infrastructure patterns, operational procedures, and best practices to enable knowledge sharing across engineering and research teams.
• Advocate for infrastructure improvements that reduce friction, increase automation, and accelerate the pace of AI model development.
• Engage in cross-functional planning with product, research, and operations teams to align data infrastructure roadmap with organizational priorities.
• Embrace a culture of ownership, curiosity, and rapid iteration, where every team member is expected to push boundaries and solve problems others deem intractable.

🎯 Requirements

• 4+ years of experience working on data storage infrastructure
• Strong command of Python
• Kubernetes experience, especially on the storage side (Persistent Volumes, CSI drivers, etc.)
• The ability to transform unstructured data into performant datasets across diverse storage backends including S3, GCS, and POSIX
• Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink
• Genuine excitement about AI. You follow the research, have opinions, and enjoy being in the weeds

🏖️ Benefits

• An open and inclusive culture and work environment
• Work closely with a team on the cutting edge of AI research
• Weekly lunch stipend, in-office lunches & snacks
• Full health and dental benefits, including a separate budget to take care of your mental health
• 100% Parental Leave top-up for up to 6 months
• Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
• Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
• 6 weeks of vacation (30 working days!)

Skills & Technologies

Python

Kubernetes

Apache Spark

DevOps

Remote

About Cohere Inc.

Cohere provides large language models and retrieval-augmented generation APIs that enterprise developers use to embed conversational AI, search, summarization, and content generation into applications. Founded in 2019 by a team that included former Google Brain researchers, the company offers cloud and on-premises deployment, fine-tuning tools, and multilingual support to help organizations automate workflows, improve customer support, and analyze unstructured data while maintaining data privacy and security controls.
