
Job Overview
Location: Remote
Job Type: Full-time
Category: Software Engineering
Date Posted: February 26, 2026
Full Job Description
📋 Description
- Join LangChain, Inc. as a Senior Backend Engineer on LangSmith Deployments, the infrastructure that runs AI agents reliably and at scale. Our mission is to make intelligent agents ubiquitous, and you will help build the runtime that powers that vision. LangChain's open-source frameworks, LangChain and LangGraph, are downloaded over 90 million times a month. LangSmith, our platform for observability, evaluation, and deployment, helps teams turn LLM systems into dependable production experiences. We are trusted by millions of developers and power AI teams at companies such as Replit, Clay, Cloudflare, Harvey, Rippling, Vanta, and Workday.
- In this role, you will build purpose-built infrastructure for AI agents. Unlike traditional web applications, AI agents run for extended durations, collaborate asynchronously with both humans and other agents, and must handle failures gracefully mid-execution. LangSmith Deployments is the runtime engineered for these challenges, providing durable checkpointing, fault-tolerant orchestration, and horizontal scaling. It is deployed in both cloud and self-hosted environments, which demands a high degree of reliability and performance.
- You will design and implement distributed queue and worker systems that manage concurrent agent execution, process background tasks, and orchestrate multi-agent coordination on horizontally scalable infrastructure. This means solving problems in concurrency and task distribution and keeping the system stable under heavy load.
- You will own and evolve our core data infrastructure: persisting agent state, implementing atomic job claiming to prevent race conditions, managing persistent connections, and handling schema evolution as the platform grows. The integrity and performance of this data layer are critical to our deployments.
- You will take part in architectural discussions and contribute to decisions that keep our solutions scalable, robust, and maintainable, working closely with the rest of the team to shape the future of AI agent infrastructure.
- You will build and maintain resumable streaming infrastructure, which lets clients disconnect and reconnect mid-execution without losing state, even under unstable network conditions.
- You will instrument and monitor our production systems with tracing, key metrics, and alerting so that issues affecting the health and performance of the platform are detected quickly.
- You will participate in on-call rotations and own incident response for the runtime: diagnosing, resolving, and documenting production issues to minimize disruption to our users.
- You will write and maintain technical documentation, including system designs and operational runbooks, to keep the team aligned and operations efficient.
- You will have the opportunity to contribute to and extend our open-source LangGraph project, which thousands of developers already use to build agent applications.
- To succeed in this role, you should have a strong backend engineering background and a proven track record of building and scaling complex systems. You should be able to reason critically about distributed systems, data management, and operational excellence, and you should thrive in a fast-paced, collaborative environment.
Skills & Technologies
Python
Go
Kubernetes
Terraform
Backend
Senior
Remote
About LangChain, Inc.
LangChain, Inc. provides open-source software libraries and cloud services for building applications that integrate large language models with external data sources and workflows. Its tools help developers create retrieval-augmented generation systems, manage prompts, chain model calls, and monitor performance in production environments. The company was founded in 2023 and is headquartered in San Francisco, California.


