Agent Server: Architecture & Deployment

Agent Server is the runtime component within LangSmith Deployment (formerly LangGraph Platform) that manages agent execution with persistent state, streaming, and human-in-the-loop support. It runs LangGraph graphs in stateless containers backed by PostgreSQL and Redis, and it supports other frameworks such as Google ADK and AWS Strands through the Functional API. This article covers when to use it, how to deploy it correctly, how it fails, and how to choose among deployment modes and durability modes.

Quick Reference

→Agent Server = the runtime inside LangSmith Deployment; it is not itself the new name for LangGraph Platform (that is LangSmith Deployment)
→Core abstractions: Assistants (versioned agent configs), Threads (checkpointed conversations), Runs (single executions), Store (cross-thread KV)
→Architecture: stateless API containers + stateless queue workers + PostgreSQL (all state) + Redis (pub/sub + queue)
→Self-Hosted Lite is free up to 1M node executions/month; distributed mode requires enterprise-licensed images
→v0.8.0 (April 2026): Go runtime (core-api-grpc) is now default; Redis is the default queue backend
→Three durability modes: exit (checkpoint only when the run completes), async (checkpoint written in the background after each step; a crash may lose the last step), sync (checkpoint written before the next step starts; no data loss)
→Split mode: API container sets N_JOBS_PER_WORKER=0; worker container overrides entrypoint to python -m langgraph_api.queue_entrypoint
→Non-LangGraph agents (Google ADK, Strands, CrewAI) deploy via @entrypoint from langgraph.func
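
The trade-off between the three durability modes can be made concrete with a toy model. The sketch below is not Agent Server code; it only encodes the worst-case behavior described in the list above, assuming a run that crashes mid-step after some number of steps have finished. The names Durability and steps_surviving_crash are hypothetical.

```python
from enum import Enum


class Durability(Enum):
    EXIT = "exit"    # checkpoint only when the run completes
    ASYNC = "async"  # checkpoint written in the background after each step
    SYNC = "sync"    # checkpoint written before the next step starts


def steps_surviving_crash(mode: Durability, completed_steps: int) -> int:
    """Worst-case number of finished steps that can be recovered after a crash.

    Toy model (an assumption, not the real implementation): the run crashes
    mid-step after `completed_steps` steps have finished, so it never completes.
    """
    if completed_steps < 0:
        raise ValueError("completed_steps must be non-negative")
    if mode is Durability.EXIT:
        return 0  # the completion checkpoint was never written
    if mode is Durability.ASYNC:
        return max(completed_steps - 1, 0)  # the last write may still be in flight
    return completed_steps  # SYNC: every finished step was already persisted


if __name__ == "__main__":
    for mode in Durability:
        print(mode.value, steps_surviving_crash(mode, completed_steps=3))
```

In this model, sync pays a write before every step in exchange for losing nothing, while exit is the cheapest option but restarts the run from scratch after a crash.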

When NOT to Use Agent Server

Agent Server is an opinionated managed runtime. It requires PostgreSQL, Redis, and either a free LangSmith account (for Self-Hosted Lite) or an enterprise license (for distributed mode). Before adopting it, verify your use case actually needs what it provides.

Use Agent Server when you need...	Skip it when...
Persistent conversation threads with checkpointed state across requests	Your agent completes in one request with no state to persist between calls
Human-in-the-loop (HITL) with the ability to resume interrupted runs	Your agent is fully automated with no interrupt/approval workflows
Cron-scheduled agent runs without building a scheduler yourself	You only need on-demand runs triggered by user requests
Multi-tenant deployments with isolated thread state per user	You have a single-tenant or single-graph deployment
The Assistants/Threads/Runs REST API and LangSmith tracing out of the box	You already have your own orchestration layer and just need graph execution
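
One way to apply the table above is as a checklist: if none of the left-hand needs applies, the right-hand column wins. The sketch below is only an illustration of that reasoning; the Requirements class and its field names are hypothetical and simply mirror the table rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    persistent_threads: bool = False  # checkpointed state across requests
    human_in_the_loop: bool = False   # interrupt and resume workflows
    cron_runs: bool = False           # scheduled agent runs
    multi_tenant: bool = False        # isolated thread state per user
    managed_rest_api: bool = False    # Assistants/Threads/Runs API and tracing


def reasons_for_agent_server(req: Requirements) -> list[str]:
    """Return the names of the requirements that point toward Agent Server."""
    return [name for name, wanted in vars(req).items() if wanted]


if __name__ == "__main__":
    needs = Requirements(human_in_the_loop=True, cron_runs=True)
    reasons = reasons_for_agent_server(needs)
    print("Consider Agent Server" if reasons else "A plain service may be enough", reasons)
```

An empty result suggests the simpler path described in the callout that follows.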

FastAPI + LangGraph is often enough

If your agent completes in under 30 seconds, has no cross-request state, and serves a single use case, a FastAPI container with LangGraph's compile() is simpler, cheaper, and has no licensing requirements. Agent Server adds operational complexity — make sure you're buying something you'll use.

Core Concepts

Concept	What It Is	REST Endpoint
Assistant	A versioned agent configuration (model, tools, prompt, graph)	POST /assistants
Thread	A conversation with checkpointed state — persists across runs	POST /threads
Run	A single execution of an assistant within a thread	POST /threads/{id}/runs
Cron	A scheduled recurring run (with or without a thread)	POST /crons
Store	Cross-thread persistent key-value storage for user preferences, memory, etc.	GET/PUT /store/items
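
To show how the endpoints in the table fit together, the sketch below builds (but does not send) the three requests a client would typically issue: create an assistant, create a thread, then start a run on that thread. The paths come from the table; the base URL and the payload fields (graph_id, assistant_id, input) are assumptions made for illustration and should be checked against the API reference for your server version.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8123"  # assumption: wherever your server is reachable


def post(path: str, payload: dict) -> Request:
    """Build a JSON POST request for an Agent Server path without sending it."""
    return Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_assistant(graph_id: str) -> Request:
    # Payload field name is an assumption for this sketch.
    return post("/assistants", {"graph_id": graph_id})


def create_thread() -> Request:
    return post("/threads", {})


def create_run(thread_id: str, assistant_id: str, run_input: dict) -> Request:
    # A run always targets one thread and is executed by one assistant.
    return post(f"/threads/{thread_id}/runs", {"assistant_id": assistant_id, "input": run_input})


if __name__ == "__main__":
    req = create_run("thread-123", "assistant-456", {"messages": ["hello"]})
    print(req.get_method(), req.full_url)
```

In a real client, the thread and assistant IDs would come from the responses to the first two requests; here they are passed in as placeholders so the sketch stays offline.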

Container Architecture

Stateless API servers + queue workers — all state in PostgreSQL and Redis
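
The division of labor can be illustrated with a small in-process model. This is a toy sketch, not Agent Server code: a queue.Queue stands in for the Redis-backed queue, a dict stands in for PostgreSQL, and all function names are hypothetical. The point it shows is that neither the API side nor the worker side keeps run state of its own, so either can be restarted or scaled independently.

```python
import queue
import uuid
from typing import Optional

run_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for the Redis-backed queue
database: dict = {}                             # stand-in for PostgreSQL (all state)


def api_create_run(thread_id: str, payload: dict) -> str:
    """API container: record the run, enqueue it, keep nothing in memory."""
    run_id = str(uuid.uuid4())
    database[run_id] = {
        "thread_id": thread_id,
        "input": payload,
        "status": "pending",
        "output": None,
    }
    run_queue.put(run_id)
    return run_id


def worker_process_one() -> Optional[str]:
    """Queue worker: take one run, execute it, write the result back."""
    try:
        run_id = run_queue.get_nowait()
    except queue.Empty:
        return None  # nothing to do
    run = database[run_id]
    run["output"] = {"echo": run["input"]}  # placeholder for graph execution
    run["status"] = "success"
    return run_id


def api_get_run(run_id: str) -> dict:
    """API container: answer status queries from shared state only."""
    return database[run_id]


if __name__ == "__main__":
    rid = api_create_run("thread-1", {"question": "hi"})
    worker_process_one()
    print(api_get_run(rid)["status"])
```

This mirrors the split mode from the Quick Reference, where the API container runs no jobs (N_JOBS_PER_WORKER=0) and a separate worker container drains the queue.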
