Advanced · 20 min

Agent Server: Architecture & Deployment

Agent Server is the runtime component within LangSmith Deployment (formerly LangGraph Platform) that manages agent execution with persistent state, streaming, and human-in-the-loop support. It runs LangGraph graphs in stateless containers backed by PostgreSQL and Redis, and it supports agents built with other frameworks (Google ADK, AWS Strands, and others) through the Functional API. This article covers when to use it, how to deploy it correctly, how it fails, and how to choose among deployment modes and durability modes.

Quick Reference

  • Agent Server = the runtime inside LangSmith Deployment; it is one component of the product, not the new name for LangGraph Platform
  • Core abstractions: Assistants (versioned agent configs), Threads (checkpointed conversations), Runs (single executions), Store (cross-thread KV)
  • Architecture: stateless API containers + stateless queue workers + PostgreSQL (all state) + Redis (pub/sub + queue)
  • Self-Hosted Lite is free up to 1M node executions/month; distributed mode requires enterprise-licensed images
  • v0.8.0 (April 2026): Go runtime (core-api-grpc) is now default; Redis is the default queue backend
  • Three durability modes: exit (checkpoint only when the run completes), async (checkpoint written in the background while the next step runs; a crash can lose the most recent step), sync (checkpoint written before the next step starts; no data loss)
  • Split mode: API container sets N_JOBS_PER_WORKER=0; worker container overrides entrypoint to python -m langgraph_api.queue_entrypoint
  • Non-LangGraph agents (Google ADK, Strands, CrewAI) deploy via @entrypoint from langgraph.func
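The trade-off between the three durability modes is easier to see with a toy model. The sketch below is not Agent Server code: `Durability` and `recoverable_steps` are hypothetical names, and the model assumes a run of sequential steps where a crash interrupts one step in progress. It reports the worst-case number of completed steps that could be recovered from the checkpoint store.

```python
from enum import Enum
from typing import Optional


class Durability(Enum):
    EXIT = "exit"    # checkpoint only when the run completes
    ASYNC = "async"  # checkpoint written in the background during the next step
    SYNC = "sync"    # checkpoint written before the next step starts


def recoverable_steps(mode: Durability, total_steps: int,
                      crash_during: Optional[int] = None) -> int:
    """Worst-case count of completed steps that survive a crash.

    crash_during is the 1-based index of the step that is running when the
    process dies; None means the run finishes normally.
    """
    if crash_during is None:
        return total_steps  # every mode has a full checkpoint at completion
    if not 1 <= crash_during <= total_steps:
        raise ValueError("crash_during must be between 1 and total_steps")

    completed = crash_during - 1  # steps that finished before the crash
    if mode is Durability.EXIT:
        return 0  # nothing was written before completion
    if mode is Durability.SYNC:
        return completed  # each finished step was persisted before the next began
    # ASYNC: the write for the most recent step may still be in flight
    return max(completed - 1, 0)


if __name__ == "__main__":
    for mode in Durability:
        print(mode.value, recoverable_steps(mode, total_steps=5, crash_during=4))
```

With five steps and a crash during step four, the model recovers zero steps under exit, two under async (worst case), and three under sync. The price of sync is that every step waits for its checkpoint write before the next one starts.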

When NOT to Use Agent Server

Agent Server is an opinionated managed runtime. It requires PostgreSQL, Redis, and either a free LangSmith account (for Self-Hosted Lite) or an enterprise license (for distributed mode). Before adopting it, verify your use case actually needs what it provides.
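Those prerequisites can be turned into a small preflight check that runs before a container starts. This is a minimal sketch, not an official tool: `missing_settings` is a hypothetical helper, and the variable names (`DATABASE_URI`, `REDIS_URI`, `LANGSMITH_API_KEY`, `LANGGRAPH_CLOUD_LICENSE_KEY`) are assumptions that should be checked against the deployment documentation for your version.

```python
import os
from typing import List, Mapping


def missing_settings(env: Mapping[str, str], distributed: bool = False) -> List[str]:
    """Return the names of required settings that are absent or empty.

    The variable names are assumptions for illustration; verify them
    against the documentation for the image you deploy.
    """
    required = ["DATABASE_URI", "REDIS_URI"]  # PostgreSQL and Redis connections
    # Self-Hosted Lite authenticates with a LangSmith account key;
    # distributed mode needs an enterprise license key instead.
    required.append("LANGGRAPH_CLOUD_LICENSE_KEY" if distributed else "LANGSMITH_API_KEY")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present")
```

Passing the environment as a mapping keeps the check easy to test, and it makes the cost of adoption concrete: two backing services and a credential before the first request is served.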

Use Agent Server when you need... | Skip it when...
Persistent conversation threads with checkpointed state across requests | Your agent completes in one request with no state to persist between calls
Human-in-the-loop (HITL) with the ability to resume interrupted runs | Your agent is fully automated with no interrupt/approval workflows
Cron-scheduled agent runs without building a scheduler yourself | You only need on-demand runs triggered by user requests
Multi-tenant deployments with isolated thread state per user | You have a single-tenant or single-graph deployment
The Assistants/Threads/Runs REST API and LangSmith tracing out of the box | You already have your own orchestration layer and just need graph execution

FastAPI + LangGraph is often enough

If your agent completes in under 30 seconds, has no cross-request state, and serves a single use case, a FastAPI container with LangGraph's compile() is simpler, cheaper, and has no licensing requirements. Agent Server adds operational complexity, so make sure you're buying something you'll use.
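The criteria in this section can be condensed into a checklist. The sketch below is illustrative only: `Requirements`, `agent_server_reasons`, and `recommend` are hypothetical names, and a real decision would also weigh cost, licensing, and team experience.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirements:
    persistent_threads: bool = False             # checkpointed state across requests
    human_in_the_loop: bool = False              # interrupt and resume workflows
    cron_runs: bool = False                      # scheduled agent runs
    multi_tenant: bool = False                   # isolated thread state per user
    wants_builtin_api_and_tracing: bool = False  # Assistants/Threads/Runs API + tracing


# One label per criterion, in the order they appear in the comparison above.
_LABELS = {
    "persistent_threads": "persistent threads",
    "human_in_the_loop": "human-in-the-loop resume",
    "cron_runs": "cron-scheduled runs",
    "multi_tenant": "multi-tenant isolation",
    "wants_builtin_api_and_tracing": "built-in REST API and tracing",
}


def agent_server_reasons(req: Requirements) -> List[str]:
    """List the criteria that argue for adopting Agent Server."""
    return [label for field, label in _LABELS.items() if getattr(req, field)]


def recommend(req: Requirements) -> str:
    """Suggest a runtime: Agent Server only if at least one criterion applies."""
    reasons = agent_server_reasons(req)
    if reasons:
        return "Agent Server: " + ", ".join(reasons)
    return "FastAPI + LangGraph"


if __name__ == "__main__":
    print(recommend(Requirements()))
    print(recommend(Requirements(human_in_the_loop=True, cron_runs=True)))
```

If no criterion applies, the checklist falls through to the simpler FastAPI option, which mirrors the advice above: adopt Agent Server for features you will use, not by default.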