Production & Scale/Infrastructure
★ OverviewAdvanced17 min

Deployment Architectures

How to choose between serverless, containerized, and queue-worker deployment models for AI agents — with real cost math, load balancing for stateful agents, and a failure-mode breakdown for each model. Includes a decision gate for whether you should self-host at all.

Quick Reference

  • Evaluate LangSmith Deployment before building self-hosted infrastructure — it eliminates 2–4 weeks of ops work for teams without existing DevOps capacity
  • Serverless (Lambda, Cloud Functions) works for simple, short-lived agents but fails at the 15-min execution limit and has no persistent state
  • Containerized (ECS, Cloud Run, Kubernetes) is the safest default — no timeout limits, horizontal scaling, full control over memory and concurrency
  • Queue-backed workers (SQS, Redis Streams) decouple request ingestion from agent execution and scale each tier independently for high-throughput async workloads
  • Lambda crosses Fargate in monthly cost at ~2,400 runs/day (30s avg execution, 2 GB, us-east-1) — compute the crossover for your workload before committing
  • Use SSE for server-to-client streaming (simpler, HTTP-native) and WebSockets only when you need bidirectional real-time communication during agent execution
  • Sticky sessions are a last resort for stateful agents — externalize state to Postgres so any replica can resume any thread

Before You Self-Host

Read this first

Before building deployment infrastructure, evaluate LangSmith Deployment (formerly LangGraph Platform). It provides persistent threads, HITL resumption, cron scheduling, and background runs without you writing any infrastructure code. If you skip this decision and the answer is 'you didn't need to self-host,' you've just spent 2–4 weeks on ops work you could have avoided.

Use LangSmith Deployment when...Self-host when...
Your agent needs persistent threads across requestsYou already operate containers and don't want another vendor
You need HITL with the ability to resume interrupted runsYour compliance policy prohibits third-party state storage
You want cron scheduling and webhooks without building themYou need GPU inference or custom hardware
Your team has no DevOps capacity and wants a managed runtimeYou're at a scale where LangSmith Cloud fees exceed infrastructure costs
You want browser-based debugging in LangSmith StudioYou need portability across clouds without vendor lock-in

If you decide to self-host, the rest of this article covers the three self-hosted deployment models in detail — serverless, containerized, and queue workers — with cost math, failure modes, and load balancing patterns for stateful agents. See the LangSmith Deployment article for the managed path.