
Deployment Architectures

Choosing between serverless, containerized, and long-running deployment models for AI agents. Load balancing stateful agents, WebSocket vs SSE for streaming, and self-hosted infrastructure patterns.

Quick Reference

  • Serverless (Lambda, Cloud Functions) works for simple, short-lived agents but struggles with long-running graphs and state
  • Containerized (ECS, Cloud Run, Kubernetes) gives you control over memory, concurrency, and persistent connections
  • Long-running workers with a queue (SQS, Redis) decouple request ingestion from agent execution for independent scaling
  • Use SSE for server-to-client streaming (simpler, HTTP-native) and WebSockets only when you need bidirectional real-time communication
  • Stateful agents behind a load balancer require either sticky sessions or externalized state
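The SSE recommendation above can be sketched with nothing but the standard library. This is a minimal illustration, not a production server: `AgentStreamHandler` and the hard-coded token list are stand-ins for a real agent loop, and a production deployment would use a proper framework behind the load balancer.

```python
# Minimal SSE sketch: the agent's tokens are pushed to the client over a
# one-way, HTTP-native stream. Stdlib only; the token list is a stand-in
# for a real agent generating output incrementally.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def sse_event(data: str) -> str:
    # One SSE frame: "data: <payload>" terminated by a blank line.
    return f"data: {data}\n\n"

class AgentStreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # Stand-in for an agent loop yielding tokens as they arrive.
        for token in ["Thinking", " about", " your", " question..."]:
            self.wfile.write(sse_event(token).encode())
            self.wfile.flush()  # push each event immediately, no buffering
            time.sleep(0.05)
        self.wfile.write(sse_event("[DONE]").encode())

# To run locally (blocks forever):
# HTTPServer(("", 8000), AgentStreamHandler).serve_forever()
```

Because SSE is plain HTTP, it passes through proxies and load balancers that would need special configuration for WebSocket upgrades.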

Deployment Models Overview

Three models

Production agents generally deploy as serverless functions, containers, or long-running workers — each with distinct tradeoffs around latency, cost, and operational complexity.

  • Serverless (Lambda / Cloud Function): event-driven, cold starts, < 30 s timeouts, auto-scales to zero, no persistent state. Best for simple chatbots and quick Q&A.
  • Containerized (Docker container): pre-warmed instances, runs for minutes to hours, horizontal scaling with health checks. Best for multi-step agents and APIs.
  • Long-running workers (queue + worker): persistent, always-running processes, runs for hours to days, durable execution with checkpointed state. Best for batch jobs and data pipelines.

Complexity and typical run duration increase from serverless to long-running workers.

Deployment models: serverless vs containerized vs long-running workers

Serverless is cheapest at low traffic but hits hard limits on execution time and memory. Containers give you full control but require capacity planning. Long-running workers with queues decouple ingestion from execution and scale each independently.
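The ingestion/execution split can be sketched in a few lines. Here an in-process `queue.Queue` stands in for a durable broker like SQS or Redis, and the worker body stands in for a long-running agent graph; the names (`enqueue`, `worker`) are illustrative, not from any library.

```python
# Sketch of queue-decoupled execution: the ingestion path accepts a
# request and returns a job id immediately; workers pull jobs and run
# them on their own schedule, so each side scales independently.
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for SQS / Redis
results = {}           # stand-in for a results store

def enqueue(prompt: str) -> str:
    """Ingestion path: accept the request, hand back a job id."""
    job_id = str(uuid.uuid4())
    jobs.put({"id": job_id, "prompt": prompt})
    return job_id

def worker():
    """Execution path: a persistent process that drains the queue."""
    while True:
        job = jobs.get()
        # Stand-in for a long-running, checkpointed agent run.
        results[job["id"]] = f"answer to: {job['prompt']}"
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = enqueue("summarize the quarterly report")
jobs.join()  # in production the client would poll or subscribe instead
print(results[job_id])  # -> answer to: summarize the quarterly report
```

With a durable broker, a worker crash mid-job just returns the message to the queue, which is what makes this model suitable for hours-long agent runs.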

Model               | Cold Start  | Max Runtime | State Management        | Best For
--------------------|-------------|-------------|-------------------------|------------------------------
Serverless          | 100-500 ms  | 5-15 min    | External only           | Simple, short-lived agents
Containerized       | 0 ms (warm) | Unlimited   | Local or external       | Most production workloads
Long-running worker | 0 ms (warm) | Unlimited   | External (queue-backed) | High-throughput, async agents
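"External only" and "queue-backed" state boil down to the same pattern: every turn loads and saves conversation state by session id, so any replica behind the load balancer can serve any request. A minimal sketch, assuming a key-value store; the dict and the `handle_turn` helper are hypothetical stand-ins for Redis/DynamoDB and a real agent turn.

```python
# Sketch of state externalization: no replica holds conversation state
# in memory between requests, so no sticky sessions are needed.
import json

store = {}  # stand-in for an external KV store (Redis, DynamoDB, ...)

def load_state(session_id: str) -> dict:
    raw = store.get(session_id)
    return json.loads(raw) if raw else {"messages": []}

def save_state(session_id: str, state: dict) -> None:
    # Serialize so the store only ever sees plain strings/bytes.
    store[session_id] = json.dumps(state)

def handle_turn(session_id: str, user_msg: str) -> dict:
    state = load_state(session_id)  # any replica can do this
    state["messages"].append({"role": "user", "content": user_msg})
    # ... agent runs here and appends its reply ...
    save_state(session_id, state)
    return state

state = handle_turn("sess-42", "hello")
print(len(state["messages"]))  # -> 1
```

The tradeoff is a load/save round-trip per turn; sticky sessions avoid that latency but tie a session to one replica, which breaks on scale-down or crash.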