Deployment Architectures
Choosing between serverless, containerized, and long-running deployment models for AI agents. Load balancing stateful agents, WebSocket vs SSE for streaming, and self-hosted infrastructure patterns.
Quick Reference
- Serverless (Lambda, Cloud Functions) works for simple, short-lived agents but struggles with long-running graphs and state
- Containerized (ECS, Cloud Run, Kubernetes) gives you control over memory, concurrency, and persistent connections
- Long-running workers with a queue (SQS, Redis) decouple request ingestion from agent execution so each can scale independently
- Use SSE for server-to-client streaming (simpler, HTTP-native) and WebSockets only when you need bidirectional real-time communication
- Stateful agents behind a load balancer require either sticky sessions or externalized state
Deployment Models Overview
Three models
Production agents generally deploy as serverless functions, containers, or long-running workers — each with distinct tradeoffs around latency, cost, and operational complexity.
Deployment models: serverless vs containerized vs long-running workers
Serverless is cheapest at low traffic but hits hard limits on execution time and memory. Containers give you full control but require capacity planning. Long-running workers with queues decouple ingestion from execution and scale each independently.
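The queue-decoupled model can be sketched with the standard library, using `queue.Queue` as a stand-in for SQS or Redis and a placeholder `run_agent` (both names are illustrative). The ingestion side enqueues a job and returns immediately; the worker drains jobs at its own pace:

```python
import queue
import threading

def run_agent(prompt: str) -> str:
    # Placeholder for the actual agent invocation (LLM calls, tools, etc.).
    return f"echo: {prompt}"

def worker(jobs: "queue.Queue[dict]", results: dict, stop: threading.Event) -> None:
    """Long-running worker: pull jobs until signaled to stop."""
    while not stop.is_set():
        try:
            job = jobs.get(timeout=0.1)  # poll so the stop signal is noticed
        except queue.Empty:
            continue
        results[job["id"]] = run_agent(job["prompt"])
        jobs.task_done()

# Ingestion side: enqueue and return; the worker executes asynchronously.
jobs: "queue.Queue[dict]" = queue.Queue()
results: dict = {}
stop = threading.Event()
t = threading.Thread(target=worker, args=(jobs, results, stop))
t.start()

jobs.put({"id": "1", "prompt": "hi"})
jobs.join()  # in production you'd poll a results store instead of blocking
stop.set()
t.join()
```

With a real queue, the ingestion tier and the worker pool scale independently: bursty request traffic only grows the backlog, while worker count is sized to agent execution time.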
| Model | Cold Start | Max Runtime | State Management | Best For |
|---|---|---|---|---|
| Serverless | 100-500ms | 5-15 min | External only | Simple, short-lived agents |
| Containerized | 0ms (warm) | Unlimited | Local or external | Most production workloads |
| Long-running worker | 0ms (warm) | Unlimited | External (queue-backed) | High-throughput, async agents |
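The "State Management" column above is the crux of load balancing: "external" means every turn reads and writes agent state through a shared store, so any replica can serve the next request without sticky sessions. A minimal sketch, with a dict standing in for Redis or DynamoDB (`handle_turn` and the state schema are hypothetical):

```python
import json

# Stand-in for an external store (Redis, DynamoDB); shared across replicas.
STORE: dict = {}

def load_state(session_id: str) -> dict:
    raw = STORE.get(session_id)
    return json.loads(raw) if raw else {"messages": []}

def save_state(session_id: str, state: dict) -> None:
    # Serialize the full conversation/graph state so any replica can resume it.
    STORE[session_id] = json.dumps(state)

def handle_turn(session_id: str, user_msg: str) -> dict:
    # Any replica behind the load balancer can serve this request:
    # state is fetched, mutated, and written back on every turn.
    state = load_state(session_id)
    state["messages"].append({"role": "user", "content": user_msg})
    save_state(session_id, state)
    return state
```

The cost is a read and a write per turn; the payoff is that replicas stay interchangeable, which is what lets containers and workers in the table scale horizontally without session affinity.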