WebSocket & SSE for Agents
How to choose between Server-Sent Events and WebSocket for AI agent communication, with production-ready FastAPI code using the native EventSourceResponse API, authentication patterns, backpressure handling, and scaling strategies.
Quick Reference
- SSE (Server-Sent Events): one-way server-to-client streaming over HTTP. It is the default for LLM token streaming: it auto-reconnects, works through CDNs, and usually needs no special proxy configuration.
- FastAPI 0.135.0+ ships native SSE via `EventSourceResponse` and `ServerSentEvent` from `fastapi.sse`. It automatically sends keep-alive pings and sets the Cache-Control and X-Accel-Buffering headers.
- WebSocket is bidirectional: both sides can send at any time. Use it only when the client must send data mid-stream: cancellation, interactive voice, or real-time multi-user collaboration.
- The browser `EventSource` API cannot send custom headers, so you cannot pass an Authorization header. Use `fetch()` + `ReadableStream` for POST-based authenticated SSE, or short-lived query-param tokens.
- Over HTTP/1.1, browsers allow only 6 concurrent connections per domain, and every open SSE stream holds one of them. Multiple chat tabs exhaust the limit. Deploy behind HTTP/2 (which multiplexes streams over one connection) to avoid this.
- SSE reconnection is built-in: the browser sends `Last-Event-ID` on reconnect. FastAPI reads it with `last_event_id: str | None = Header(None)`. Always include `id=` in your events so the browser has a value to send back.
- WebSocket scaling requires sticky sessions and pub/sub (Redis or NATS) to fan events out across server instances. Do not add this complexity for single-server deployments.
- Backpressure: if the LLM generates tokens faster than a slow client reads them, the server-side buffer grows without bound. Use `asyncio.Queue(maxsize=N)` to cap the buffer size.
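The backpressure point above can be illustrated with a minimal, standard-library-only sketch. The names `produce`, `consume`, and `run_demo` are hypothetical stand-ins for an LLM token stream and a slow client; they are not part of FastAPI. The sketch assumes a `None` sentinel marks the end of the stream.

```python
import asyncio


async def produce(queue: asyncio.Queue, tokens: list, stats: dict) -> None:
    """Hypothetical stand-in for an LLM stream: push tokens, then a None sentinel."""
    for token in tokens:
        # put() suspends while the queue is full, so the producer is
        # throttled to the consumer's pace instead of buffering everything.
        await queue.put(token)
        stats["peak"] = max(stats["peak"], queue.qsize())
    await queue.put(None)


async def consume(queue: asyncio.Queue, delay: float) -> list:
    """Hypothetical stand-in for a slow client: read one token, then pause."""
    received = []
    while True:
        token = await queue.get()
        if token is None:
            break
        received.append(token)
        await asyncio.sleep(delay)
    return received


async def run_demo(maxsize: int = 4, n_tokens: int = 20):
    queue = asyncio.Queue(maxsize=maxsize)
    tokens = [f"tok{i}" for i in range(n_tokens)]
    stats = {"peak": 0}
    producer = asyncio.create_task(produce(queue, tokens, stats))
    received = await consume(queue, delay=0.001)
    await producer
    return received, stats["peak"]


if __name__ == "__main__":
    received, peak = asyncio.run(run_demo())
    print(len(received), "tokens delivered; peak buffered items:", peak)
```

The bounded queue trades latency for memory: the producer waits rather than accumulating tokens. In a real endpoint you would also need to decide what happens when the client disconnects, which this sketch does not cover.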
SSE or WebSocket? Decide in 30 Seconds
Most AI agent applications need to stream LLM responses to a browser, and that is a one-way server-to-client flow. Server-Sent Events (SSE) is designed for exactly this. It runs over standard HTTP, auto-reconnects on network drops, and works through most CDNs and reverse proxies without special configuration. WebSocket provides a persistent bidirectional connection, but bidirectionality is a cost, not a feature, if you do not need it. Start with one question: must the client send data while the server is still streaming?
Most LLM agents need SSE — only reach for WebSocket when clients must send data mid-stream
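The decision rule can be written down as a small function. This is only an illustrative sketch: `Requirements` and `choose_transport` are hypothetical names, and the three flags are the bidirectional cases this guide lists (mid-stream cancellation, interactive voice, real-time collaboration).

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist of the bidirectional needs named in this guide."""
    midstream_cancellation: bool = False
    interactive_voice: bool = False
    realtime_collaboration: bool = False


def choose_transport(req: Requirements) -> str:
    """Return "websocket" only if the client must send data mid-stream."""
    client_sends_midstream = (
        req.midstream_cancellation
        or req.interactive_voice
        or req.realtime_collaboration
    )
    return "websocket" if client_sends_midstream else "sse"


if __name__ == "__main__":
    print(choose_transport(Requirements()))
    print(choose_transport(Requirements(interactive_voice=True)))
```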
| Criterion | SSE | WebSocket |
|---|---|---|
| Direction | Server → client only | Bidirectional |
| Protocol | Standard HTTP (HTTP/2 multiplexes streams) | HTTP upgrade → custom framing |
| Reconnection | Built-in via Last-Event-ID header | Must implement manually |
| Proxy / CDN support | Works natively through most proxies and CDNs | Requires WebSocket-aware proxy (AWS ALB, Nginx, HAProxy) |
| Multiplexing | HTTP/2: multiple SSE streams on one TCP connection | Each WebSocket = a separate TCP connection |
| Use for agents | LLM token streaming, status updates, one-shot responses | Mid-stream cancellation, voice, real-time collaboration |
| Complexity | Low — formatted HTTP responses | Higher — connection lifecycle, heartbeats, message framing |
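To make the "formatted HTTP responses" and `Last-Event-ID` rows concrete, here is a minimal sketch of the SSE wire format and of resuming after a reconnect. It uses only the standard library. `format_sse` and `resume_from` are hypothetical helpers, and using the token's list index as the event ID is an assumption made for illustration; a framework's SSE support would normally do the framing for you.

```python
from typing import List, Optional


def format_sse(data: str, event_id: Optional[str] = None,
               event: Optional[str] = None) -> str:
    """Encode one event as SSE text: field lines ended by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one "data:" field per line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def resume_from(tokens: List[str], last_event_id: Optional[str]) -> List[str]:
    """Return only the frames the client has not seen yet.

    Assumption: event IDs are the token's index in the list.
    """
    start = 0
    if last_event_id is not None:
        try:
            start = max(0, int(last_event_id) + 1)
        except ValueError:
            start = 0  # Unparseable ID: replay from the beginning.
    return [
        format_sse(token, event_id=str(i))
        for i, token in enumerate(tokens)
        if i >= start
    ]


if __name__ == "__main__":
    # The client saw event 0, dropped, and reconnected with Last-Event-ID: 0.
    for frame in resume_from(["Hel", "lo", "!"], last_event_id="0"):
        print(repr(frame))
```

Resuming this way requires the server to keep, or be able to regenerate, the events after the last acknowledged ID. How long to retain them is a design choice this sketch leaves open.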
Most production LLM chat applications, including ChatGPT, Claude.ai, and Perplexity, use SSE for token streaming paired with a separate REST POST for sending user messages. WebSocket is reserved for voice agents and real-time collaborative tools, where the client genuinely needs to send data while the server is actively streaming.
Start with SSE for LLM streaming. It is simpler to implement, debug, and scale. Switch to WebSocket only when you have a concrete bidirectional requirement: mid-stream cancellation, interactive voice, or real-time multi-user collaboration. Pairing SSE for streaming with a separate REST endpoint for sending messages is the right default.