Intermediate · 18 min read

WebSocket & SSE for Agents

How to choose between Server-Sent Events and WebSocket for AI agent communication, with production-ready FastAPI code using the native EventSourceResponse API, authentication patterns, backpressure handling, and scaling strategies.

Quick Reference

  • SSE (Server-Sent Events): one-way server-to-client streaming over HTTP. It is the default for LLM token streaming: it auto-reconnects, works through CDNs, and usually needs little or no special proxy configuration.
  • FastAPI 0.135.0+ ships native SSE via `EventSourceResponse` and `ServerSentEvent` from `fastapi.sse`. It automatically sends keep-alive pings and sets the `Cache-Control` and `X-Accel-Buffering` headers.
  • WebSocket is bidirectional — both sides can send at any time. Only use it when the client must send data mid-stream: cancellation, interactive voice, or real-time multi-user collaboration.
  • The browser `EventSource` API only makes GET requests and cannot set custom headers, so you cannot pass an `Authorization` header. For authenticated SSE, use `fetch()` + `ReadableStream` (which also supports POST) or a short-lived token in the query string.
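  • Over HTTP/1.1, browsers allow only 6 concurrent connections per domain, and each open SSE stream holds one for as long as it runs. A few chat tabs can exhaust the limit and block further requests to that domain. Deploy behind HTTP/2, which multiplexes many streams over one connection, to avoid this.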
  • SSE reconnection is built-in: the browser sends `Last-Event-ID` on reconnect. FastAPI reads it with `last_event_id: str | None = Header(None)`. Always include `id=` in your events.
  • WebSocket scaling requires sticky sessions and pub/sub (Redis or NATS) to fan events across server instances. Do not add this complexity for single-server deployments.
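  • Backpressure: if the LLM generates tokens faster than a slow client reads them, the server-side buffer (and therefore memory) grows without bound. Use `asyncio.Queue(maxsize=N)` to cap the buffer: once it is full, the producer's `await queue.put()` pauses until the client catches up.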
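The backpressure bullet can be illustrated without any web framework. The sketch below is a simulation, not FastAPI code: the hypothetical `produce_tokens` coroutine stands in for the LLM, and the hypothetical `slow_client` stands in for a client reading over a slow network. Because the queue is bounded, the backlog can never exceed `maxsize`, however fast the producer runs.

```python
import asyncio
from typing import List, Optional, Tuple


async def produce_tokens(queue: asyncio.Queue, tokens: List[str]) -> None:
    """Stand-in for the LLM: put() suspends whenever the queue is full."""
    for token in tokens:
        await queue.put(token)  # waits here until the consumer frees a slot
    await queue.put(None)       # sentinel marking the end of the stream


async def slow_client(queue: asyncio.Queue, delay: float) -> Tuple[List[str], int]:
    """Stand-in for a slow reader; also records the largest backlog it observed."""
    received: List[str] = []
    peak_backlog = 0
    while True:
        peak_backlog = max(peak_backlog, queue.qsize())
        token: Optional[str] = await queue.get()
        if token is None:
            return received, peak_backlog
        received.append(token)
        await asyncio.sleep(delay)  # simulate a slow network write


async def stream_with_backpressure(tokens: List[str], maxsize: int = 4,
                                   delay: float = 0.001) -> Tuple[List[str], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)  # bounded buffer
    producer = asyncio.create_task(produce_tokens(queue, tokens))
    received, peak = await slow_client(queue, delay)
    await producer  # already finished once the sentinel was read
    return received, peak
```

Running `asyncio.run(stream_with_backpressure([...], maxsize=4))` returns every token in order, and the observed backlog never exceeds 4. In a real endpoint, the producer task would read from the model and the response generator would `await queue.get()` and yield each item to the client, so a slow reader pauses the producer instead of growing a buffer.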

SSE or WebSocket? Decide in 30 Seconds

Most AI agent applications need to stream LLM responses to a browser, and that is a one-way, server-to-client flow. Server-Sent Events (SSE) is designed for exactly this. It runs over standard HTTP, auto-reconnects on network drops, and works through most CDNs and reverse proxies with little or no special configuration. WebSocket provides a persistent bidirectional connection, but if you do not need bidirectionality, it is a cost rather than a feature. Start with the question below and follow the branch.

Does the client send data mid-stream?
  • No → SSE (one-way streaming). Do you need POST or auth headers?
      • No → EventSource (GET, auto-reconnect)
      • Yes → fetch() + ReadableStream (POST, auth headers)
  • Yes → WebSocket (bidirectional)

Most LLM agents need SSE — only reach for WebSocket when clients must send data mid-stream
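The fetch() + ReadableStream branch gives up `EventSource`'s built-in parsing, so the client has to split the incoming text into events itself. The sketch below shows that parsing logic in Python to keep one language across the examples; a browser client would apply the same rules to decoded chunks in JavaScript. `parse_sse` is a hypothetical helper that handles the `data`, `event`, and `id` fields plus comment lines, assumes `\n` or `\r\n` line endings, and ignores `retry`.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def parse_sse(chunks: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Incrementally parse text/event-stream chunks into event dicts.

    A chunk boundary may fall anywhere, even in the middle of a line.
    """
    buffer = ""
    data_lines: List[str] = []
    event_type: Optional[str] = None
    event_id: Optional[str] = None  # persists across events, like the browser's last event ID
    for chunk in chunks:
        buffer += chunk
        # Only handle complete lines; keep any partial tail for the next chunk.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")
            if line == "":
                # Blank line: dispatch the accumulated event, if it has data.
                if data_lines:
                    yield {"id": event_id, "event": event_type or "message",
                           "data": "\n".join(data_lines)}
                data_lines, event_type = [], None
                continue
            if line.startswith(":"):
                continue  # comment line, e.g. a keep-alive ping
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip a single leading space after the colon
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event_type = value
            elif field == "id":
                event_id = value
```

Reconnection is also left to you on this branch: keep the most recent `id` and send it back (for example as a `Last-Event-ID` header, which `fetch()` is allowed to set) when you reopen the stream.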

| Criterion | SSE | WebSocket |
|---|---|---|
| Direction | Server → client only | Bidirectional |
| Protocol | Standard HTTP (HTTP/2 multiplexes streams) | HTTP Upgrade handshake, then WebSocket framing |
| Reconnection | Built-in via the `Last-Event-ID` header | Must implement manually |
| Proxy / CDN support | Works natively through most proxies and CDNs | Requires a WebSocket-aware proxy (e.g., AWS ALB, Nginx, HAProxy) |
| Multiplexing | HTTP/2: multiple SSE streams share one TCP connection | Each WebSocket typically uses its own TCP connection |
| Use for agents | LLM token streaming, status updates, one-shot responses | Mid-stream cancellation, voice, real-time collaboration |
| Complexity | Low: formatted HTTP responses | Higher: connection lifecycle, heartbeats, message framing |
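The low complexity and built-in reconnection in the SSE column both come down to a small text format. This stdlib-only sketch shows the `text/event-stream` framing and one way to replay missed events when a client reconnects with `Last-Event-ID`. `Event`, `format_sse`, and `replay_after` are hypothetical helpers, and the sketch assumes integer event ids and an in-memory history; in FastAPI the `last_event_id` value would come from the `Header` parameter shown in the Quick Reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    id: int                      # monotonically increasing (an assumption of this sketch)
    data: str
    event: Optional[str] = None  # optional event type, e.g. "token" or "done"


def format_sse(ev: Event) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = [f"id: {ev.id}"]
    if ev.event:
        lines.append(f"event: {ev.event}")
    # Multi-line payloads become several data: lines; the client rejoins them with "\n".
    for part in ev.data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the event


def replay_after(history: List[Event], last_event_id: Optional[str]) -> List[Event]:
    """Events a reconnecting client has not seen yet, based on its Last-Event-ID."""
    if last_event_id is None:
        return list(history)  # first connection: send everything
    try:
        last = int(last_event_id)
    except ValueError:
        return list(history)  # unrecognized id: resend all (a policy choice)
    return [ev for ev in history if ev.id > last]
```

For example, `format_sse(Event(1, "Hello"))` produces `"id: 1\ndata: Hello\n\n"`, and a client that reconnects with `Last-Event-ID: 3` against a history of events 1 to 5 receives only events 4 and 5.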
Real project

Most production LLM chat applications — including ChatGPT, Claude.ai, and Perplexity — use SSE for token streaming paired with a separate REST POST for sending user messages. True WebSocket is reserved for voice agents and real-time collaborative tools where the client genuinely needs to send data while the server is actively streaming.

Default to SSE, upgrade to WebSocket only when needed

Start with SSE for LLM streaming. It is simpler to implement, debug, and scale. Only switch to WebSocket when you have a concrete bidirectional requirement: mid-stream cancellation, interactive voice, or real-time multi-user collaboration. Many production chat apps use SSE for streaming and a separate REST endpoint for sending messages — and that hybrid is the right default.