Intermediate · 18 min

WebSocket & SSE for Agents

How to choose between Server-Sent Events and WebSocket for AI agent communication, with production-ready FastAPI code using the native EventSourceResponse API, authentication patterns, backpressure handling, and scaling strategies.

Quick Reference

→SSE (Server-Sent Events): one-way server-to-client streaming over HTTP. Default for LLM token streaming — auto-reconnects, works through CDNs, no special proxy configuration.
→FastAPI 0.135.0+ ships native SSE via `EventSourceResponse` and `ServerSentEvent` from `fastapi.sse`. It automatically sends keep-alive pings and sets the Cache-Control and X-Accel-Buffering headers.
→WebSocket is bidirectional — both sides can send at any time. Only use it when the client must send data mid-stream: cancellation, interactive voice, or real-time multi-user collaboration.
→EventSource cannot send custom headers. You cannot pass an Authorization header. Use `fetch()` + `ReadableStream` for POST-based authenticated SSE, or short-lived query-param tokens.
→Over HTTP/1.1, browsers limit SSE to 6 concurrent connections per domain, shared across all tabs. Multiple chat tabs can exhaust the limit. Deploy behind HTTP/2 (which multiplexes streams over one connection) to avoid this.
→SSE reconnection is built-in: the browser's EventSource sends `Last-Event-ID` on reconnect. FastAPI reads it with `last_event_id: str | None = Header(None)`. Always include `id=` in your events, otherwise the browser has no ID to send back.
→WebSocket scaling requires sticky sessions and pub/sub (Redis or NATS) to fan events across server instances. Do not add this complexity for single-server deployments.
→Backpressure: if the LLM generates faster than a slow client reads, server memory grows unbounded. Use `asyncio.Queue(maxsize=N)` to cap buffer size.
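The backpressure point above can be illustrated with a minimal, standard-library-only sketch. The names `produce_tokens`, `consume_tokens`, and `stream_with_backpressure` are hypothetical, the token list stands in for an LLM stream, and the `asyncio.sleep` call stands in for a slow client connection. The only part taken from the text is the bounded `asyncio.Queue(maxsize=N)`.

```python
import asyncio


async def produce_tokens(queue: asyncio.Queue, tokens: list) -> None:
    """Stand-in for an LLM stream (hypothetical): push tokens, then a sentinel."""
    for token in tokens:
        # put() suspends while the queue is full, so the producer can never
        # get more than `maxsize` items ahead of the consumer.
        await queue.put(token)
    await queue.put(None)  # sentinel: no more tokens


async def consume_tokens(queue: asyncio.Queue, delay: float) -> tuple:
    """Stand-in for a slow client: read tokens and record the peak buffer depth."""
    received = []
    peak_depth = 0
    while True:
        peak_depth = max(peak_depth, queue.qsize())
        token = await queue.get()
        if token is None:
            break
        received.append(token)
        await asyncio.sleep(delay)  # simulate a slow network write
    return received, peak_depth


async def stream_with_backpressure(tokens: list, maxsize: int = 4) -> tuple:
    """Run producer and consumer together over a bounded queue."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    producer = asyncio.create_task(produce_tokens(queue, tokens))
    received, peak_depth = await consume_tokens(queue, delay=0.001)
    await producer
    return received, peak_depth
```

With an unbounded queue the producer would enqueue every token immediately. With `maxsize` set, it waits for the slow reader, so buffered memory stays capped at `maxsize` items per connection.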

SSE or WebSocket? Decide in 30 Seconds

Most AI agent applications need to stream LLM responses to a browser, and that is a one-way server-to-client flow. Server-Sent Events (SSE) is designed for exactly this. It runs over standard HTTP, auto-reconnects on network drops, and works through most CDNs and reverse proxies without special configuration. WebSocket provides a persistent bidirectional connection, but bidirectionality is a cost, not a feature, if you do not need it. The deciding question: must the client send data while the server is still streaming? If not, use SSE.

Most LLM agents need SSE — only reach for WebSocket when clients must send data mid-stream

Criterion	SSE	WebSocket
Direction	Server → client only	Bidirectional
Protocol	Standard HTTP (HTTP/2 multiplexes streams)	HTTP upgrade → custom framing
Reconnection	Built-in via Last-Event-ID header	Must implement manually
Proxy / CDN support	Works natively through most proxies and CDNs	Requires a WebSocket-aware proxy (e.g. AWS ALB, Nginx, HAProxy)
Multiplexing	HTTP/2: multiple SSE streams on one TCP connection	Typically one TCP connection per WebSocket
Use for agents	LLM token streaming, status updates, one-shot responses	Mid-stream cancellation, voice, real-time collaboration
Complexity	Low — formatted HTTP responses	Higher — connection lifecycle, heartbeats, message framing
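The Reconnection row only describes the client half: the browser resends the last ID it saw, and the server still has to decide what to replay. A minimal sketch of that server-side decision follows, assuming events carry monotonically increasing integer IDs and that recent events are kept in an in-memory list. Both are assumptions made for illustration, and `events_to_replay` is a hypothetical helper, not a FastAPI API.

```python
from typing import List, Optional, Tuple


def events_to_replay(
    history: List[Tuple[int, str]],
    last_event_id: Optional[str],
) -> List[Tuple[int, str]]:
    """Return the buffered (id, data) events a reconnecting client has not seen."""
    if last_event_id is None:
        return list(history)  # first connection: nothing has been seen yet
    try:
        last_seen = int(last_event_id)
    except ValueError:
        # Unparseable ID (assumed policy): fall back to replaying everything.
        return list(history)
    return [(event_id, data) for event_id, data in history if event_id > last_seen]
```

A WebSocket server needs the same replay logic plus its own protocol for the client to report where it left off, which is the "must implement manually" cost in the table.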

Real project

Most production LLM chat applications — including ChatGPT, Claude.ai, and Perplexity — use SSE for token streaming paired with a separate REST POST for sending user messages. True WebSocket is reserved for voice agents and real-time collaborative tools where the client genuinely needs to send data while the server is actively streaming.

Default to SSE, upgrade to WebSocket only when needed

Start with SSE for LLM streaming. It is simpler to implement, debug, and scale. Only switch to WebSocket when you have a concrete bidirectional requirement: mid-stream cancellation, interactive voice, or real-time multi-user collaboration. Many production chat apps use SSE for streaming and a separate REST endpoint for sending messages — and that hybrid is the right default.

SSE for LLM Streaming (The 90% Case)

FastAPI 0.135.0+ ships native SSE support. Import `EventSourceResponse` and `ServerSentEvent` from `fastapi.sse` and use them as your response class and yield type. FastAPI handles the SSE wire format, sets `Cache-Control: no-cache` and `X-Accel-Buffering: no` automatically, and sends a keep-alive ping comment every 15 seconds to prevent proxies from closing the connection. You no longer need to manually format `id: ...\nevent: ...\ndata: ...\n\n` strings or set response headers.
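For reference, the wire format that no longer needs hand-formatting looks roughly like this. The sketch uses only the standard library. `format_sse` and `KEEPALIVE_COMMENT` are hypothetical names for illustration, not part of FastAPI, and the exact ping text a framework sends may differ.

```python
from typing import Optional

# A line starting with ":" is a comment in the event-stream format; clients
# ignore it, which makes it usable as a keep-alive. The text is illustrative.
KEEPALIVE_COMMENT = ": ping\n\n"


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one "data:" line per line of text.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse("hello", event="token", event_id="7")` produces `id: 7`, `event: token`, and `data: hello` on separate lines, followed by the terminating blank line.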

Client-Side SSE: EventSource vs fetch()

There are two ways to consume SSE on the client. EventSource is the browser built-in: simple and auto-reconnecting, but GET-only with no custom headers. `fetch()` + `ReadableStream` is more flexible: it supports POST and any header, including Authorization, and is what MCP-compatible clients use, but you must parse the stream and handle reconnection yourself. Choose based on your auth model.
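The parsing work that EventSource does for you, and that a `fetch()`-style client has to do itself, can be sketched as follows. The sketch is in Python to match the server-side code. `parse_sse` is a hypothetical helper that assumes the stream has already been decoded and split into lines, and it ignores the `retry` field for brevity.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of decoded stream lines."""
    event_type, data_lines, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event if it carries data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines), "id": last_id}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, e.g. a keep-alive ping
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value  # persists across events, like Last-Event-ID
```

A client built this way would also remember the most recent `id` and send it back as `Last-Event-ID` when it reconnects, since that behaviour is no longer automatic.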
