Intermediate · 10 min

WebSocket & SSE for Agents

Server-Sent Events (SSE) versus WebSocket for AI agent communication: when to use each, reconnection strategies, scaling patterns, and production code for SSE streaming and interactive WebSocket agents.

Quick Reference

  • SSE (Server-Sent Events): one-way server-to-client streaming over HTTP. Well suited to LLM token streaming: simple, auto-reconnecting in browsers, and compatible with most HTTP proxies.
  • WebSocket: bidirectional persistent connection. Use for interactive agents where the client sends input mid-stream (e.g., cancellation, follow-ups).
  • SSE reconnection is built-in: the browser auto-reconnects with the Last-Event-ID header. Send event IDs to enable resume from the last received event.
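  • WebSocket scaling: each connection is persistent and stateful. Use sticky sessions or connection-aware load balancers. Treat ~10K connections per server as a rough planning figure; the real ceiling depends on memory, file-descriptor limits, and per-connection work.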
  • Default to SSE for LLM streaming: it is simpler, works through most CDNs, and over HTTP/2 many streams share a single connection. Use WebSocket only when you need bidirectional communication.
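
The resume behaviour mentioned above depends on the server remembering recent events. As a minimal sketch (not a production implementation), the server could keep a bounded replay buffer and, on reconnect, send only events newer than the `Last-Event-ID` value. The `ReplayBuffer` class, its method names, and the integer ID scheme are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can resume.

    Hypothetical helper: event IDs are assumed to be integers that
    increase by one per event.
    """

    def __init__(self, max_events: int = 1000) -> None:
        # deque(maxlen=...) silently drops the oldest event when full.
        self._events: Deque[Tuple[int, str]] = deque(maxlen=max_events)
        self._next_id = 1

    def append(self, data: str) -> int:
        """Store an event and return the ID to send in its `id:` field."""
        event_id = self._next_id
        self._events.append((event_id, data))
        self._next_id += 1
        return event_id

    def events_after(self, last_event_id: Optional[str]) -> List[Tuple[int, str]]:
        """Return buffered events newer than the Last-Event-ID header value.

        A missing or unparseable header is treated as a fresh client,
        which receives everything still in the buffer.
        """
        try:
            last_seen = int(last_event_id) if last_event_id else 0
        except ValueError:
            last_seen = 0
        return [(eid, data) for eid, data in self._events if eid > last_seen]


buffer = ReplayBuffer(max_events=3)
for token in ["Hello", ",", " world", "!"]:
    buffer.append(token)

# The client saw event 2 before disconnecting, so it resumes at event 3.
missed = buffer.events_after("2")
```

A bounded buffer means a client that stays offline too long cannot be fully caught up; how to handle that case (for example, by restarting the response) is a design decision this sketch leaves open.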

SSE: Simple, One-Way Streaming

Server-Sent Events (SSE) is a simple HTTP-based protocol in which the server pushes events to the client over a long-lived HTTP connection. It is one-way: the server sends, the client listens. That matches LLM token streaming, where the server generates tokens and the client displays them, so SSE is a natural default. Because it is plain HTTP, it passes through most CDNs, proxies, and load balancers, although intermediaries that buffer responses (nginx does by default) need buffering turned off for the stream so that tokens arrive as they are generated. The browser's EventSource API handles reconnection automatically.

SSE streaming endpoint for agent responses with FastAPI
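
A minimal sketch of the server side, using only the standard library so that it runs without FastAPI installed. It shows the `text/event-stream` framing (`id:`, `event:`, and `data:` fields, with a blank line ending each event) and an async generator that emits one event per token. `fake_agent_tokens`, the JSON payload shape, and the `[DONE]` sentinel are hypothetical stand-ins, not part of any particular API.

```python
import asyncio
import json
from typing import AsyncIterator, List, Optional


def format_sse(data: str, event_id: Optional[int] = None,
               event: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines must be split into several data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


async def fake_agent_tokens() -> AsyncIterator[str]:
    """Stand-in for an LLM call (hypothetical); yields a fixed token list."""
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def agent_event_stream() -> AsyncIterator[str]:
    """Yield one SSE-framed chunk per token, then a final 'done' event."""
    event_id = 0
    async for token in fake_agent_tokens():
        event_id += 1
        yield format_sse(json.dumps({"token": token}), event_id=event_id)
    # Assumed convention: a named event tells the client the stream is over.
    yield format_sse("[DONE]", event_id=event_id + 1, event="done")


async def collect() -> List[str]:
    return [chunk async for chunk in agent_event_stream()]


chunks = asyncio.run(collect())
```

In a FastAPI route, a generator like `agent_event_stream()` would typically be returned as `StreamingResponse(agent_event_stream(), media_type="text/event-stream")`; that wiring is left out here to keep the sketch dependency-free.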
Client-side SSE consumption with EventSource and reconnection
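
EventSource is a browser API, so the following is a Python sketch of roughly what it does internally rather than browser code: it parses `text/event-stream` lines, tracks the last event ID, and honours a server-sent `retry:` interval. The `SSEParser` and `SSEEvent` names, the 3000 ms default retry, and the sample stream (including its `[DONE]` payload) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class SSEEvent:
    data: str
    event: str = "message"
    id: Optional[str] = None


class SSEParser:
    """Parses text/event-stream lines and remembers reconnection state."""

    def __init__(self) -> None:
        self.last_event_id: Optional[str] = None  # sent back as Last-Event-ID
        self.retry_ms: int = 3000  # assumed default; the server can override it

    def parse(self, lines: Iterable[str]) -> Iterator[SSEEvent]:
        data: List[str] = []
        event = "message"
        for raw in lines:
            line = raw.rstrip("\r\n")
            if line == "":
                # Blank line: dispatch the buffered event, if there is one.
                if data:
                    yield SSEEvent("\n".join(data), event, self.last_event_id)
                data, event = [], "message"
                continue
            if line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "data":
                data.append(value)
            elif field == "event":
                event = value
            elif field == "id":
                self.last_event_id = value
            elif field == "retry" and value.isdigit():
                self.retry_ms = int(value)

    def reconnect_headers(self) -> Dict[str, str]:
        """Headers to send when reopening the stream after a drop."""
        if self.last_event_id is None:
            return {}
        return {"Last-Event-ID": self.last_event_id}


stream = [
    "retry: 1000\n",
    ": keep-alive\n",
    "id: 1\n",
    'data: {"token": "Hello"}\n',
    "\n",
    "id: 2\n",
    "event: done\n",
    "data: [DONE]\n",
    "\n",
]
parser = SSEParser()
events = list(parser.parse(stream))
```

After a dropped connection, a client built on this parser would wait `parser.retry_ms` milliseconds and reissue the request with `parser.reconnect_headers()`, which is the behaviour EventSource provides in the browser without extra code.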