Advanced · 15 min

Streaming

Decide whether to stream, pick the right mode for your UI, ship it over HTTP with async streaming, and handle the failures that only appear in production. All patterns use create_agent and pass version='v2' when streaming.

Quick Reference

→stream_mode='updates' — state diff after each agent step
→stream_mode='messages' — LLM token chunks as they generate
→stream_mode='custom' — arbitrary data via get_stream_writer() in tools
→Combine: stream_mode=['messages', 'updates'] — default for chat UIs
→version='v2' gives every chunk {type, ns, data} shape (requires LangGraph >= 1.1)
→Use astream() in async apps (FastAPI/Django) — sync stream() blocks the event loop
→7 modes exist (values, updates, messages, custom, checkpoints, tasks, debug) — most agents need only 3
→Resume interrupted streams: same thread_id + Command(resume=...) from checkpoint
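
The point about astream() and the event loop can be illustrated without any agent at all. The sketch below is only an illustration under stated assumptions: fake_astream is a hypothetical stand-in for an agent's astream() call, and its chunks are simplified to plain strings. It shows that while one coroutine consumes the stream with async for, another task (here a heartbeat) keeps running, which is what a blocking sync loop would prevent.

```python
import asyncio


async def fake_astream(n):
    """Hypothetical stand-in for agent.astream(...); yields v2-shaped chunks."""
    for i in range(n):
        # Hand control back to the event loop, as a real network wait would.
        await asyncio.sleep(0)
        yield {"type": "messages", "ns": (), "data": f"tok{i}"}


async def consume(n, log):
    # `async for` suspends between chunks, so other tasks can run.
    async for chunk in fake_astream(n):
        log.append(chunk["data"])


async def heartbeat(n, log):
    # Stands in for any other work the server has to do (other requests, pings).
    for _ in range(n):
        log.append("beat")
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(consume(3, log), heartbeat(3, log))
    return log


log = asyncio.run(main())
```

After this runs, log contains the heartbeat entries mixed in with the tokens rather than after them, so the heartbeat was not starved while the stream was being read.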

Should You Stream at All?

Streaming adds a failure surface. Before adding it, answer one question: is a human watching this agent run in real time? If yes, stream. If no, use .invoke().

Agent type	Stream?	Reason
Chat assistant	Yes — stream_mode='messages'	User expects typing effect; latency perception matters
Multi-step research agent	Yes — stream_mode='updates'	User sees step progress, not a frozen spinner
Background batch job	No — use .invoke()	No human watching; streaming adds complexity for nothing
Webhook-triggered pipeline	No — use .invoke()	Caller wants a result, not an event stream
Scheduled nightly job	No — use .invoke()	Nobody is reading the stream; logs are sufficient
Tool inside another agent	No — use .invoke()	Sub-tool results are consumed by the parent, not a UI

Streaming background agents wastes tokens and money

A stream only helps when something reads it. If a background job iterates the stream and discards the events, the LLM still generates every token and you are still billed for them, while the events themselves are simply dropped. Use .invoke() for background work.
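
The decision rule in this section can be sketched as a single branch. This is a minimal illustration, not the library's API: StubAgent and run are hypothetical names, and the stub only mimics the invoke()/stream() entry points and the stream_mode and version arguments mentioned in this article.

```python
class StubAgent:
    """Hypothetical stand-in for an agent; not a real LangGraph object."""

    def invoke(self, inputs):
        # One blocking call that returns the final result.
        return {"answer": inputs["question"].upper()}

    def stream(self, inputs, stream_mode="updates", version="v2"):
        # Yields a single v2-shaped chunk: {type, ns, data}.
        yield {"type": stream_mode, "ns": (), "data": {"model": self.invoke(inputs)}}


def run(agent, inputs, human_watching, on_event=print):
    """Stream only when a human is watching; otherwise just invoke."""
    if not human_watching:
        return agent.invoke(inputs)
    last = None
    for chunk in agent.stream(inputs, stream_mode="updates", version="v2"):
        on_event(chunk)        # e.g. push the chunk to the UI
        last = chunk["data"]   # keep the most recent step's payload
    return last
```

A batch job or webhook handler would call run(agent, inputs, human_watching=False) and get the result directly. A chat endpoint would pass human_watching=True together with an on_event callback that forwards each chunk to the client.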

Choosing a Stream Mode

LangGraph exposes seven stream modes. Three of them (updates, messages, custom) cover 95% of production agent UIs. The other four (values, checkpoints, tasks, debug) are mainly for replay and debugging.

The v2 Chunk Format

Pass version='v2' to get a unified chunk format. Every chunk is a dict with three keys: type (the stream mode), ns (namespace — a tuple path identifying subgraph source), and data (the payload). This means one type check handles all modes, regardless of how many you combine.
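
A minimal sketch of what "one type check handles all modes" can look like, assuming only the {type, ns, data} shape described above. The function name render_chunk is hypothetical, and the payloads are deliberately simplified (a plain string for messages, a dict keyed by step name for updates); the payloads the real library emits are richer, so treat the data handling here as an assumption.

```python
from typing import Any


def render_chunk(chunk: dict[str, Any]) -> str:
    """Turn one v2-shaped chunk into a display line by branching on `type`."""
    kind = chunk["type"]
    # `ns` is a tuple path; an empty tuple is treated here as the top-level graph.
    source = "/".join(chunk["ns"]) or "root"
    data = chunk["data"]

    if kind == "messages":
        # Simplified: assumes `data` is the token text itself.
        return f"[{source}] token: {data}"
    if kind == "updates":
        # Simplified: assumes `data` maps step names to their state diffs.
        return f"[{source}] step finished: {', '.join(data)}"
    if kind == "custom":
        return f"[{source}] custom: {data}"
    # Modes the UI does not care about fall through to one place.
    return f"[{source}] ignored {kind}"
```

Because every mode arrives in the same envelope, combining modes such as ['messages', 'updates'] adds branches to this function rather than separate consumer loops.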
