Advanced · 15 min
Streaming
Decide whether to stream, pick the right mode for your UI, ship it over HTTP with async streaming, and handle the failures that only appear in production. All patterns use create_agent with version='v2'.
Quick Reference
- stream_mode='updates': state diff after each agent step
- stream_mode='messages': LLM token chunks as they are generated
- stream_mode='custom': arbitrary data emitted via get_stream_writer() inside tools
- Combine modes with stream_mode=['messages', 'updates']: the default for chat UIs
- version='v2' gives every chunk the same {type, ns, data} shape (requires LangGraph >= 1.1)
- Use astream() in async apps (FastAPI/Django): sync stream() blocks the event loop
- Seven modes exist (values, updates, messages, custom, checkpoints, tasks, debug), but most agents need only three
- Resume interrupted streams: reuse the same thread_id and pass Command(resume=...) to continue from the checkpoint
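The chunk shape and mode names in the list above can be combined into a small dispatcher. This is a minimal sketch rather than the library's own API. It assumes each v2 chunk is a dict with `type`, `ns`, and `data` keys as described above, that a `messages` payload is a `(message_chunk, metadata)` pair whose chunk exposes `.content`, and that an `updates` payload maps step names to state diffs. `route_chunk` and `render_stream` are hypothetical helper names.

```python
from typing import Any, Iterable


def route_chunk(chunk: dict[str, Any]) -> str:
    """Turn one v2-style chunk into a line of UI text (assumed payload shapes)."""
    kind, data = chunk["type"], chunk["data"]
    if kind == "messages":
        # Assumption: data is (message_chunk, metadata); the chunk carries token text.
        message_chunk, _metadata = data
        return f"token: {getattr(message_chunk, 'content', message_chunk)}"
    if kind == "updates":
        # Assumption: data maps each finished step name to its state diff.
        return "step done: " + ", ".join(data)
    if kind == "custom":
        # Whatever a tool wrote through get_stream_writer().
        return f"custom: {data}"
    # Modes this UI does not render (values, checkpoints, tasks, debug).
    return f"ignored: {kind}"


def render_stream(chunks: Iterable[dict[str, Any]]) -> list[str]:
    """Route every chunk from a stream; accepts any iterable of chunks."""
    return [route_chunk(c) for c in chunks]
```

In a real app the iterable would presumably come from a call such as `agent.stream(inputs, stream_mode=['messages', 'updates'], version='v2')`. Branching on `type` keeps one loop working when several modes are combined.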
Should You Stream at All?
Streaming adds a failure surface. Before adding it, answer one question: is a human watching this agent run in real time? If yes, stream. If no, use .invoke().
| Agent type | Stream? | Reason |
|---|---|---|
| Chat assistant | Yes — stream_mode='messages' | User expects typing effect; latency perception matters |
| Multi-step research agent | Yes — stream_mode='updates' | User sees step progress, not a frozen spinner |
| Background batch job | No — use .invoke() | No human watching; streaming adds complexity for nothing |
| Webhook-triggered pipeline | No — use .invoke() | Caller wants a result, not an event stream |
| Scheduled nightly job | No — use .invoke() | Nobody is reading the stream; logs are sufficient |
| Tool inside another agent | No — use .invoke() | Sub-tool results are consumed by the parent, not a UI |
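The decision in the table can be encoded at the call site so that background callers never stream by accident. The following is a minimal sketch under stated assumptions: `run_agent` and `on_event` are hypothetical names, and the agent is assumed to expose `.invoke(inputs)` and `.stream(inputs, stream_mode=...)`. The `version='v2'` argument is omitted for brevity.

```python
from typing import Any, Callable, Optional


def run_agent(
    agent: Any,
    inputs: dict,
    on_event: Optional[Callable[[Any], None]] = None,
) -> Any:
    """Stream only when a consumer (a human-facing UI callback) is attached.

    Returns the invoke() result when no consumer is given, otherwise None,
    because the consumer has already received every event.
    """
    if on_event is None:
        # Nobody is watching: one blocking call, one result.
        return agent.invoke(inputs)
    # A human is watching: forward each step update as it arrives.
    for chunk in agent.stream(inputs, stream_mode="updates"):
        on_event(chunk)
    return None
```

A batch job would call `run_agent(agent, inputs)`, while a chat endpoint would pass a callback that pushes each event to the client.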
Streaming background agents wastes tokens and money
A background job that streams still has to iterate the generator to drive the run, so the LLM generates every token and you are billed for it, while the stream events themselves are discarded. Use .invoke() for background work.