Advanced · 15 min

Streaming

Decide whether to stream, pick the right mode for your UI, serve the stream over HTTP with async streaming, and handle the failures that only appear in production. All patterns use create_agent and pass version='v2' to the streaming call.

Quick Reference

  • stream_mode='updates' — state diff after each agent step
  • stream_mode='messages' — LLM token chunks as they generate
  • stream_mode='custom' — arbitrary data via get_stream_writer() in tools
  • Combine: stream_mode=['messages', 'updates'] — default for chat UIs
  • version='v2' gives every chunk the same {type, ns, data} shape (requires LangGraph >= 1.1)
  • Use astream() in async apps (FastAPI/Django) — sync stream() blocks the event loop
  • 7 modes exist (values, updates, messages, custom, checkpoints, tasks, debug) — most agents need only 3
  • Resume interrupted streams: same thread_id + Command(resume=...) from checkpoint
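
The chunk shape listed above lends itself to a small dispatcher. The sketch below is a minimal illustration, not LangGraph code: the chunks are hand-built dictionaries that assume the {type, ns, data} shape, the payload layouts (a (text, metadata) pair for 'messages', a {node: update} mapping for 'updates') are simplifying assumptions, and route_chunks is a hypothetical helper name. In a real agent the 'messages' payload would carry a message chunk object rather than a bare string.

```python
from typing import Any, Iterable


def route_chunks(chunks: Iterable[dict[str, Any]]) -> dict[str, list]:
    """Sort v2-style chunks into per-mode buckets a UI could render."""
    out: dict[str, list] = {"tokens": [], "steps": [], "custom": []}
    for chunk in chunks:
        kind, data = chunk["type"], chunk["data"]
        if kind == "messages":
            # Assumed layout: (token text, metadata about the emitting node).
            text, _metadata = data
            out["tokens"].append(text)
        elif kind == "updates":
            # Assumed layout: {node_name: state_update}; keep only the node names.
            out["steps"].extend(data.keys())
        elif kind == "custom":
            out["custom"].append(data)
        # Other modes (values, debug, ...) are ignored in this sketch.
    return out


# Hand-built stand-ins for what a combined-mode stream might yield.
fake_stream = [
    {"type": "updates", "ns": (), "data": {"model": {"messages": ["..."]}}},
    {"type": "messages", "ns": (), "data": ("Hel", {"langgraph_node": "model"})},
    {"type": "messages", "ns": (), "data": ("lo", {"langgraph_node": "model"})},
    {"type": "custom", "ns": (), "data": {"progress": 0.5}},
]

routed = route_chunks(fake_stream)
print("".join(routed["tokens"]))
```

Branching on the type field keeps a single loop usable when several stream modes are combined, which is the main convenience of a uniform chunk shape.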

Should You Stream at All?

Streaming adds a failure surface. Before adding it, answer one question: is a human watching this agent run in real time? If yes, stream. If no, use .invoke().

| Agent type | Stream? | Reason |
| --- | --- | --- |
| Chat assistant | Yes: stream_mode='messages' | User expects a typing effect; perceived latency matters |
| Multi-step research agent | Yes: stream_mode='updates' | User sees step progress, not a frozen spinner |
| Background batch job | No: use .invoke() | No human watching; streaming adds complexity for nothing |
| Webhook-triggered pipeline | No: use .invoke() | Caller wants a result, not an event stream |
| Scheduled nightly job | No: use .invoke() | Nobody is reading the stream; logs are sufficient |
| Tool inside another agent | No: use .invoke() | Sub-tool results are consumed by the parent, not a UI |
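
In code, the table reduces to one branch. This is a minimal sketch, assuming an agent object that exposes invoke() and stream() as this section describes. EchoAgent is a hypothetical stub standing in for a real agent so the example runs without a model, and run_agent and on_event are illustrative names rather than library APIs.

```python
from typing import Any, Callable, Iterator, Optional


class EchoAgent:
    """Hypothetical stub exposing the two entry points discussed here."""

    def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
        return {"answer": inputs["question"].upper()}

    def stream(
        self, inputs: dict[str, Any], stream_mode: str = "updates"
    ) -> Iterator[dict[str, Any]]:
        # Emit a single step event shaped like {type, ns, data}.
        yield {"type": stream_mode, "ns": (), "data": {"model": self.invoke(inputs)}}


def run_agent(
    agent: Any,
    inputs: dict[str, Any],
    on_event: Optional[Callable[[dict[str, Any]], None]] = None,
) -> Optional[dict[str, Any]]:
    """Stream only when a callback (a human-facing UI) is attached."""
    if on_event is None:
        # Nobody is watching: one call, one result.
        return agent.invoke(inputs)
    for event in agent.stream(inputs, stream_mode="updates"):
        on_event(event)  # forward each step to the UI as it arrives
    return None


# Background path: no callback, so the agent is invoked once.
batch_result = run_agent(EchoAgent(), {"question": "hi"})

# Interactive path: events are forwarded to the (stand-in) UI callback.
seen: list[dict[str, Any]] = []
run_agent(EchoAgent(), {"question": "hi"}, on_event=seen.append)
```

Tying the choice to whether a consumer is attached keeps background callers on the simpler code path by default.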
Streaming background agents adds work with no payoff

A background job that streams still has to iterate every event just to drive the agent to completion. The model tokens are billed exactly as they would be with .invoke(), and every stream event is thrown away. Use .invoke() for background work.
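
To make the cost argument concrete, the sketch below uses a hypothetical MeteredAgent stub that counts the tokens it "bills". It is not LangChain code and the numbers are invented; it only illustrates that draining a stream and discarding every event does the same model work as a single invoke() call.

```python
from typing import Iterator


class MeteredAgent:
    """Hypothetical stub: each generated token increments a billing counter."""

    def __init__(self) -> None:
        self.tokens_billed = 0

    def _generate(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            self.tokens_billed += 1  # counted as soon as the token is generated
            yield word

    def invoke(self, prompt: str) -> str:
        # Runs generation to completion and returns one result.
        return " ".join(self._generate(prompt))

    def stream(self, prompt: str) -> Iterator[str]:
        # Yields the same tokens one at a time.
        yield from self._generate(prompt)


prompt = "summarise the nightly report"

batch = MeteredAgent()
batch.invoke(prompt)

background = MeteredAgent()
for _event in background.stream(prompt):
    pass  # a background job has no UI, so every event is dropped

print(batch.tokens_billed, background.tokens_billed)
```

Both counters end at the same value, so in this toy model the streamed run spends the same tokens while adding a loop and an extra failure surface.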