Designing Agent Workflows
When to use a graph instead of a chain, how to choose the right topology, how to design nodes and state for testability, and how to add human-in-the-loop gates with the current interrupt() API — a decision-first guide to LangGraph workflow design.
Quick Reference
- Ask "do I need a graph?" first: with no conditional logic, no cycles, and no HITL gates, a chain or function is simpler
- Map each distinct step to a graph node, and keep nodes single-responsibility (one LLM call, one tool, or one routing decision)
- Use conditional edges for decision points; use Command(goto=) when the node itself knows the next destination
- Define your state schema upfront with TypedDict: what flows between nodes, what persists, what is ephemeral
- Place HITL gates before irreversible actions using interrupt() inside nodes; resume with Command(resume=value)
- Every additional LLM-calling node adds cost, and checkpoint writes add 5-20ms per node, so measure before adding complexity
- Validate at three levels: compile() catches wiring errors, unit tests verify node logic, LangSmith traces verify end-to-end flow
Do You Even Need a Graph?
The first design decision is whether to reach for a StateGraph at all. Graphs add checkpoint overhead, state serialization cost, and debugging complexity. If your workflow has no conditional logic, no cycles, and no human-in-the-loop gates, a chain or a plain function is simpler, faster, and easier to test.
| Your workflow has... | Use | Why | Example |
|---|---|---|---|
| Linear steps, no branching | LCEL chain | No checkpoint overhead, simpler tracing | fetch -> summarize -> format |
| One conditional branch | if/else in a function | A graph buys nothing over a 3-line conditional | if intent == 'billing': ... else: ... |
| 2+ conditional branches | StateGraph with conditional edges | Graph makes routing explicit and testable | classify -> route -> billing / tech / general |
| Cycles (agent loops) | StateGraph with recursion_limit | Cycles require explicit exit conditions | reason -> act -> observe -> (loop or exit) |
| Human approval gates | StateGraph + interrupt() + checkpointer | State must persist between invocations across the approval wait | draft -> interrupt() -> human approves -> send |
| Parallel branches that rejoin | StateGraph with fan-out/fan-in | Reducers merge concurrent state updates correctly | search docs AND check inventory simultaneously |
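The table above can be read as a small decision procedure. The sketch below encodes it in plain Python as one possible reading, not an official rule set. `WorkflowTraits` and `recommend_approach` are hypothetical names introduced only for illustration, and the order of the checks is an assumption, since the table does not rank its rows against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowTraits:
    """Hypothetical description of a workflow's shape (illustrative only)."""
    conditional_branches: int = 0
    has_cycles: bool = False
    needs_human_approval: bool = False
    has_parallel_branches: bool = False


def recommend_approach(traits: WorkflowTraits) -> str:
    """Return the matching entry from the table's 'Use' column.

    Assumption: features that need persistence or loops are checked first,
    because any one of them already justifies a StateGraph.
    """
    if traits.needs_human_approval:
        return "StateGraph + interrupt() + checkpointer"
    if traits.has_cycles:
        return "StateGraph with recursion_limit"
    if traits.has_parallel_branches:
        return "StateGraph with fan-out/fan-in"
    if traits.conditional_branches >= 2:
        return "StateGraph with conditional edges"
    if traits.conditional_branches == 1:
        return "if/else in a function"
    # Linear steps with no branching: no graph needed.
    return "LCEL chain"


if __name__ == "__main__":
    # A linear fetch -> summarize -> format pipeline.
    print(recommend_approach(WorkflowTraits()))
    # A single billing-vs-everything-else branch.
    print(recommend_approach(WorkflowTraits(conditional_branches=1)))
    # A draft that must wait for human sign-off before sending.
    print(recommend_approach(WorkflowTraits(needs_human_approval=True)))
```

A workflow that combines several traits (for example cycles plus an approval gate) still ends up on a StateGraph. The helper simply reports the first matching row.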
If you can solve the problem with a 20-line function and an if/else, do that instead. Reach for StateGraph only when branching or persistence adds real value.
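For a sense of scale, here is a minimal sketch of the one-branch case from the table, written as plain functions with no graph library. All function names are hypothetical, and `classify_intent` is a keyword-based stand-in for what would normally be an LLM call, used here only so the example runs on its own.

```python
def classify_intent(message: str) -> str:
    """Stand-in for an LLM classifier (assumption: keyword match is enough here)."""
    return "billing" if "invoice" in message.lower() else "general"


def handle_billing(message: str) -> str:
    """Placeholder for the billing path."""
    return f"[billing] {message}"


def handle_general(message: str) -> str:
    """Placeholder for the default path."""
    return f"[general] {message}"


def handle_message(message: str) -> str:
    """One conditional branch: an if/else does the routing a graph would do."""
    intent = classify_intent(message)
    if intent == "billing":
        return handle_billing(message)
    else:
        return handle_general(message)


if __name__ == "__main__":
    print(handle_message("Where is my invoice?"))
    print(handle_message("Hello there"))
```

Each function can be unit-tested by calling it directly, with no checkpointer, compiled graph, or state schema involved. If a second or third branch appears later, that is a reasonable point to revisit the table.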
A team built a 7-node StateGraph for a summarization pipeline that had zero conditional logic — every node always ran in the same order. Replacing it with an LCEL chain cut code from 120 lines to 35, removed the checkpointer dependency, and reduced latency by ~40% by eliminating checkpoint writes between nodes.