Prompt Chaining: Sequential LLM Pipelines
Prompt chaining sequences focused LLM calls — each step's output becomes the next step's sole input, with gate functions between steps acting as circuit breakers. This article covers the decision framework for when to use it, the cost and latency math, what fails in production, and how to evaluate and debug chains.
Quick Reference
- Chaining = Step A → Gate → Step B → Gate → Step C → output (no shared state, no tool calls)
- Sequential means additive latency: 3 steps × 1.5s/step = 4.5s minimum, so design for it
- Use chaining when steps are sequential, bounded, and isolated; use agents when steps need iterative reasoning
- Gates are circuit breakers: run structural gates (JSON parse, schema) before LLM-based gates
- Errors compound: bad JSON in step 1 becomes confident hallucination in step 3 without a gate
- Model tier per step: Haiku for extraction/validation, Sonnet/Opus for generation
- When output is wrong, binary-search the chain: test each step in isolation to find the break
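As an illustration of the "structural gates first" point, here is a minimal sketch of a gate that only parses JSON and checks a schema, with no LLM call involved. The names `GateError`, `REQUIRED_FIELDS`, and `structural_gate`, as well as the two-field schema, are hypothetical and chosen only for this example.

```python
import json

# Hypothetical schema for a step's output: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "summary": str}


class GateError(Exception):
    """Raised when a step's output fails a gate and the chain should stop."""


def structural_gate(raw: str) -> dict:
    """Cheap, deterministic check: valid JSON object with the expected fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise GateError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise GateError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise GateError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise GateError(f"field {field!r} should be {expected_type.__name__}")
    return data


# A well-formed output passes through; a malformed one trips the breaker.
parsed = structural_gate('{"title": "Chaining", "summary": "Steps in sequence."}')
try:
    structural_gate("Sure! Here is the JSON you asked for...")
    tripped = False
except GateError:
    tripped = True
```

Because this check is fast and costs nothing per call, it makes sense to run it before any slower, model-based quality check on the same output.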
What Is Prompt Chaining?
Step A → Gate → Step B → Gate → Step C → Output
Prompt chaining decomposes a task into a sequence of LLM calls where each step's output becomes the next step's sole input. Gate functions between steps validate quality and catch failures before they propagate. No shared state, no tool calls, no iterative reasoning — just focused sequential transformation.
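The pattern described above can be sketched as a small runner. This is an illustrative outline under stated assumptions, not a reference implementation: the step functions are plain string transformations standing in for LLM calls, and `run_chain`, `GateFailure`, and the stage names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Step = Callable[[str], str]   # stands in for one LLM call: text in, text out
Gate = Callable[[str], bool]  # True means the output may pass to the next step


class GateFailure(Exception):
    """Raised when a gate rejects a step's output; the chain stops here."""


def run_chain(initial_input: str,
              stages: List[Tuple[str, Step, Optional[Gate]]]) -> str:
    current = initial_input
    for name, step, gate in stages:
        # Isolation: the step receives only the previous step's output.
        current = step(current)
        if gate is not None and not gate(current):
            raise GateFailure(f"gate after step {name!r} rejected the output")
    return current


# Stub steps (assumptions for the example; real steps would call a model).
def extract(text: str) -> str:
    return text.strip().lower()


def outline(text: str) -> str:
    return "outline: " + text


def draft(text: str) -> str:
    return text.replace("outline", "draft")


def non_empty(text: str) -> bool:
    return bool(text.strip())


stages = [
    ("extract", extract, non_empty),
    ("outline", outline, non_empty),
    ("draft", draft, None),  # final step feeds the output directly
]
result = run_chain("  Chaining Basics ", stages)
```

Passing `None` as the last gate mirrors the diagram, where gates sit between steps and the final step produces the output. An empty input would make the first gate raise `GateFailure` before any later step runs.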
The defining constraint is isolation: each step receives only what the previous step produced. This is both the pattern's strength (each step can be tested and optimized independently) and its main failure mode (a flawed step 1 output cannot be corrected downstream, because later steps have no other context to fall back on).
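To show what isolation buys in practice, the sketch below tests a single step against a hand-written fixture, without running anything upstream. The step `summarize_step`, the `facts` field, and the deterministic `fake_model` stand-in are all hypothetical; a real step would call an actual model client in place of `fake_model`.

```python
import json
from typing import Callable


def summarize_step(payload: str, call_model: Callable[[str], str]) -> str:
    """Hypothetical step 2: builds its prompt only from step 1's JSON output."""
    facts = json.loads(payload)["facts"]
    prompt = "Summarize these facts in one sentence:\n" + "\n".join(
        f"- {fact}" for fact in facts
    )
    return call_model(prompt)


def fake_model(prompt: str) -> str:
    # Deterministic stand-in for an LLM call: reports how many bullets it saw.
    bullet_count = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"summary of {bullet_count} facts"


# The fixture plays the role of a known-good step 1 output.
fixture = json.dumps({"facts": ["latency is additive", "gates stop bad output"]})
output = summarize_step(fixture, fake_model)
```

The same property explains the failure mode: if the fixture were missing the `facts` key, this step would have no other source to recover it from.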