Plan-and-Execute: When to Plan Upfront
Plan-and-Execute separates reasoning from acting: one LLM call decomposes the task into ordered steps, then an executor runs each step in sequence. Most tutorials stop at the mechanics. This article starts with whether you should use P&E at all, walks through the cost tradeoff (P&E costs ~50% more per task than ReAct but completes ~7 percentage points more complex tasks), covers the replan problem that dominates production failures, adds PEV quality gates to catch silent step drift, and ends with a model-tiered reference implementation that cuts cost ~85% compared with a naive Sonnet-everywhere setup.
Quick Reference
- →P&E earns its keep when the task has 4+ steps with ordering dependencies — below that, ReAct is cheaper
- →P&E costs ~50% more per task than ReAct ($0.09–0.14 vs $0.06–0.09) but completes ~7 percentage points more complex multi-step tasks (~92% vs ~85%)
- →Model tiering is the primary cost optimization: Sonnet planner + Haiku executor cuts cost ~85% vs all-Sonnet
- →Store the plan in state as a list of Step objects, and declare past_steps as Annotated[list[tuple], operator.add] so each executor update appends to the history instead of overwriting it
- →Cap replans at 3 and detect non-convergence (2 replans with no step completing = abort); never replan after every step
- →Add a PEV validation node after each executor step to catch silent quality drift before it propagates downstream
- →Plan quality and step quality are separate eval surfaces — measure both, not just final task completion
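The state and replan bullets above can be sketched in plain Python. This is a minimal illustration, not LangGraph's API: `apply_update` is a hypothetical stand-in for how a graph runtime merges a node's partial update (keys annotated with a reducer such as `operator.add` are combined; everything else is overwritten). The `Step` fields and the `replan_count` / `stalled_replans` counters are assumed names chosen for the example.

```python
import operator
from dataclasses import dataclass
from typing import Annotated, TypedDict, get_args, get_origin, get_type_hints


@dataclass
class Step:
    # One planned unit of work (illustrative fields).
    id: int
    description: str


class PlanExecuteState(TypedDict):
    task: str
    plan: list[Step]                                  # replaced wholesale on (re)plan
    past_steps: Annotated[list[tuple], operator.add]  # appended by every executor update
    replan_count: int                                 # total replans so far
    stalled_replans: int                              # replans since the last completed step


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into state, honoring Annotated reducers.

    Stand-in for a graph runtime: keys annotated with a reducer are combined
    with it (here, list concatenation); every other key is overwritten.
    """
    hints = get_type_hints(PlanExecuteState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        hint = hints.get(key)
        if get_origin(hint) is Annotated:
            reducer = get_args(hint)[1]
            merged[key] = reducer(merged.get(key, []), value)
        else:
            merged[key] = value
    return merged


MAX_REPLANS = 3          # hard cap on replans per task
MAX_STALLED_REPLANS = 2  # replans allowed with no step completing in between


def on_step_completed(state: dict, step: Step, result: str) -> dict:
    # Executor update: record the result and reset the stall counter.
    return {"past_steps": [(step.description, result)], "stalled_replans": 0}


def on_replan(state: dict, new_plan: list[Step]) -> dict:
    # Replanner update: swap in the new plan and bump both counters.
    return {
        "plan": new_plan,
        "replan_count": state["replan_count"] + 1,
        "stalled_replans": state["stalled_replans"] + 1,
    }


def replan_decision(state: dict) -> str:
    """Called when the replanner wants to revise the plan: 'replan' or 'abort'."""
    if state["replan_count"] >= MAX_REPLANS:
        return "abort"
    if state["stalled_replans"] >= MAX_STALLED_REPLANS:
        return "abort"  # replans are not producing progress
    return "replan"
```

Because `past_steps` carries a reducer, two executor updates of one tuple each leave a two-item history, while `plan` is simply replaced. Two consecutive replans without a completed step push `stalled_replans` to 2, so the next replan request aborts instead of looping.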
Should I Use Plan-and-Execute at All?
Most tutorials open with how to build a plan-and-execute agent. Start one step earlier: should you build one at all? P&E adds a planning call, a stateful execution loop, a replan mechanism, and new failure surfaces: the planner's output quality, the executor's coherence, and the replanner's convergence. For many tasks, that overhead doesn't earn its keep.
The Plan-and-Execute pattern uses one LLM call to decompose a task into an ordered list of steps, then executes each step sequentially (or with controlled concurrency) using a tool-equipped executor, with an optional replanner that adjusts the plan mid-flight based on accumulated results. Unlike ReAct, it separates planning (what to do) from execution (how to do it) into distinct nodes.
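The control flow described above fits in a short framework-free sketch. Here `plan_fn`, `execute_fn`, and `replan_fn` are hypothetical callables standing in for the planner LLM call, the tool-equipped executor, and the optional replanner. As one possible policy, the replanner is consulted only when a step reports failure, and `max_steps` (an assumed parameter) bounds total executions.

```python
from typing import Callable, Optional

# (output, succeeded) returned by the executor for one step.
StepResult = tuple[str, bool]
History = list[tuple[str, str]]


def plan_and_execute(
    task: str,
    plan_fn: Callable[[str], list[str]],
    execute_fn: Callable[[str, History], StepResult],
    replan_fn: Optional[Callable[[str, list[str], History], list[str]]] = None,
    max_steps: int = 20,
) -> History:
    """Run one planning call, then execute steps in order.

    The replanner is consulted only when a step fails, not after every step.
    max_steps bounds total executions so a misbehaving replanner cannot loop forever.
    """
    plan = list(plan_fn(task))  # 1. Plan once, upfront.
    past_steps: History = []
    while plan and len(past_steps) < max_steps:
        step = plan.pop(0)
        # 2. Execute the next step with access to earlier results.
        output, ok = execute_fn(step, past_steps)
        past_steps.append((step, output))
        if not ok:
            if replan_fn is None:
                break  # no replanner: stop at the first failure
            # 3. Replan: get a revised list of the remaining steps.
            plan = list(replan_fn(task, plan, past_steps))
    return past_steps
```

With a stub plan of `["a", "b", "c"]`, an executor that fails on `"b"`, and a replanner that prepends `"b2"` to the remaining steps, the history comes out as `a, b, b2, c`. The planner runs once, and the replanner runs only once, at the failure.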
| When P&E is mostly tax | Why it fails to earn its keep | Do this instead |
|---|---|---|
| Task takes fewer than 4 steps | Planning overhead exceeds the coordination benefit | ReAct with tools: comparable accuracy, lower cost |
| Steps are independent (no ordering dependency) | Planning adds latency without adding coordination value | Parallelization pattern — fan out, collect results |
| Task is highly dynamic (requirements change mid-flight) | Plan is stale before step 2 finishes | Supervisor or ReAct — iterative judgment is the right tool |
| Steps are identical (same tool, different inputs) | No decomposition needed — it's just a loop | Map-reduce or fan-out with Send |
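The table's conditions can be encoded as a small routing helper, shown here as a sketch. The `TaskProfile` fields are hypothetical descriptors you would estimate before choosing a pattern, the check order is one reasonable priority among the rows, and the 7-step ceiling comes from the discussion at the end of this section.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical descriptors you would estimate before choosing a pattern.
    estimated_steps: int
    has_ordering_dependencies: bool
    requirements_change_midflight: bool
    steps_identical: bool


def choose_pattern(task: TaskProfile) -> str:
    """Map a task profile to a pattern, checking the table's conditions in one priority order."""
    if task.estimated_steps < 4:
        return "react"             # planning overhead exceeds coordination benefit
    if task.steps_identical:
        return "map-reduce"        # same tool, different inputs: just a loop
    if not task.has_ordering_dependencies:
        return "parallel-fan-out"  # nothing to coordinate
    if task.requirements_change_midflight:
        return "supervisor"        # plan would go stale; Supervisor or ReAct
    if task.estimated_steps > 7:
        return "decompose-into-sub-plans"  # planner output degrades past ~7 steps
    return "plan-and-execute"
```

Identical steps are checked before independence because identical steps are usually independent too, and map-reduce is the more specific fit.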

How P&E compares with ReAct and Supervisor:

| Aspect | ReAct | Plan-and-Execute | Supervisor |
|---|---|---|---|
| LLM calls per task | 3–5 (tool loop) | 5–8 (plan + execute) | 5–10+ (iterative delegation) |
| Planning style | Implicit (next-step reasoning) | Explicit (upfront plan) | Dynamic (per-iteration) |
| Cost per task | $0.06–0.09 | $0.09–0.14 | $0.12–0.20 |
| Completion rate (complex tasks) | ~85% | ~92% | ~90% |
| Best for | Simple tool-use tasks, <4 steps | Multi-step with step ordering dependencies | Tasks needing judgment and coordination |
P&E costs ~50% more per task than ReAct but completes ~7 percentage points more complex multi-step tasks. That tradeoff earns its keep when step ordering matters, you need progress tracking across steps, and individual steps depend on prior results. Below 4 steps, ReAct is cheaper and nearly as effective. Above 7 steps, the planner's output degrades, so decompose the task into sub-tasks with separate plans. Building and owning the replan loop has a real cost; don't pay it without a reason.
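The arithmetic behind that tradeoff can be checked directly from the comparison table. The sketch below computes the cost ratio, the completion gap in percentage points, and a break-even failure cost. That last quantity is an added illustration, not a figure from the table: it assumes each failed run costs a hypothetical `F` dollars on top of its run cost (for example, a human fixing the output), with no automatic retries.

```python
# Per-task cost ranges (low, high) and completion rates from the comparison table.
REACT = {"cost": (0.06, 0.09), "completion": 0.85}
PLAN_EXECUTE = {"cost": (0.09, 0.14), "completion": 0.92}
LOW, HIGH = 0, 1


def cost_ratio(end: int) -> float:
    """How many times more P&E costs per task than ReAct at one end of the range."""
    return PLAN_EXECUTE["cost"][end] / REACT["cost"][end]


def completion_gap_points() -> float:
    """Completion-rate difference in percentage points (not a relative percent)."""
    return (PLAN_EXECUTE["completion"] - REACT["completion"]) * 100


def break_even_failure_cost(end: int) -> float:
    """Failure cost F above which P&E has the lower expected cost per task.

    Expected cost = run_cost + (1 - completion) * F. Setting the two patterns
    equal and solving for F gives extra_run_cost / extra_completion_rate.
    """
    extra_run_cost = PLAN_EXECUTE["cost"][end] - REACT["cost"][end]
    extra_completion = PLAN_EXECUTE["completion"] - REACT["completion"]
    return extra_run_cost / extra_completion


if __name__ == "__main__":
    for name, end in (("low", LOW), ("high", HIGH)):
        print(name, round(cost_ratio(end), 2), round(break_even_failure_cost(end), 2))
    print("gap (pp):", round(completion_gap_points(), 1))
```

With the table's numbers, the ratio comes out to about 1.5 to 1.56, and the gap is 7 points (about 8% relative to ReAct's 85%). Under the stated assumption, P&E is cheaper overall once a failed task costs more than roughly $0.43 to $0.71.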