Plan-and-Execute: When to Plan Upfront

Plan-and-Execute (P&E) separates reasoning from acting: one LLM call decomposes the task into ordered steps, then an executor runs those steps in sequence. Most tutorials stop at the mechanics. This article starts with whether you should use P&E at all, then walks through the cost tradeoff (P&E costs ~50% more per task than ReAct but completes ~7 percentage points more complex tasks), covers the replan problem that dominates production failures, adds PEV quality gates to catch silent step drift, and ends with a model-tiered reference implementation that cuts cost ~85% vs a naive Sonnet-everywhere setup.

Quick Reference

  • P&E earns its keep when the task has 4+ steps with ordering dependencies — below that, ReAct is cheaper
  • P&E costs ~50% more per task than ReAct ($0.09–0.14 vs $0.06–0.09) but completes ~7 percentage points more complex multi-step tasks (~92% vs ~85%)
  • Model tiering is the primary cost optimization: Sonnet planner + Haiku executor cuts cost ~85% vs all-Sonnet
  • Store the plan in state as a list of Step objects, and accumulate finished steps in past_steps, typed as Annotated[list[tuple], operator.add]
  • Cap the replan count at 3, abort on non-convergence (2 consecutive replans with no step completing), and never replan unconditionally after every step
  • Add a PEV validation node after each executor step to catch silent quality drift before it propagates downstream
  • Plan quality and step quality are separate eval surfaces — measure both, not just final task completion
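
The state and replan rules in the list above can be sketched as follows. This is a minimal illustration, not the article's reference implementation: the field names on Step, the replan_count and stalled_replans counters, and the replan_decision helper are all hypothetical. The Annotated[list[tuple], operator.add] form matches the reducer convention used by LangGraph, but that the article targets LangGraph is an assumption; the sketch itself needs only the standard library.

```python
import operator
from dataclasses import dataclass
from typing import Annotated, TypedDict

MAX_REPLANS = 3          # hard cap on total replans
MAX_STALLED_REPLANS = 2  # consecutive replans with no step completing


@dataclass
class Step:
    """One planned unit of work (hypothetical shape)."""
    id: int
    description: str


class PlanExecuteState(TypedDict):
    task: str
    plan: list[Step]  # remaining steps, in order
    # Reducer annotation: a graph framework that honors it appends each
    # node's returned list to the stored one instead of overwriting it.
    past_steps: Annotated[list[tuple], operator.add]
    replan_count: int     # total replans so far
    stalled_replans: int  # replans since the last completed step


def replan_decision(state: PlanExecuteState, step_failed: bool) -> str:
    """Return 'continue', 'replan', or 'abort' for the next transition."""
    if not step_failed:
        return "continue"  # replan only on failure, not after every step
    if state["replan_count"] >= MAX_REPLANS:
        return "abort"     # replan budget exhausted
    if state["stalled_replans"] >= MAX_STALLED_REPLANS:
        return "abort"     # replanning is not converging
    return "replan"
```

The caller is expected to reset stalled_replans to 0 whenever a step completes and to increment both counters on each replan; that bookkeeping is left out to keep the sketch short.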

Should I Use Plan-and-Execute at All?

Most tutorials open with how to build a plan-and-execute agent. Start one step earlier: should you build one at all? P&E adds a planning call, a stateful execution loop, a replan mechanism, and three new failure surfaces: the planner's output quality, the executor's coherence, and the replanner's convergence. For many tasks, that overhead doesn't earn its keep.

Definition

The Plan-and-Execute pattern uses one LLM call to decompose a task into an ordered list of steps, then executes each step sequentially (or with controlled concurrency) using a tool-equipped executor, with an optional replanner that adjusts the plan mid-flight based on accumulated results. Unlike ReAct, it separates planning (what to do) from execution (how to do it) into distinct nodes.
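
To make the definition concrete, here is a minimal sketch of the plan-then-execute loop with the optional replanner left out. The planner and executor are passed in as plain callables; fake_planner and fake_executor are hypothetical stand-ins for LLM calls so the example runs without an API key.

```python
from typing import Callable

Planner = Callable[[str], list[str]]
Executor = Callable[[str, list[tuple[str, str]]], str]


def plan_and_execute(
    task: str, planner: Planner, executor: Executor
) -> list[tuple[str, str]]:
    """Plan once, then run every step in order, passing prior results along."""
    plan = planner(task)                     # the single planning call
    past_steps: list[tuple[str, str]] = []
    for step in plan:                        # strictly sequential execution
        result = executor(step, past_steps)  # executor sees earlier results
        past_steps.append((step, result))
    return past_steps


def fake_planner(task: str) -> list[str]:
    """Stand-in for the planning LLM call (hypothetical)."""
    return [f"research {task}", f"summarize {task}"]


def fake_executor(step: str, past: list[tuple[str, str]]) -> str:
    """Stand-in for a tool-equipped executor (hypothetical)."""
    return f"done: {step} (after {len(past)} prior steps)"
```

Calling plan_and_execute("pricing", fake_planner, fake_executor) returns one (step, result) pair per planned step. The point of the separation is visible in the signature: the planner never sees tool output, and the executor never decides what comes next.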

| When P&E is mostly tax | Why it fails to earn its keep | Do this instead |
| --- | --- | --- |
| Task takes fewer than 4 steps | Planning overhead exceeds the coordination benefit | ReAct with tools: same accuracy, lower cost |
| Steps are independent (no ordering dependency) | Planning adds latency without adding coordination value | Parallelization pattern: fan out, collect results |
| Task is highly dynamic (requirements change mid-flight) | Plan is stale before step 2 finishes | Supervisor or ReAct: iterative judgment is the right tool |
| Steps are identical (same tool, different inputs) | No decomposition needed; it's just a loop | Map-reduce or fan-out with Send |

| Aspect | ReAct | Plan-and-Execute | Supervisor |
| --- | --- | --- | --- |
| LLM calls per task | 3–5 (tool loop) | 5–8 (plan + execute) | 5–10+ (iterative delegation) |
| Planning style | Implicit (next-step reasoning) | Explicit (upfront plan) | Dynamic (per-iteration) |
| Cost per task | $0.06–0.09 | $0.09–0.14 | $0.12–0.20 |
| Completion rate (complex tasks) | ~85% | ~92% | ~90% |
| Best for | Simple tool-use tasks, <4 steps | Multi-step with step ordering dependencies | Tasks needing judgment and coordination |

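
The cost and completion figures in the comparison above can be turned into a rough cost-per-completed-task estimate. This is a back-of-the-envelope sketch under two assumptions that are not in the table: each cost range is represented by its midpoint, and a failed task is retried at the same cost and success rate, so the expected spend per completion is cost divided by completion rate. PatternStats and premium_over are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternStats:
    cost_low: float         # USD per task, low end of the quoted range
    cost_high: float        # USD per task, high end of the quoted range
    completion_rate: float  # fraction of complex tasks completed

    @property
    def mid_cost(self) -> float:
        # Assumption: the midpoint stands in for the whole range.
        return (self.cost_low + self.cost_high) / 2

    @property
    def cost_per_completion(self) -> float:
        # Assumption: failures are retried at the same cost and rate.
        return self.mid_cost / self.completion_rate


# Figures copied from the comparison table.
STATS = {
    "react": PatternStats(0.06, 0.09, 0.85),
    "plan_and_execute": PatternStats(0.09, 0.14, 0.92),
    "supervisor": PatternStats(0.12, 0.20, 0.90),
}


def premium_over(baseline: str, candidate: str) -> float:
    """Relative extra cost per attempt of candidate vs baseline."""
    return STATS[candidate].mid_cost / STATS[baseline].mid_cost - 1
```

Under these assumptions P&E costs roughly 53% more than ReAct per attempt but roughly 42% more per completed task, because its higher completion rate claws back part of the premium.
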
The P&E decision math

P&E costs ~50% more per task than ReAct but completes ~7 percentage points more complex multi-step tasks (~92% vs ~85%). That tradeoff earns its keep when step ordering matters, you need progress tracking across steps, and individual steps depend on prior results. Below 4 steps, ReAct is cheaper and nearly as effective. Above 7 steps, the planner's output degrades and you should decompose the task into sub-tasks with separate plans. The cost of building and owning the replan loop is real, so don't pay it without a reason.
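
The thresholds above, together with the "mostly tax" cases from earlier in this section, can be folded into a small selector. This is an illustrative sketch only: recommend_pattern and its return labels are hypothetical, and the order in which the checks run is an assumption, since the article does not rank the cases against each other.

```python
def recommend_pattern(
    n_steps: int,
    ordered: bool,
    dynamic: bool = False,
    identical: bool = False,
) -> str:
    """Map rough task traits to a pattern label (hypothetical labels)."""
    if dynamic:
        # Requirements change mid-flight: an upfront plan goes stale.
        return "supervisor_or_react"
    if identical:
        # Same tool, different inputs: it is just a loop.
        return "map_reduce"
    if n_steps < 4:
        # Planning overhead exceeds the coordination benefit.
        return "react"
    if not ordered:
        # Independent steps: fan out and collect results.
        return "parallelization"
    if n_steps > 7:
        # Planner output degrades: split into sub-tasks with own plans.
        return "decompose_into_subplans"
    return "plan_and_execute"
```

For example, a 5-step task with ordering dependencies maps to "plan_and_execute", while the same task with only 3 steps maps to "react".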