How to Design an Agent System
A decision framework for choosing between chains, single agents, and multi-agent systems. Covers when not to build an agent at all, cost estimation before you write code, the six failure modes every production agent hits, model tiering strategy, and a production-shaped LangGraph reference implementation.
Quick Reference
- If a human can write a fixed checklist for the task, use a chain, not an agent
- Start with a chain; promote to an agent only when the LLM must choose tools at runtime
- Keep each agent to fewer than 8-10 tools; selection accuracy degrades sharply beyond that
- Estimate per-query cost before building: (input_tokens × input_price + output_tokens × output_price) × avg_calls
- Prototype with Claude Opus 4.7 to establish the quality ceiling; ship with Sonnet 4.6
- Set max_iterations (5-15) and a cost ceiling to prevent runaway loops
- Build 50-100 hand-labeled eval cases before your second prompt iteration
- Instrument token usage, iteration count, error rate, and p95 latency from day one
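The cost formula and the loop limits from the list above can be sketched together. This is a minimal illustration under stated assumptions, not a reference implementation: the per-token prices, the `RunBudget` class, the `run_agent` helper, and the lambda standing in for an LLM iteration are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


def per_query_cost(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float,
                   avg_calls: float) -> float:
    """(input_tokens x input_price + output_tokens x output_price) x avg_calls.

    Token counts are per call; prices are dollars per token.
    """
    per_call = input_tokens * input_price + output_tokens * output_price
    return per_call * avg_calls


@dataclass
class RunBudget:
    """Hard limits for one agent run (default values are illustrative)."""
    max_iterations: int = 10
    cost_ceiling: float = 0.50  # dollars
    iterations: int = 0
    spent: float = 0.0

    def exhausted(self) -> bool:
        return (self.iterations >= self.max_iterations
                or self.spent >= self.cost_ceiling)


def run_agent(step: Callable[[], Tuple[bool, float]], budget: RunBudget) -> str:
    """Call step() until it reports completion or the budget is exhausted.

    step is a stand-in for one LLM/tool iteration and returns (done, cost).
    """
    while not budget.exhausted():
        done, cost = step()
        budget.iterations += 1
        budget.spent += cost
        if done:
            return "done"
    return "stopped"


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
estimate = per_query_cost(2_000, 500, 3e-6, 15e-6, avg_calls=4)
print(f"estimated cost per query: ${estimate:.3f}")

# A step that never reports completion is cut off by max_iterations.
budget = RunBudget(max_iterations=5, cost_ceiling=1.00)
print(run_agent(lambda: (False, 0.01), budget), budget.iterations)
```

Checking both limits in one place means a loop that is cheap but endless and a loop that is short but expensive are stopped by the same guard.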
When NOT to Build an Agent
The most important design decision is the one you don't make. Most tasks that feel like they need an agent can be solved with a chain, a single LLM call, or no LLM at all. An agent adds latency (2-8 LLM calls vs 1), cost (5-15x a chain), and debugging surface. That tax must be justified by genuinely dynamic behavior, not by the fact that agents feel more impressive.
| Task shape | Example | Right tool | Rationale |
|---|---|---|---|
| Extract structured data from text | Parse name, email, company from a business card | Single LLM call with structured output | The steps are always the same — one extraction call |
| Fixed pipeline with known stages | Translate → summarize → format → post | Chain | Every input follows the same path; no runtime branching needed |
| Classify into one of N categories | Route a support ticket to billing / technical / general | Router (one LLM call) | Classification is a single structured output, not a tool-calling loop |
| Retrieval + answer (RAG) | Answer a question from your documentation | Chain (retrieve → generate) | The steps are fixed; the LLM doesn't decide which tools to call |
| Dynamic tool selection with judgment | Research a company and write a personalized sales email | Single agent | The LLM genuinely needs to decide which searches to run and in what order |
| Multi-domain coordination | Route billing AND engineering issues, each needing domain expertise | Multi-agent (supervisor pattern) | Two distinct context sets that don't fit cleanly in one agent's system prompt |
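One possible way to encode the table as a decision helper is sketched below. The function name, its parameters, and the exact ordering of the checks are assumptions made for illustration; the table itself remains the guide.

```python
def choose_architecture(steps_are_fixed: bool,
                        llm_picks_tools_at_runtime: bool,
                        num_steps: int = 1,
                        distinct_domains: int = 1) -> str:
    """Map a task's shape to the simplest tool that fits it."""
    # Fixed steps never need an agent, however many steps there are.
    if steps_are_fixed or not llm_picks_tools_at_runtime:
        return "single LLM call" if num_steps <= 1 else "chain"
    # Dynamic tool choice across separate bodies of domain context.
    if distinct_domains > 1:
        return "multi-agent (supervisor pattern)"
    # Dynamic tool choice within a single domain.
    return "single agent"


# Business-card extraction: one fixed step.
print(choose_architecture(steps_are_fixed=True, llm_picks_tools_at_runtime=False))
# Translate -> summarize -> format -> post: four fixed steps.
print(choose_architecture(True, False, num_steps=4))
# Research a company and write a personalized email: runtime tool choice.
print(choose_architecture(False, True))
# Billing and engineering issues, each with its own domain context.
print(choose_architecture(False, True, distinct_domains=2))
```

The checks run from cheapest to most expensive option, so an agent is only returned once the simpler shapes have been ruled out.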
A payments team built a multi-agent system to process vendor invoices: an extraction agent, a validation agent, and an approval agent. After two weeks of debugging coordination failures, they realized every invoice followed the same 3-step path — extract fields, validate against PO, write to ledger. A deterministic chain handled all of it in 300ms at $0.003/invoice. The multi-agent system averaged 4 seconds and $0.18. The dynamic behavior they thought they needed was two if/else branches.
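A deterministic chain of the kind described might look like the sketch below. The field names, the specific two branches (unknown PO and amount mismatch), the tolerance, and the in-memory ledger are all assumptions; the story above does not say what the team's branches or data model actually were.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Invoice:
    vendor: str
    po_number: str
    amount: float


def extract_fields(raw: Dict[str, str]) -> Invoice:
    # Stand-in for the extraction step; in practice this could be
    # a single structured-output LLM call.
    return Invoice(raw["vendor"], raw["po_number"], float(raw["amount"]))


def process_invoice(raw: Dict[str, str],
                    purchase_orders: Dict[str, float],
                    ledger: List[Invoice]) -> str:
    """Extract -> validate against PO -> write to ledger, in a fixed order."""
    invoice = extract_fields(raw)
    po_amount = purchase_orders.get(invoice.po_number)
    if po_amount is None:                       # hypothetical branch 1
        return "rejected: unknown PO"
    if abs(po_amount - invoice.amount) > 0.01:  # hypothetical branch 2
        return "flagged: amount mismatch"
    ledger.append(invoice)
    return "posted"


purchase_orders = {"PO-1001": 250.00}
ledger: List[Invoice] = []
raw = {"vendor": "Acme", "po_number": "PO-1001", "amount": "250.00"}
print(process_invoice(raw, purchase_orders, ledger))
```

Every invoice takes the same path through the function, which is what makes the chain easy to test and debug compared with coordinating three agents.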
At 10K queries/day, the difference between a chain ($0.003/query) and a single agent ($0.05/query) is $30/day vs $500/day, or roughly $11K vs $183K annually. That gap must be justified by the business value the dynamic behavior delivers. If it can't be, use the chain.
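The projection can be reproduced from the per-query costs and query volume quoted above. The helper name and the 365-day year are assumptions of this sketch.

```python
from typing import Tuple


def projected_spend(cost_per_query: float, queries_per_day: int,
                    days: int = 365) -> Tuple[float, float]:
    """Return (daily, annual) spend in dollars for a steady query volume."""
    daily = cost_per_query * queries_per_day
    return daily, daily * days


for name, cost in [("chain", 0.003), ("single agent", 0.05)]:
    daily, annual = projected_spend(cost, 10_000)
    print(f"{name}: ${daily:,.0f}/day, ${annual:,.0f}/year")
```

With these inputs the chain comes to $30/day ($10,950/year) and the single agent to $500/day ($182,500/year).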