How to Design an Agent System

A decision framework for choosing between chains, single agents, and multi-agent systems. Covers when not to build an agent at all, cost estimation before you write code, the six failure modes every production agent hits, model tiering strategy, and a production-shaped LangGraph reference implementation.

Quick Reference

→If a human can write a fixed checklist for the task, use a chain — not an agent
→Start with a chain; promote to an agent only when the LLM must choose tools at runtime
→Keep tools under 8-10 per agent — selection accuracy degrades sharply beyond that
→Estimate per-query cost before building: (input_tokens × input_price + output_tokens × output_price) × avg_calls
→Prototype with Claude Opus 4.7 to establish the quality ceiling; ship with Sonnet 4.6
→Set max_iterations (5-15) and a cost ceiling to prevent runaway loops
→Build 50-100 hand-labeled eval cases before your second prompt iteration
→Instrument token usage, iteration count, error rate, and latency p95 from day one
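The cost formula and the loop limits above can be combined into one guard. The following is a minimal sketch, not a library API: `Pricing`, `run_with_budget`, and the `step` callback are hypothetical names, and the prices in the usage note are placeholders to be replaced with your provider's actual rates.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-token prices in USD; substitute real provider rates."""
    input_per_token: float
    output_per_token: float


class BudgetExceeded(RuntimeError):
    """Raised when the loop hits its iteration or cost limit."""


def call_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    # Cost of a single LLM call.
    return (input_tokens * pricing.input_per_token
            + output_tokens * pricing.output_per_token)


def estimate_query_cost(input_tokens: int, output_tokens: int,
                        avg_calls: float, pricing: Pricing) -> float:
    # (input_tokens x input_price + output_tokens x output_price) x avg_calls
    return call_cost(input_tokens, output_tokens, pricing) * avg_calls


def run_with_budget(step: Callable[[int], Tuple[bool, int, int]],
                    pricing: Pricing,
                    max_iterations: int = 10,
                    cost_ceiling_usd: float = 0.25) -> Tuple[int, float]:
    """Run `step` until it reports done, or stop at either limit.

    `step(iteration)` is an assumed callback that performs one agent turn
    and returns (done, input_tokens, output_tokens).
    """
    spent = 0.0
    for iteration in range(1, max_iterations + 1):
        done, in_tok, out_tok = step(iteration)
        spent += call_cost(in_tok, out_tok, pricing)
        if spent > cost_ceiling_usd:
            raise BudgetExceeded(f"spent ${spent:.4f} after {iteration} calls")
        if done:
            return iteration, spent
    raise BudgetExceeded(f"no answer after {max_iterations} iterations")
```

For example, with placeholder prices `Pricing(3e-6, 15e-6)`, `estimate_query_cost(2000, 500, 4, pricing)` gives a pre-build estimate for a query that averages four calls, and `run_with_budget` enforces the same arithmetic at runtime.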

When NOT to Build an Agent

The most important design decision is the one you don't make. Most tasks that feel like they need an agent can be solved with a chain, a single LLM call, or no LLM at all. An agent adds latency (2-8 LLM calls vs 1), cost (5-15x a chain), and debugging surface. That tax must be justified by genuinely dynamic behavior — not because agents feel more impressive.

Task shape	Example	Right tool	Why not an agent
Extract structured data from text	Parse name, email, company from a business card	Single LLM call with structured output	The steps are always the same — one extraction call
Fixed pipeline with known stages	Translate → summarize → format → post	Chain	Every input follows the same path; no runtime branching needed
Classify into one of N categories	Route a support ticket to billing / technical / general	Router (one LLM call)	Classification is a single structured output, not a tool-calling loop
Retrieval + answer (RAG)	Answer a question from your documentation	Chain (retrieve → generate)	The steps are fixed; the LLM doesn't decide which tools to call
Dynamic tool selection with judgment	Research a company and write a personalized sales email	Single agent	The LLM genuinely needs to decide which searches to run and in what order
Multi-domain coordination	Route billing AND engineering issues, each needing domain expertise	Multi-agent (supervisor pattern)	Two distinct context sets that don't fit cleanly in one agent's system prompt
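
The table can be read as a short decision procedure. The sketch below encodes it under simplifying assumptions: `TaskShape` and its fields are hypothetical names chosen for illustration, and real tasks will need more judgment than three inputs can capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskShape:
    """Hypothetical description of a task; field names are illustrative."""
    needs_llm: bool = True
    step_count: int = 1                # stages when the path is fixed
    runtime_tool_choice: bool = False  # the LLM must pick tools and order at runtime
    distinct_domains: int = 1          # context sets that will not fit one system prompt


def choose_architecture(task: TaskShape) -> str:
    # Simplest option first; promote only when the task demands it.
    if not task.needs_llm:
        return "no LLM"
    if task.runtime_tool_choice:
        if task.distinct_domains > 1:
            return "multi-agent (supervisor)"
        return "single agent"
    if task.step_count <= 1:
        return "single LLM call"  # extraction, classification, routing
    return "chain"                # fixed pipeline, including RAG


print(choose_architecture(TaskShape(step_count=4)))
print(choose_architecture(TaskShape(runtime_tool_choice=True)))
```

The ordering of the checks mirrors the article's advice: an agent is only returned when runtime tool choice is actually required.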

Real project

A payments team built a multi-agent system to process vendor invoices: an extraction agent, a validation agent, and an approval agent. After two weeks of debugging coordination failures, they realized every invoice followed the same 3-step path — extract fields, validate against PO, write to ledger. A deterministic chain handled all of it in 300ms at $0.003/invoice. The multi-agent system averaged 4 seconds and $0.18. The dynamic behavior they thought they needed was two if/else branches.
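
A deterministic chain of that shape might look like the sketch below. It is an illustration only: the `Invoice` fields, the helper names, and the two rejection branches are assumptions, since the team's actual rules are not described.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Invoice:
    # Hypothetical fields; a real invoice schema will differ.
    po_number: str
    vendor: str
    amount_cents: int


def extract_fields(raw: dict) -> Invoice:
    # Stand-in for the extraction step (in practice, possibly one
    # structured-output LLM call).
    return Invoice(po_number=str(raw["po_number"]),
                   vendor=str(raw["vendor"]),
                   amount_cents=int(raw["amount_cents"]))


def validate_against_po(invoice: Invoice, purchase_orders: Dict[str, int]) -> str:
    # Two plain branches take the place of agent "judgment".
    approved_cents = purchase_orders.get(invoice.po_number)
    if approved_cents is None:
        return "rejected: unknown PO"
    if invoice.amount_cents > approved_cents:
        return "rejected: amount exceeds PO"
    return "ok"


def process_invoice(raw: dict, purchase_orders: Dict[str, int],
                    ledger: List[Invoice]) -> str:
    # Fixed path for every invoice: extract, validate, write to ledger.
    invoice = extract_fields(raw)
    status = validate_against_po(invoice, purchase_orders)
    if status == "ok":
        ledger.append(invoice)
    return status
```

Every input follows the same three calls in the same order, which is what makes the pipeline cheap to run and easy to debug.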


The agent tax is real

At 10K queries/day, the difference between a chain ($0.003/query) and a single agent ($0.05/query) is $30/day vs $500/day, or roughly $11K vs $180K annually. That gap must be justified by the business value the dynamic behavior delivers. If it can't be, use the chain.
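
The projection is simple enough to script so it can be rerun with your own volumes. This is a small sketch using the per-query figures quoted above. `project_spend` is a hypothetical helper, and it assumes flat daily traffic across a 365-day year.

```python
from typing import Tuple


def project_spend(queries_per_day: int, cost_per_query_usd: float,
                  days_per_year: int = 365) -> Tuple[float, float]:
    """Return (daily_usd, annual_usd) for a flat daily query volume."""
    daily = queries_per_day * cost_per_query_usd
    return daily, daily * days_per_year


for label, cost in [("chain", 0.003), ("single agent", 0.05)]:
    daily, annual = project_spend(10_000, cost)
    print(f"{label}: ${daily:,.2f}/day, ${annual:,.0f}/year")
```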

Chain vs Agent vs Multi-Agent: The Real Comparison

Start simple — promote only when complexity is justified

The Design Process: Scoping Your Agent

Before writing code, write a one-paragraph agent spec. This document forces you to define the boundaries that prevent scope creep, cost explosions, and the 'what does this agent actually do?' confusion that kills most production rollouts.
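
One way to keep such a spec honest is to mirror it in a small data structure that is checked before any agent code runs. The sketch below is an assumption rather than a prescribed template: the `AgentSpec` fields are hypothetical, and the limits are taken from the Quick Reference (at most 10 tools, 5-15 iterations, an explicit cost ceiling).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentSpec:
    """Hypothetical machine-checkable companion to the written spec."""
    purpose: str                      # one sentence: what the agent is for
    tools: List[str]                  # tool names the agent may call
    out_of_scope: List[str] = field(default_factory=list)
    max_iterations: int = 10
    cost_ceiling_usd: float = 0.25    # placeholder per-query ceiling

    def problems(self) -> List[str]:
        # Collect every violation instead of failing on the first one.
        issues: List[str] = []
        if not self.purpose.strip():
            issues.append("purpose is empty")
        if len(self.tools) > 10:
            issues.append(f"{len(self.tools)} tools; keep it to 10 or fewer")
        if not 5 <= self.max_iterations <= 15:
            issues.append("max_iterations should be between 5 and 15")
        if self.cost_ceiling_usd <= 0:
            issues.append("cost ceiling must be positive")
        return issues


spec = AgentSpec(
    purpose="Research a company and draft a personalized sales email",
    tools=["web_search", "crm_lookup", "email_draft"],  # illustrative names
    out_of_scope=["sending email without human review"],
)
print(spec.problems())
```

An empty list from `problems()` means the spec sits inside the stated limits. It says nothing about whether the agent is the right tool for the task.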
