Agent Architecture
All agent patterns in one place: single-agent (ReAct, Reflection), multi-agent (Supervisor, Swarm, A2A), workflow (Router, Orchestrator-Worker), plus system design, memory, and frontend.
A decision framework for choosing between chains, single agents, and multi-agent systems. Covers when not to build an agent at all, cost estimation before you write code, the six failure modes every production agent hits, model tiering strategy, and a production-shaped LangGraph reference implementation.
The foundational agent loop: decide when ReAct earns its cost over simpler patterns, understand the token math behind each iteration, learn the five ways it fails in production, and ship it with create_agent, cost controls, and eval.
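A minimal sketch of the reason-act-observe loop, assuming a model that returns either a tool call or a final answer. `Step`, `scripted_model`, and `multiply` are hypothetical stand-ins, not the `create_agent` API; the iteration cap is the simplest of the cost controls mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One model decision: a tool call or a final answer."""
    thought: str
    tool: Optional[str] = None
    tool_input: Optional[str] = None
    answer: Optional[str] = None


def react_loop(model: Callable[[list], Step],
               tools: dict,
               question: str,
               max_iterations: int = 5) -> str:
    # The transcript is resent every iteration, so its growth drives token cost.
    transcript = [f"Question: {question}"]
    for _ in range(max_iterations):
        step = model(transcript)                          # reason
        transcript.append(f"Thought: {step.thought}")
        if step.answer is not None:
            return step.answer
        observation = tools[step.tool](step.tool_input)   # act
        transcript.append(f"Action: {step.tool}({step.tool_input})")
        transcript.append(f"Observation: {observation}")  # observe
    return "Stopped: iteration budget exhausted"          # hard cost control


def multiply(args: str) -> str:
    a, b = args.split("*")
    return str(int(a) * int(b))


def scripted_model(transcript: list) -> Step:
    # Hypothetical stand-in for an LLM: call the tool once, then answer.
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return Step("I need to compute this.", tool="multiply", tool_input="6*7")
    return Step("I have the result.", answer=observations[-1][len("Observation: "):])


print(react_loop(scripted_model, {"multiply": multiply}, "What is 6 times 7?"))  # prints 42
```

Because the whole transcript is resent on each iteration, input tokens grow with every tool call, which is why the cap matters.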
When to add a self-critique loop, what it costs, where it fails, and how to measure whether it's earning its latency. Includes conditional-edge and Command-based LangGraph implementations, production-shaped code with regression guards, and a clear comparison with Evaluator-Optimizer.
Plan-and-Execute separates reasoning from acting: one LLM call decomposes the task into ordered steps, then an executor runs each step sequentially. Most tutorials stop at the mechanics. This article starts with whether you should use P&E at all, walks through the cost tradeoff (P&E is ~50% more expensive than ReAct but ~7% more accurate on complex tasks), covers the replan problem that dominates production failures, adds PEV quality gates to catch silent step drift, and ends with a model-tiered reference implementation that cuts cost ~85% vs naive Sonnet-everywhere.
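The planner/executor split can be sketched in a few lines, assuming the planner returns a plain list of step strings. `scripted_planner` and `scripted_executor` are hypothetical placeholders for LLM calls; replanning and PEV quality gates are deliberately left out.

```python
from typing import Callable


def plan_and_execute(planner: Callable[[str], list[str]],
                     executor: Callable[[str, list[str]], str],
                     task: str) -> list[str]:
    steps = planner(task)              # one planning call produces ordered steps
    results: list[str] = []
    for step in steps:                 # steps run strictly in sequence
        results.append(executor(step, results))  # each step sees earlier results
    return results


def scripted_planner(task: str) -> list[str]:
    # Hypothetical planner output; in practice this is a single LLM call.
    return [f"gather sources on {task}", f"outline {task}", f"draft {task}"]


def scripted_executor(step: str, prior: list[str]) -> str:
    # Hypothetical executor; in practice a cheaper model or a ReAct sub-agent.
    return f"done: {step} [context={len(prior)}]"


for line in plan_and_execute(scripted_planner, scripted_executor, "the report"):
    print(line)
```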
Prompt chaining sequences focused LLM calls — each step's output becomes the next step's sole input, with gate functions between steps acting as circuit breakers. This article covers the decision framework for when to use it, the cost and latency math, what fails in production, and how to evaluate and debug chains.
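A minimal sketch of a gated chain, assuming each step maps a string to a string and each gate is a boolean check on that step's output. The lambdas are hypothetical stand-ins for focused LLM calls, and `GateFailed` is an illustrative name.

```python
from typing import Callable


class GateFailed(Exception):
    """Raised when a gate rejects a step's output; the chain stops there."""


def run_chain(steps: list[tuple[str, Callable[[str], str], Callable[[str], bool]]],
              initial: str) -> str:
    value = initial
    for name, step, gate in steps:
        value = step(value)              # this output is the next step's only input
        if not gate(value):              # circuit breaker between steps
            raise GateFailed(f"gate after {name!r} rejected {value!r}")
    return value


# Hypothetical steps; each lambda stands in for one focused LLM call.
chain = [
    ("normalize", lambda text: text.strip().lower(), lambda out: len(out) > 0),
    ("truncate", lambda text: text[:20], lambda out: len(out) <= 20),
    ("title", lambda text: text.title(), lambda out: out[:1].isupper()),
]

print(run_chain(chain, "  Prompt chaining sequences focused calls  "))
```

Failing fast at the first rejected gate is what keeps a bad intermediate output from being paid for, and compounded, by every later step.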
Two LLMs, two roles: one generates, one judges. The Evaluator-Optimizer pattern runs a structured feedback loop until output clears a quality threshold — or a cost budget runs out. Before using it, you need to answer two questions: does your task have measurable quality criteria, and does it actually benefit from more than one attempt?
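A sketch of the generate-judge loop, assuming the judge returns a numeric score plus a text critique. `scripted_generate` and `length_judge` are hypothetical stand-ins for the two LLM roles, and `max_attempts` is used here as a simple proxy for a cost budget.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    output: str
    score: float
    attempts: int
    passed: bool


def evaluator_optimizer(generate: Callable[[str, Optional[str]], str],
                        evaluate: Callable[[str], tuple[float, str]],
                        task: str,
                        threshold: float = 0.8,
                        max_attempts: int = 3) -> Result:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback: Optional[str] = None
    best_output, best_score = "", float("-inf")
    for attempt in range(1, max_attempts + 1):  # max_attempts doubles as the cost budget
        draft = generate(task, feedback)         # generator sees the judge's last critique
        score, feedback = evaluate(draft)        # judge returns a score and a critique
        if score > best_score:
            best_output, best_score = draft, score
        if score >= threshold:                   # quality bar cleared: stop paying
            break
    return Result(best_output, best_score, attempt, best_score >= threshold)


# Hypothetical generator and judge standing in for two LLM calls.
drafts = iter(["short", "a longer draft", "a much longer and detailed draft"])


def scripted_generate(task: str, feedback: Optional[str]) -> str:
    return next(drafts)


def length_judge(draft: str) -> tuple[float, str]:
    return min(len(draft) / 20, 1.0), "add more detail"


result = evaluator_optimizer(scripted_generate, length_judge, "write a summary")
print(result)
```

Returning the best-scoring draft, rather than the last one, keeps a late regression from replacing an earlier, better attempt when the budget runs out.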
The Router pattern classifies input once and dispatches to specialized handlers — fast and cheap when categories are clearly separable. Most articles teach the mechanics; this one starts with whether you should build one at all, then walks through cost math, per-class eval, drift detection, the three places routing actually fails in production, and a 30-day shipping runbook.
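Classify-once-then-dispatch reduces to a lookup table, sketched here under the assumption that the classifier returns a string label. `keyword_classifier` and the handler lambdas are hypothetical; a production router would typically put an LLM or embedding classifier behind the same interface.

```python
from typing import Callable

Handler = Callable[[str], str]


def make_router(classify: Callable[[str], str],
                handlers: dict[str, Handler],
                fallback: Handler) -> Handler:
    def route(query: str) -> str:
        label = classify(query)                   # exactly one classification per input
        return handlers.get(label, fallback)(query)
    return route


def keyword_classifier(query: str) -> str:
    # Hypothetical stand-in for an LLM or embedding classifier.
    q = query.lower()
    if "refund" in q or "invoice" in q:
        return "billing"
    if "error" in q or "crash" in q:
        return "technical"
    return "unknown"


route = make_router(
    keyword_classifier,
    {"billing": lambda q: f"billing: {q}", "technical": lambda q: f"tech: {q}"},
    fallback=lambda q: f"general: {q}",
)
print(route("I need a refund"))
```

The explicit fallback handler covers labels the table does not know, which is where inputs that fit no category land.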
The Orchestrator-Worker pattern parallelizes complex tasks by having one LLM plan the decomposition, multiple workers execute in parallel, and the orchestrator synthesize the results. This article covers when it actually earns its 4× cost premium, how to validate plan quality before spending on workers, production hardening, and how to run an A/B eval against a single-agent baseline.
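A sketch of plan, fan out, and synthesize using Python threads, assuming worker calls are I/O-bound (as remote LLM calls are). The planner, worker, and synthesizer lambdas are hypothetical, and the empty-or-duplicate check is a stand-in for real plan validation before spending on workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def orchestrate(plan: Callable[[str], list[str]],
                work: Callable[[str], str],
                synthesize: Callable[[list[str]], str],
                task: str,
                max_workers: int = 4) -> str:
    subtasks = plan(task)                          # orchestrator decomposes the task
    if not subtasks or len(set(subtasks)) != len(subtasks):
        # Cheap plan check before paying for any worker calls.
        raise ValueError(f"rejecting plan: {subtasks!r}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, subtasks))   # workers run concurrently; order preserved
    return synthesize(results)                     # orchestrator merges the results


# Hypothetical stand-ins for the planner, worker, and synthesizer LLM calls.
answer = orchestrate(
    plan=lambda task: [f"{task}: pricing", f"{task}: features", f"{task}: reviews"],
    work=lambda sub: f"notes on {sub}",
    synthesize=lambda parts: " | ".join(parts),
    task="compare vendors",
)
print(answer)
```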
Parallelization runs multiple LLMs concurrently to gain confidence (voting: same input, N opinions) or speed (sectioning: split input, parallel workers). This article starts with whether you should parallelize at all, walks through the actual cost math — 3× Haiku can be cheaper than 1× Sonnet with caching — and covers the two production failure modes most articles skip: superstep atomicity and correlated errors.
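The voting variant can be sketched as n concurrent calls on the same input followed by a majority count. `sampled_classifier` is a hypothetical stand-in for independently sampled model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vote(model: Callable[[str, int], str], prompt: str, n: int = 3) -> tuple[str, float]:
    # Voting: the same input goes to n concurrent calls; the majority answer wins.
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda i: model(prompt, i), range(n)))
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n   # agreement ratio: a rough confidence signal


def sampled_classifier(prompt: str, sample: int) -> str:
    # Hypothetical stand-in for n independently sampled LLM calls.
    return "spam" if sample < 2 else "not spam"


label, agreement = vote(sampled_classifier, "Is this email spam?")
print(label, round(agreement, 2))
```

Agreement only measures consistency: if all n calls share a blind spot, they can agree unanimously on a wrong answer, which is the correlated-error failure mode.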
Skills inject domain expertise into an agent on demand via progressive disclosure, keeping the context window lean while giving the agent access to deep knowledge across many domains. This article covers when to use skills over subagents and tools, the full SKILL.md specification, context budget math you can derive for your own skill set, how skills fail in production, and how to evaluate skill activation.
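Progressive disclosure can be illustrated with a catalog that always carries one line per skill and a body that is appended only once a skill activates. The `Skill` type, the skill names, and the 4-characters-per-token estimate are assumptions for illustration; this sketch does not parse real SKILL.md files.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str   # always in context: one line per skill
    body: str          # full instructions, loaded only when the skill activates


def approx_tokens(text: str) -> int:
    # Rough heuristic of about 4 characters per token; use a real tokenizer for budgets.
    return max(1, len(text) // 4)


def build_context(skills: list[Skill], active: set[str]) -> str:
    catalog = "\n".join(f"- {s.name}: {s.description}" for s in skills)
    loaded = [s.body for s in skills if s.name in active]
    return "\n\n".join([catalog, *loaded])


# Hypothetical skills with deliberately long bodies.
skills = [
    Skill("pdf-forms", "Fill and validate PDF forms", "PDF form procedure. " * 200),
    Skill("sql-review", "Review SQL for correctness", "SQL review checklist. " * 200),
]
lean = build_context(skills, active=set())
full = build_context(skills, active={"pdf-forms", "sql-review"})
print(approx_tokens(lean), approx_tokens(full))
```

The catalog cost grows by one short line per skill, while each body is paid for only in the turns where it is actually needed.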
When and why to split work across multiple agents — with cost math, a pattern-selection decision tree, and the production guardrails most overviews skip.
The supervisor pattern coordinates specialized worker agents by calling them as tools. Learn when it earns its cost, how to build it with LangChain 1.0's current API, and how to evaluate whether it actually outperforms a single agent.
When peer-to-peer agent handoffs earn their complexity, what they cost per handoff depth, how they fail in production, and how to defend against each failure mode. Includes production-grade LangGraph code with checkpointer, context management strategies with cost math, and a sharpened comparison with the supervisor pattern.
Async subagents (Deep Agents v0.5) let a supervisor delegate long-running tasks to background agents while continuing to chat with the user. This article covers the decision criteria for when async is worth the complexity, token cost math, production error handling, five concrete failure modes and their defenses, three orchestration patterns with code, and the five metrics you need to monitor before something breaks.
Google's Agent-to-Agent (A2A) protocol — now at v1.2 with 150+ organizations in production — standardizes how agents discover, authenticate, and communicate across service boundaries using JSON-RPC over HTTP. Covers the Protocol Triangle (A2A, MCP, AG-UI), the message-based API, signed Agent Cards, LangGraph integration, and production failure modes.
When to add memory to your agent, how the two-layer architecture works, what it costs in tokens and money, and the six ways it fails silently in production.
LangMem is an LLM-powered extraction layer that automatically identifies and persists structured facts from conversations. This article covers when to use it (and when not to), all three APIs with correct signatures, cost analysis, memory quality evaluation, failure modes, and GDPR deletion.
Agent system prompts are operating contracts, not personality descriptions. This article covers how to structure them with XML tags, write tool usage rules that actually enforce behavior, defend against prompt injection, use adaptive thinking correctly, and build a prompt evaluation harness that gates every change.
Tool descriptions are serialized into every request as the model's only guide for tool selection. This article covers the full anatomy of a production-grade description, the token cost math, disambiguation patterns, schema enforcement with strict mode, and how to measure and debug tool selection quality.
When to use few-shot examples for agents, how to build trajectory examples that teach reasoning patterns, static vs dynamic selection with real cost math, and measuring whether examples actually help.
A decision-first guide to managing prompts in production: when to build a registry, how to choose between LangSmith, Langfuse, and LaunchDarkly, how to gate promotions with an eval suite, and how to run statistically rigorous A/B tests instead of guessing.
When to use a graph instead of a chain, how to choose the right topology, how to design nodes and state for testability, and how to add human-in-the-loop gates with the current interrupt() API — a decision-first guide to LangGraph workflow design.
How to structure LangGraph state for parallel execution, API safety, and long-running workflows — including schema separation, reducers, the Command API, state explosion mitigation, checkpointing, and debugging with time travel.
How to architect agents that don't crash: classify errors before handling them, design timeout budgets at every level, validate state at node boundaries, and verify failure paths with fault injection.
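One way to design timeout budgets at every level is to carry a single request deadline and derive each sub-call's timeout from it. `Deadline` and `BudgetExceeded` are hypothetical names; pass the returned number to whatever timeout parameter your client exposes.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the overall request budget is already spent."""


class Deadline:
    """A request-level time budget that nested calls draw down from."""

    def __init__(self, seconds: float):
        self.expires_at = time.monotonic() + seconds

    def remaining(self) -> float:
        return self.expires_at - time.monotonic()

    def child_timeout(self, limit: float) -> float:
        # A sub-call gets the smaller of its own limit and what is left overall,
        # so a slow early step cannot push later steps past the request deadline.
        left = self.remaining()
        if left <= 0:
            raise BudgetExceeded("request budget exhausted")
        return min(limit, left)


request = Deadline(10.0)
llm_timeout = request.child_timeout(30.0)   # capped by the 10 s request budget
tool_timeout = request.child_timeout(2.0)   # its own 2 s limit is smaller
print(round(llm_timeout, 1), tool_timeout)
```

Using `time.monotonic()` rather than wall-clock time keeps the budget correct if the system clock is adjusted mid-request.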
When to manage context, how context rot degrades agents before you hit any limit, and the full strategy stack — server-side compaction, context editing, trimming, and summarization — with cost math and production failure modes.
How to decide when to structure a LangGraph project, how to separate graph topology from node logic from tools, how to navigate the monorepo vs polyrepo decision, what breaks when you get structure wrong, and how the layout evolves as your agent count grows.
Query routing is the highest-leverage optimization in an agent system — it determines which model, which tools, and how much context each query gets. This article covers the three routing strategies (keyword, embedding, LLM), how to cascade them in production, how to evaluate and monitor router accuracy, and how to defend against the failure modes that will bite you.
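A cascade can be sketched as an ordered list of (tier, classifier, confidence threshold) entries, cheapest first. The embedding and LLM tiers here are hypothetical stubs returning fixed confidences; only the escalation logic is the point.

```python
from typing import Callable, Optional

Classifier = Callable[[str], tuple[Optional[str], float]]


def cascade(tiers: list[tuple[str, Classifier, float]], query: str) -> tuple[str, str]:
    """Try cheap tiers first; escalate only when a tier is not confident enough."""
    for tier_name, classify, threshold in tiers:
        label, confidence = classify(query)
        if label is not None and confidence >= threshold:
            return label, tier_name
    return "general", "default"   # no tier was confident: route to a safe default


def keyword_tier(query: str) -> tuple[Optional[str], float]:
    return ("billing", 1.0) if "invoice" in query.lower() else (None, 0.0)


def embedding_tier(query: str) -> tuple[Optional[str], float]:
    # Hypothetical: nearest-centroid similarity would go here.
    return ("general", 0.55)


def llm_tier(query: str) -> tuple[Optional[str], float]:
    # Hypothetical: a small LLM classification call would go here.
    return ("technical", 0.9) if "crash" in query.lower() else ("general", 0.6)


tiers = [("keyword", keyword_tier, 0.9), ("embedding", embedding_tier, 0.7), ("llm", llm_tier, 0.5)]
print(cascade(tiers, "Where is my invoice?"))
print(cascade(tiers, "The app keeps crashing"))
```

Logging which tier answered each query gives a direct measure of how often the expensive tier is reached, which is the number to watch when tuning thresholds.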
How to design AI agents that always return something useful — even when LLM APIs fail, rate limits hit, or traffic spikes. Covers fallback chains, circuit breakers, semantic degradation detection, and progressive load shedding.
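A sketch of a fallback chain guarded by per-provider circuit breakers, assuming each provider is a callable that raises on failure. `CircuitBreaker`, `call_with_fallbacks`, and the simulated outage are hypothetical, not a specific library's API.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after max_failures consecutive errors; permits a trial call after reset_after seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open once the cool-down elapses; a failed trial reopens the circuit.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()


def call_with_fallbacks(providers: list[tuple[str, Callable[[str], str], CircuitBreaker]],
                        prompt: str, static_reply: str) -> tuple[str, str]:
    for name, call, breaker in providers:
        if not breaker.allow():
            continue                      # circuit open: skip without waiting on a timeout
        try:
            reply = call(prompt)
        except Exception:
            breaker.record(ok=False)
            continue                      # fall through to the next provider
        breaker.record(ok=True)
        return name, reply
    return "static", static_reply         # last resort: always return something useful


primary_calls = 0


def flaky_primary(prompt: str) -> str:
    # Hypothetical provider that is always down.
    global primary_calls
    primary_calls += 1
    raise TimeoutError("simulated outage")


providers = [
    ("primary", flaky_primary, CircuitBreaker(max_failures=2)),
    ("backup", lambda p: f"backup answer to {p!r}", CircuitBreaker()),
]
for _ in range(4):
    print(call_with_fallbacks(providers, "hi", "Service busy, please retry."))
print("primary attempts:", primary_calls)
```

After two consecutive failures the primary's circuit opens, so the remaining calls skip it and go straight to the backup instead of waiting on another timeout.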
Three paradigms for rendering agent output as interactive UI: Static component registries, Declarative agent-described interfaces (A2UI), and Open-Ended agent-generated surfaces. Covers Vercel AI SDK and LangGraph integration, security trust boundaries, error recovery, and testing strategies.
The useStream React hook connects your UI to a LangGraph agent with real-time streaming — messages, tool progress, interrupts, branching, subagent output, and reconnection. Works with any LangGraph backend via apiUrl or custom transport.
OpenAI's native function calling lets the model invoke your code with structured arguments — no framework required. This article covers when to drop the abstraction layer, how to use the Responses API (the recommended path for new projects), and what breaks in production when tool loops run unchecked.
How to build production tool-using agents with the Anthropic SDK: tool definitions with strict mode and input examples, the five tool_choice modes and their interaction with adaptive thinking, server tools, model tier selection with cost math, and the checklist that prevents the most common agent failures.
Building production AI applications with Vercel AI SDK 6: the streaming architecture from React hooks to API routes, ToolLoopAgent for agentic workflows with cost controls, structured output with generateObject, provider switching with model tiering, and the failure modes you need to handle before shipping.
A decision guide for choosing between LangGraph, CrewAI, AG2, OpenAI Agents SDK, Google ADK, Mastra, Vercel AI SDK, and Direct API — structured around cost, lock-in risk, failure modes, and concrete trade-offs rather than feature lists.
How to build a production-ready agent with nothing but the Anthropic SDK and a while loop — and when that's still the right choice. Covers the full manual agent loop, token cost math, streaming, failure modes, testing, and the graduation path from manual loop to tool_runner to Agent SDK to LangGraph.
When to build an agent that runs for hours instead of seconds — which orchestration framework to choose, how to compute real costs, the five ways long-running agents fail in production, and a reference implementation with checkpointing, error classification, idempotency, and budget enforcement.
Building AI agents that navigate and interact with websites: Playwright + LLM for web tasks, page understanding strategies, action spaces, and error recovery patterns.
When to use computer use versus API automation, the screenshot-analyze-act loop with the current computer_20251124 tool, real cost math that shows context growth dominates price, Docker and ephemeral VM sandboxing with prompt injection defense, verification and stuck detection, production failure modes, and a reference implementation using the latest Anthropic API.
Build agents that generate, execute, and iterate on code safely. Covers managed sandboxes (Claude's native code execution tool, E2B), self-hosted Docker, the security gap between 'code ran' and 'answer is correct', and cost math for each option.
Building supervision layers for autonomous agents: kill switches, permission systems, human approval gates, monitoring dashboards, and complete audit logging for post-mortem analysis.
End-to-end walkthrough: build a multi-source research agent with planning, parallel web search, subagent delegation, filesystem persistence, and report synthesis using Deep Agents.
End-to-end walkthrough: build a customer support agent with query routing, RAG knowledge base, tool-calling for account actions, human-in-the-loop escalation, and multi-tenant auth.
End-to-end walkthrough: build a production RAG system with ingestion pipeline, hybrid search, self-corrective retrieval, answer validation, and continuous evaluation.
End-to-end walkthrough: build a code review agent with Deep Agents + sandbox for safe code analysis, project-specific skills, parallel file review, and GitHub integration.
How to build production agents for contract analysis, compliance checking, legal research, and document review — with the guardrails that regulated environments demand.
Building production agents for financial research, risk assessment, portfolio analysis, and report generation — with the numerical accuracy, audit trails, and regulatory compliance that finance demands.
Building production agents for clinical decision support, patient documentation, and medical Q&A — with HIPAA compliance, safety guardrails, and the principle that AI assists clinicians but never replaces clinical judgment.
Design patterns for production customer support agents: multi-tier routing, RAG knowledge bases, account action tools, HITL escalation, session memory, and satisfaction tracking.