Agent Architecture/Multi-Agent Patterns
Advanced14 min

Swarm & Handoffs

When peer-to-peer agent handoffs earn their complexity, what they cost per handoff depth, how they fail in production, and how to defend against each failure mode. Includes production-grade LangGraph code with checkpointer, context management strategies with cost math, and a sharpened comparison with the supervisor pattern.

Quick Reference

  • Swarm = no central supervisor; each agent has handoff tools that transfer control to a peer agent
  • create_handoff_tool(agent_name, description) generates the tool; create_swarm(agents, default_active_agent) builds the graph
  • Only one agent is active at a time — control transfers when the active agent calls a handoff tool
  • Full message history passes to the receiving agent by default — context grows linearly with handoff depth
  • OpenAI's experimental Swarm framework (2024) popularized this pattern; it was superseded by the OpenAI Agents SDK in March 2025
  • Always compile with a checkpointer for multi-turn production use — without it, each invoke starts from scratch
  • Circular handoffs (A → B → A) are the most common production failure — add a depth guard and test handoff chains end-to-end
  • Swarm trades audit trail clarity for routing flexibility — if you need a central log of all routing decisions, use supervisor instead

When NOT to Use Swarm

Supervisor is the safer default

Most multi-agent systems don't need swarm. Swarm is a specialization for conversational flows where routing logic is genuinely distributed across agents. The supervisor pattern gives you a centralized audit trail, guaranteed execution order, and easier debugging. Use supervisor unless you have a specific signal that swarm fits.

SignalWhat it suggestsUse instead
You need a guaranteed execution order (research → write → review)Workflow, not conversationSupervisor or plan-and-execute
You need a single audit trail of all routing decisionsCompliance or debugging prioritySupervisor — it logs every routing decision centrally
One agent should see all agents' outputs before deciding next stepGlobal coordination neededSupervisor with shared state
Tasks are independent and can run in parallelFan-out, not sequential handoffParallelization pattern or Send API
Agents are in different services or organizationsCross-service boundaryA2A Protocol

Swarm earns its complexity in a narrow set of conditions: (1) the conversation naturally shifts between domains — the user's intent itself requires moving between billing, technical, and account agents, not just one domain; (2) each agent inherently knows its own scope boundaries and can correctly identify when to hand off; (3) routing logic is distributed by nature, not by choice — a central supervisor would need to replicate each agent's domain knowledge to route correctly. If all three aren't true, use supervisor.

The multi-agent overview covers the full pattern-selection decision tree

The Multi-Agent Systems overview article includes a pattern selector diagram that maps your specific signals (tool confusion, parallel subtasks, domain boundaries) to the right pattern. Start there before choosing swarm.