Swarm & Handoffs
When peer-to-peer agent handoffs earn their complexity, what they cost as handoff depth grows, how they fail in production, and how to defend against each failure mode. Includes production-grade LangGraph code with a checkpointer, context management strategies with cost math, and a sharpened comparison with the supervisor pattern.
Quick Reference
- Swarm = no central supervisor; each agent has handoff tools that transfer control to a peer agent
- create_handoff_tool(agent_name, description) generates the tool; create_swarm(agents, default_active_agent) builds the graph
- Only one agent is active at a time; control transfers when the active agent calls a handoff tool
- Full message history passes to the receiving agent by default, so context grows linearly with handoff depth
- OpenAI's experimental Swarm framework (2024) popularized this pattern; it was superseded by the OpenAI Agents SDK in March 2025
- Always compile with a checkpointer for multi-turn production use; without it, each invoke starts from scratch
- Circular handoffs (A → B → A) are the most common production failure; add a depth guard and test handoff chains end-to-end
- Swarm trades audit trail clarity for routing flexibility; if you need a central log of all routing decisions, use supervisor instead
When NOT to Use Swarm
Most multi-agent systems don't need swarm. Swarm is a specialization for conversational flows where routing logic is genuinely distributed across agents. The supervisor pattern gives you a centralized audit trail, guaranteed execution order, and easier debugging. Use supervisor unless you have a specific signal that swarm fits.
| Signal | What it suggests | Use instead |
|---|---|---|
| You need a guaranteed execution order (research → write → review) | Workflow, not conversation | Supervisor or plan-and-execute |
| You need a single audit trail of all routing decisions | Compliance or debugging priority | Supervisor — it logs every routing decision centrally |
| One agent should see all agents' outputs before deciding next step | Global coordination needed | Supervisor with shared state |
| Tasks are independent and can run in parallel | Fan-out, not sequential handoff | Parallelization pattern or Send API |
| Agents are in different services or organizations | Cross-service boundary | A2A Protocol |
Swarm earns its complexity only when three conditions hold together: (1) the conversation naturally shifts between domains, so the user's intent itself requires moving between, say, billing, technical, and account agents rather than staying in one domain; (2) each agent knows its own scope boundaries and can correctly identify when to hand off; (3) routing logic is distributed by nature, not by choice, meaning a central supervisor would need to replicate each agent's domain knowledge to route correctly. If any one of the three fails, use supervisor.
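The three-condition test above is a strict conjunction, which a few lines of Python make explicit. This is only a sketch of the decision rule as stated here; `SwarmSignals` and `recommend_pattern` are hypothetical names, and the function deliberately ignores the other alternatives listed in the table (parallelization, A2A, plan-and-execute).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwarmSignals:
    """Hypothetical checklist mirroring the three conditions in the text."""

    conversation_shifts_domains: bool  # (1) user intent moves between domains
    agents_know_own_scope: bool        # (2) each agent can tell when to hand off
    routing_inherently_distributed: bool  # (3) a supervisor would duplicate domain knowledge


def recommend_pattern(signals: SwarmSignals) -> str:
    """Return "swarm" only if every condition holds, otherwise "supervisor"."""
    if (
        signals.conversation_shifts_domains
        and signals.agents_know_own_scope
        and signals.routing_inherently_distributed
    ):
        return "swarm"
    return "supervisor"
```

For example, a support flow where the conversation shifts domains and routing is distributed, but agents cannot reliably recognize their own scope boundaries, still resolves to supervisor.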
The Multi-Agent Systems overview article includes a pattern selector diagram that maps your specific signals (tool confusion, parallel subtasks, domain boundaries) to the right pattern. Start there before choosing swarm.