Advanced14 min

Swarm & Handoffs

When peer-to-peer agent handoffs earn their complexity, what they cost per handoff depth, how they fail in production, and how to defend against each failure mode. Includes production-grade LangGraph code with checkpointer, context management strategies with cost math, and a sharpened comparison with the supervisor pattern.

Quick Reference

→Swarm = no central supervisor; each agent has handoff tools that transfer control to a peer agent
→create_handoff_tool(agent_name, description) generates the tool; create_swarm(agents, default_active_agent) builds the graph
→Only one agent is active at a time — control transfers when the active agent calls a handoff tool
→Full message history passes to the receiving agent by default — context grows linearly with handoff depth
→OpenAI's experimental Swarm framework (2024) popularized this pattern; it was superseded by the OpenAI Agents SDK in March 2025
→Always compile with a checkpointer for multi-turn production use — without it, each invoke starts from scratch
→Circular handoffs (A → B → A) are the most common production failure — add a depth guard and test handoff chains end-to-end
→Swarm trades audit trail clarity for routing flexibility — if you need a central log of all routing decisions, use supervisor instead

When NOT to Use Swarm

Supervisor is the safer default

Most multi-agent systems don't need swarm. Swarm is a specialization for conversational flows where routing logic is genuinely distributed across agents. The supervisor pattern gives you a centralized audit trail, guaranteed execution order, and easier debugging. Use supervisor unless you have a specific signal that swarm fits.

Signal	What it suggests	Use instead
You need a guaranteed execution order (research → write → review)	Workflow, not conversation	Supervisor or plan-and-execute
You need a single audit trail of all routing decisions	Compliance or debugging priority	Supervisor — it logs every routing decision centrally
One agent should see all agents' outputs before deciding next step	Global coordination needed	Supervisor with shared state
Tasks are independent and can run in parallel	Fan-out, not sequential handoff	Parallelization pattern or Send API
Agents are in different services or organizations	Cross-service boundary	A2A Protocol

Swarm earns its complexity in a narrow set of conditions: (1) the conversation naturally shifts between domains — the user's intent itself requires moving between billing, technical, and account agents, not just one domain; (2) each agent inherently knows its own scope boundaries and can correctly identify when to hand off; (3) routing logic is distributed by nature, not by choice — a central supervisor would need to replicate each agent's domain knowledge to route correctly. If all three aren't true, use supervisor.

The multi-agent overview covers the full pattern-selection decision tree

The Multi-Agent Systems overview article includes a pattern selector diagram that maps your specific signals (tool confusion, parallel subtasks, domain boundaries) to the right pattern. Start there before choosing swarm.

How Swarm Works

Swarm = decentralized agent handoffs

In a swarm, there is no central supervisor. Each agent has handoff tools that transfer control to a peer agent. The currently active agent decides when to hand off and to whom. Only one agent is active at a time.

Building a Production Swarm

Production swarm with create_swarm() — Claude models, checkpointer, thread_id

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.