Advanced · 16 min

Async Subagents: Background Task Delegation

Async subagents (Deep Agents v0.5) let a supervisor delegate long-running tasks to background agents while continuing to chat with the user. This article covers the decision criteria for when async is worth the complexity, token cost math, production error handling, five concrete failure modes and their defenses, three orchestration patterns with code, and the five metrics you need to monitor before something breaks.

Quick Reference

→Use async subagents when a task takes >10 seconds and the user shouldn't be blocked — anything shorter, sync is simpler
→Correct API: subagents=[AsyncSubAgent(name=..., description=..., graph_id=...)] in create_deep_agent()
→5-tool lifecycle injected per subagent: start_async_task, check_async_task, update_async_task, cancel_async_task, list_async_tasks
→ASGI transport = co-deployed, in-process (default); HTTP transport = remote Agent Server (add url= param)
→Each subagent gets its own context window, so N parallel subagents pay the system prompt overhead N times
→Stale task IDs after context compaction: always recover with list_async_tasks() before checking a specific task
→Monitor: task completion rate, duration P95, polling overhead ratio, error rate by subagent, orphaned task count
→Deep Agents v0.5, April 2026 — async subagents are a preview feature; APIs may change
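
The five monitoring metrics named above can be computed from per-task records. The sketch below is one possible way to do it, not something Deep Agents provides. The `TaskRecord` fields, the definition of polling overhead ratio (polling calls divided by all supervisor tool calls attributed to the task), and the orphan rule (still running past a timeout) are assumptions made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical record shape; Deep Agents does not define this type.
    subagent: str
    status: str        # "success", "error", or "running"
    duration_s: float  # wall-clock seconds so far
    poll_calls: int    # status checks made for this task
    total_calls: int   # all supervisor tool calls attributed to this task


def p95(values):
    """Nearest-rank 95th percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def summarize(records, orphan_after_s=600.0):
    """Compute the five metrics from a list of TaskRecord objects."""
    finished = [r for r in records if r.status != "running"]
    succeeded = [r for r in finished if r.status == "success"]
    polls = sum(r.poll_calls for r in records)
    calls = sum(r.total_calls for r in records)

    per_agent = defaultdict(lambda: [0, 0])  # subagent -> [errors, finished]
    for r in finished:
        per_agent[r.subagent][1] += 1
        if r.status == "error":
            per_agent[r.subagent][0] += 1

    return {
        "completion_rate": len(succeeded) / len(finished) if finished else 0.0,
        "duration_p95_s": p95([r.duration_s for r in finished]),
        "polling_overhead_ratio": polls / calls if calls else 0.0,
        "error_rate_by_subagent": {k: e / n for k, (e, n) in per_agent.items()},
        # Assumed orphan rule: still running after orphan_after_s seconds.
        "orphaned_tasks": sum(
            1 for r in records
            if r.status == "running" and r.duration_s > orphan_after_s
        ),
    }
```

Feeding `summarize` a rolling window of recent records, rather than all history, keeps the numbers responsive to regressions.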

Should I Use Async Subagents?

Async subagents add real complexity: each one injects 5 tools into the supervisor's context, introduces polling logic, and requires lifecycle cleanup. Sync subagents (the task() tool) or a direct tool call are often the right answer. The question to ask first is whether the user actually needs to keep interacting while the work runs.

Figure: The supervisor manages background subagents through the 5 lifecycle tools and keeps chatting with the user while tasks run.

Signal	Points to async	Points to sync
Task duration	>15–30 seconds	<10 seconds
User experience	Must keep chatting during work	User can wait for the result
Parallelism	3+ independent subtasks	Sequential or single subtask
Error isolation	One failure shouldn't block others	All-or-nothing is fine
Deployment	Subagent needs independent scaling	Co-deployed is sufficient
State persistence	Work survives supervisor restart	Restart is acceptable
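
One way to make the signals table actionable is to encode it as a checklist. The sketch below is illustrative only: the `Workload` fields, the equal weighting of signals, and the "at least two signals" threshold are assumptions, not rules from Deep Agents. The 10-second floor, the 15-second lower bound, and the 3-subtask cutoff are taken from the table and the quick reference.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical description of a task, one field per row of the table.
    expected_duration_s: float
    user_must_keep_chatting: bool
    independent_subtasks: int
    needs_error_isolation: bool = False
    needs_independent_scaling: bool = False
    must_survive_restart: bool = False


def async_signals(w: Workload) -> list[str]:
    """Return the names of the table rows that point to async."""
    checks = {
        "task duration": w.expected_duration_s > 15,
        "user experience": w.user_must_keep_chatting,
        "parallelism": w.independent_subtasks >= 3,
        "error isolation": w.needs_error_isolation,
        "deployment": w.needs_independent_scaling,
        "state persistence": w.must_survive_restart,
    }
    return [name for name, hit in checks.items() if hit]


def recommend(w: Workload, min_signals: int = 2) -> str:
    """Suggest "sync" or "async"; min_signals is an assumed threshold."""
    # Anything under 10 seconds stays sync regardless of other signals.
    if w.expected_duration_s < 10:
        return "sync"
    return "async" if len(async_signals(w)) >= min_signals else "sync"
```

For example, `recommend(Workload(45, True, 1))` suggests async because both the duration and the user-experience signals fire, while a 5-second task is kept sync no matter how many other signals apply.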

Aspect	Sync subagent (task())	Async subagent
Execution	Blocks supervisor until complete	Runs in background, supervisor continues
User experience	User waits for all subtasks	User chats while tasks run
Tooling	1 tool (task)	5 tools per subagent
Error handling	Error propagates immediately	Supervisor polls status, handles errors
Context overhead	Subagent result in one message	N × system prompts + polling messages

Start with sync, migrate to async

Ship with sync subagents (task() calls) first. Add async only when you have measured evidence that users are waiting too long. The 10-second heuristic is a starting point — your workload may be different.

The 5-Tool Lifecycle

Each AsyncSubAgent you declare gives the supervisor access to 5 tools specific to that subagent. These tools are injected automatically by Deep Agents middleware — you don't write them. The supervisor's LLM decides when to call them based on your system prompt instructions.
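
To make the lifecycle concrete, here is a toy in-memory model of the five operations. It is not the Deep Agents middleware: the `TaskStore` class, its method signatures, the status strings, and the `safe_check` helper are hypothetical stand-ins that only mirror the tool names from the quick reference. It also illustrates the recovery rule for stale task IDs: list the live tasks first, then check a specific one.

```python
import itertools


class TaskStore:
    """Toy stand-in for the five lifecycle tools (hypothetical, in-memory)."""

    def __init__(self):
        self._tasks = {}
        self._ids = itertools.count(1)

    def start_async_task(self, instructions):
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {
            "status": "running",
            "instructions": [instructions],
            "result": None,
        }
        return task_id

    def check_async_task(self, task_id):
        task = self._tasks.get(task_id)
        if task is None:
            return {"status": "unknown", "result": None}
        return {"status": task["status"], "result": task["result"]}

    def update_async_task(self, task_id, instructions):
        # Raises KeyError for an unknown ID, like cancel_async_task below.
        self._tasks[task_id]["instructions"].append(instructions)

    def cancel_async_task(self, task_id):
        self._tasks[task_id]["status"] = "cancelled"

    def list_async_tasks(self):
        return {tid: t["status"] for tid, t in self._tasks.items()}

    def _finish(self, task_id, result):
        # Simulates the background agent completing its work.
        self._tasks[task_id].update(status="success", result=result)


def safe_check(store, remembered_id):
    """Check a task, but verify the remembered ID against the live list first."""
    live = store.list_async_tasks()
    if remembered_id not in live:
        # The ID is stale (for example, lost or mangled after compaction).
        return {"status": "stale_id", "live_tasks": live}
    return store.check_async_task(remembered_id)
```

In a real deployment the supervisor's LLM issues these calls itself; the point of the model is only to show which state each tool reads or changes.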

What Does It Cost?

The main cost surprise: each async subagent runs in its own context window. That context window includes its own system prompt, its own tool definitions, and its own reasoning tokens — paid separately. N parallel subagents means N system prompt charges, even though they're doing parallel work. This is the trade-off for context isolation.
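
The arithmetic can be sketched as follows. All token counts are made-up inputs for illustration, and the formula is a simplification: it assumes every subagent carries the same system prompt and tool definitions, that this fixed context is sent on each model turn, and that no prompt caching discount applies. Reasoning and output tokens come on top of this.

```python
def fixed_overhead_tokens(
    n_subagents,
    system_prompt_tokens,
    tool_definition_tokens,
    turns_per_subagent=1,
):
    """Estimate the fixed context tokens paid across N isolated subagents."""
    per_turn = system_prompt_tokens + tool_definition_tokens
    return n_subagents * turns_per_subagent * per_turn


# Hypothetical numbers: 4 subagents, 1,500-token prompt, 800 tokens of tools.
single_turn = fixed_overhead_tokens(4, 1500, 800)
five_turns = fixed_overhead_tokens(4, 1500, 800, turns_per_subagent=5)
print(single_turn, five_turns)
```

Under these assumptions the overhead grows linearly in both the number of subagents and the number of turns each one takes, which is why trimming the subagent system prompt pays off more than it would in a single-agent setup.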
