Parallel Tool Calling
How parallel tool calling works across providers, executing concurrent tool calls with asyncio, handling dependencies and failures, and optimizing the tradeoff between round trips and token cost.
Quick Reference
- Parallel tool calling: the model requests multiple tools in a single response. Instead of call-wait-call-wait, it says "call A and B simultaneously."
- Provider support: OpenAI (GPT-5.4), Anthropic (Claude Sonnet 4.6+), and Google (Gemini 2.5+) all support parallel tool calls. Older models do not.
- Execution: use asyncio.gather() to run parallel tool calls concurrently. Three parallel calls that take 1s each finish in ~1s, not ~3s.
- Dependency resolution: if tool B needs the result of tool A, the model must call them sequentially (two turns). True parallel calls are always independent.
- Error handling: if one parallel call fails, return its error alongside the successful results. Do not fail the entire batch — the model can reason about partial results.
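The execution and error-handling bullets can be combined into one runner. The sketch below is a minimal illustration with hypothetical tools (get_weather, get_stock are stand-ins, not a real API): every requested call runs concurrently via asyncio.gather(), and a failing call is converted into an error entry rather than aborting the batch.

```python
import asyncio

# Hypothetical tool implementations -- stand-ins for real tools.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.1)  # simulate I/O latency
    return f"Sunny in {city}"

async def get_stock(symbol: str) -> str:
    await asyncio.sleep(0.1)
    raise RuntimeError(f"no data for {symbol}")  # simulated failure

TOOLS = {"get_weather": get_weather, "get_stock": get_stock}

async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run all requested tool calls concurrently; never fail the batch."""
    async def run_one(call: dict) -> dict:
        try:
            result = await TOOLS[call["name"]](**call["args"])
            return {"name": call["name"], "result": result}
        except Exception as exc:
            # Return the error as a result so the model can reason about it.
            return {"name": call["name"], "error": str(exc)}
    # gather() preserves input order, so results line up with calls.
    return await asyncio.gather(*(run_one(c) for c in calls))

calls = [
    {"name": "get_weather", "args": {"city": "Oslo"}},
    {"name": "get_stock", "args": {"symbol": "XYZ"}},
]
results = asyncio.run(run_tool_calls(calls))
```

Here the failed get_stock call comes back as an error entry next to the successful weather result, so both can be returned to the model in the same turn.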
How Parallel Tool Calling Works
In standard tool calling, the model generates one tool call, waits for the result, then generates the next call or a response. This is sequential: each tool call is a full round trip to the LLM. Parallel tool calling lets the model request multiple independent tools in a single response. The runtime executes all of them concurrently, returns all results at once, and the model generates its final response. This reduces both latency (tools run concurrently) and LLM round trips (one turn instead of N).
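The latency difference is easy to demonstrate. This sketch times three identical I/O-bound tools run sequentially versus concurrently; the 0.3s sleep is a scaled-down stand-in for a tool that takes ~1s.

```python
import asyncio
import time

async def tool(name: str) -> str:
    await asyncio.sleep(0.3)  # stand-in for ~1s of tool I/O
    return f"{name}: done"

async def sequential() -> float:
    # One call at a time: total time is the sum of the tool times.
    start = time.perf_counter()
    for n in ("a", "b", "c"):
        await tool(n)
    return time.perf_counter() - start

async def parallel() -> float:
    # All calls at once: total time is roughly the slowest tool.
    start = time.perf_counter()
    await asyncio.gather(tool("a"), tool("b"), tool("c"))
    return time.perf_counter() - start

seq = asyncio.run(sequential())  # roughly 3x the single-tool time
par = asyncio.run(parallel())    # roughly 1x the single-tool time
```

Note this only collapses the tool-execution time; the bigger win in practice is often the saved LLM round trips, since each turn costs model latency and tokens on its own.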
Parallel tool calling requires explicit model support. The model must be trained to generate multiple tool_call entries in a single assistant message. OpenAI GPT-5.4, Claude Sonnet 4.6/Opus 4.6, and Gemini 3.1 Pro all support it. Older models (GPT-3.5, Claude 3 Haiku) may not. Check provider docs and test before depending on it.
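Concretely, "multiple tool_call entries in a single assistant message" looks like the sketch below, which assumes the OpenAI Chat Completions message shape (field names differ per provider; Anthropic, for example, uses content blocks of type tool_use). The tool functions and registry are hypothetical.

```python
import asyncio
import json

# An assistant message in OpenAI Chat Completions style:
# one response containing several tool_calls.
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'}},
    ],
}

# Stub tools standing in for real implementations.
async def get_weather(city): return f"sunny in {city}"
async def get_time(tz): return f"12:00 {tz}"

REGISTRY = {"get_weather": get_weather, "get_time": get_time}

async def dispatch(message: dict) -> list[dict]:
    # Produce one tool-result message per tool_call id,
    # executing all of them concurrently.
    async def run(tc: dict) -> dict:
        fn = REGISTRY[tc["function"]["name"]]
        args = json.loads(tc["function"]["arguments"])
        return {"role": "tool", "tool_call_id": tc["id"],
                "content": await fn(**args)}
    return await asyncio.gather(*(run(tc) for tc in message["tool_calls"]))

tool_messages = asyncio.run(dispatch(assistant_message))
```

Each result message echoes the tool_call_id it answers, which is what lets the provider match results back to the calls when all of them are returned in the next turn.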
The diagram below shows the same three tools executed both ways — sequential (3 LLM round trips, ~3s) vs parallel (1 round trip, ~1s):
same 3 tools — sequential vs parallel execution

Sequential (~3s total):
  call tool 1 → wait ~1s → result returned to LLM
  call tool 2 → wait ~1s → result returned to LLM
  call tool 3 → wait ~1s → result returned to LLM

Parallel (~1s total):
  asyncio.gather() → all run concurrently, wait ~1s → all 3 results returned at once

parallel saves 2 LLM round trips — tools must be independent (no shared state)
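The independence requirement shows up directly in code. In this hedged sketch (fetch_order, fetch_customer, and fetch_inventory are hypothetical tools), the dependent pair must be awaited in sequence — in real tool calling, two model turns — while the independent pair can share one asyncio.gather() call, i.e. one parallel turn.

```python
import asyncio

async def fetch_order(order_id: str) -> dict:
    await asyncio.sleep(0.1)
    return {"order_id": order_id, "customer_id": "c42"}

async def fetch_customer(customer_id: str) -> dict:
    await asyncio.sleep(0.1)
    return {"customer_id": customer_id, "name": "Ada"}

async def fetch_inventory(sku: str) -> int:
    await asyncio.sleep(0.1)
    return 7

async def main():
    # DEPENDENT: fetch_customer needs fetch_order's output, so the calls
    # must be sequential -- the model would use two turns here.
    order = await fetch_order("o1")
    customer = await fetch_customer(order["customer_id"])

    # INDEPENDENT: neither call reads the other's result, so both can go
    # in a single parallel tool-call turn.
    order2, stock = await asyncio.gather(
        fetch_order("o2"), fetch_inventory("sku-9")
    )
    return customer, stock

customer, stock = asyncio.run(main())
```

A useful rule of thumb: if you cannot write the calls inside one gather() without one argument depending on another call's return value, the model cannot batch them either.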