Advanced · 12 min

Parallel Tool Calling

Parallel tool calling lets a model request multiple independent tools in one response instead of one at a time. This article covers when it saves you tokens and latency, when it causes race conditions, how to configure it across providers, and what production failure looks like.

Quick Reference

→ Parallel tool calling: the model returns multiple tool_calls in one assistant message. You execute them concurrently, return all results at once, and the model generates its final response in a second LLM call.
→ Round-trip savings: 3 sequential tool calls = 4 LLM turns. 3 parallel tool calls = 2 LLM turns. That's 2 fewer round trips, each of which re-sends the growing message history.
→ Token savings: with 3 tools and ~150-token results each, parallel uses roughly 2,000 input tokens in total across its LLM turns, versus roughly 4,000 for sequential. With 5 tools the savings reach ~67%.
→ Provider support: GPT-5.4, Claude Opus 4.7 / Sonnet 4.6, and Gemini 3.1 Pro all support parallel calls by default. Reasoning models (o3, o4-mini) often do not, so test explicitly.
→ Disable for side effects: if tool A creates a record and tool B updates that same record, parallel execution causes a silent race condition. Disable parallel calling, or instruct the model to call those tools one at a time.
→ LangGraph's ToolNode handles parallel execution automatically: asyncio.gather(), ToolMessage wrapping, error isolation, and tool_call_id matching.
→ Verify it's working: log len(ai_message.tool_calls) after the first LLM turn. If it's always 1, your model or configuration is forcing sequential behavior.
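
The round-trip and token figures above can be reproduced with a rough cost model. This is a sketch under stated assumptions, not a measurement: it assumes a fixed base prompt (system prompt, tool schemas, user message) that is re-sent on every turn, and a fixed token cost per tool exchange (the call plus its result). The values 700 and 200 are illustrative assumptions chosen so the totals land on the figures quoted above.

```python
def sequential_input_tokens(base: int, per_tool: int, n_tools: int) -> int:
    """Total input tokens when every tool call is its own LLM turn."""
    # Turn k (k = 0..n_tools) re-sends the base prompt plus k earlier tool exchanges.
    return sum(base + k * per_tool for k in range(n_tools + 1))


def parallel_input_tokens(base: int, per_tool: int, n_tools: int) -> int:
    """Total input tokens when all tool calls arrive in one assistant message."""
    # Turn 1 sends the base prompt; turn 2 sends it again plus all n tool exchanges.
    return base + (base + n_tools * per_tool)


def savings(base: int, per_tool: int, n_tools: int) -> float:
    """Fraction of input tokens saved by parallel calling under this model."""
    seq = sequential_input_tokens(base, per_tool, n_tools)
    par = parallel_input_tokens(base, per_tool, n_tools)
    return 1 - par / seq


if __name__ == "__main__":
    BASE = 700      # assumed: system prompt + tool schemas + user message
    PER_TOOL = 200  # assumed: ~150-token result plus ~50 tokens for the call itself
    for n in (3, 5):
        seq = sequential_input_tokens(BASE, PER_TOOL, n)
        par = parallel_input_tokens(BASE, PER_TOOL, n)
        print(f"{n} tools: sequential={seq} parallel={par} savings={savings(BASE, PER_TOOL, n):.0%}")
```

Under this model the parallel-to-sequential ratio works out to 2 / (N + 1) regardless of the assumed sizes, which gives 50% savings at 3 tools and about 67% at 5. Real numbers will differ once output tokens, prompt caching, and uneven result sizes are counted.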

What Parallel Tool Calling Is (and Isn't)

In standard tool calling, the model generates one tool call, waits for the result, then generates the next call. This is sequential: every tool is its own round trip. Parallel tool calling lets the model batch multiple independent tool calls into a single assistant message. Your runtime executes all of them concurrently and returns all results at once, and the model then generates its final response. For three tools, that is two LLM turns in total instead of four.

The model doesn't execute the tools — it decides which tools to call and with what arguments. Your code executes them. Parallel calling changes what the model returns (multiple tool_calls in one message), not how tools are implemented. The tool functions themselves don't change.
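
To make the division of labor concrete, here is a minimal sketch of the runtime side: it takes a list of tool calls from one assistant message and runs them concurrently. The tool functions are hypothetical stand-ins that sleep to simulate I/O, and the plain-dict shape for tool calls and results (`id`, `name`, `args`, `tool_call_id`, `content`) is an assumption for illustration, not a specific provider's wire format.

```python
import asyncio


# Hypothetical stand-ins for real tools; each sleeps to simulate network latency.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.1)
    return f"weather for {city}"


async def get_time(timezone: str) -> str:
    await asyncio.sleep(0.1)
    return f"time in {timezone}"


async def get_exchange_rate(base: str, quote: str) -> str:
    await asyncio.sleep(0.1)
    return f"rate {base}/{quote}"


TOOLS = {
    "get_weather": get_weather,
    "get_time": get_time,
    "get_exchange_rate": get_exchange_rate,
}


async def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run every tool call concurrently and pair each result with its call id."""

    async def run_one(call: dict) -> str:
        return await TOOLS[call["name"]](**call["args"])

    # return_exceptions=True keeps one failing tool from cancelling the others.
    outcomes = await asyncio.gather(
        *(run_one(call) for call in tool_calls), return_exceptions=True
    )
    # gather preserves input order, so zip lines results up with their calls.
    return [
        {
            "tool_call_id": call["id"],
            "content": f"error: {out!r}" if isinstance(out, Exception) else out,
        }
        for call, out in zip(tool_calls, outcomes)
    ]


if __name__ == "__main__":
    calls = [
        {"id": "call_1", "name": "get_weather", "args": {"city": "Paris"}},
        {"id": "call_2", "name": "get_time", "args": {"timezone": "Europe/Paris"}},
        {"id": "call_3", "name": "get_exchange_rate", "args": {"base": "EUR", "quote": "USD"}},
    ]
    for result in asyncio.run(run_tool_calls(calls)):
        print(result)
```

Note that the tool functions are unchanged from what a sequential runtime would use; only the dispatch differs. Every result carries the id of the call that produced it, which is what lets the model match results to requests on the second turn.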

Figure: the same 3 tools (get_weather(), get_time(), get_exchange_rate()), sequential vs parallel execution. Sequential: 3 round trips, with each result returned to the LLM after a ~1s wait before the next call, ~3s total. Parallel: 1 round trip, in which the model returns all three independent tool calls in one response, they run concurrently via asyncio.gather(), and all 3 results are returned at once, ~1s total. Parallel saves 2 LLM round trips; the tools must be independent (no shared state).

Should You Enable Parallel Tool Calling?

Parallel calling is on by default for most frontier models, so the real question is: when should you disable it? Two conditions force sequential execution. First, if any tools share state or have ordered side effects (create then update, read then delete), executing them simultaneously causes race conditions that don't throw exceptions — they corrupt data silently. Second, if any tool's input depends on another tool's output, the model will naturally serialize them across turns. Parallel calling only fires when inputs are independent.
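
The silent-corruption case can be reproduced in a few lines. The sketch below is an illustration under assumptions, not a recommended architecture: `DB`, `create_record`, `update_record`, `STATEFUL_TOOLS`, and both executor functions are hypothetical names, and the runtime-level guard in `execute_batch` is one possible safeguard to use in addition to, not instead of, disabling parallel calls at the provider level.

```python
import asyncio

DB: dict[str, dict] = {}  # in-memory stand-in for shared state


async def create_record(record_id: str) -> str:
    await asyncio.sleep(0.01)  # simulated I/O before the write lands
    DB[record_id] = {"status": "created"}
    return "created"


async def update_record(record_id: str, status: str) -> str:
    await asyncio.sleep(0)  # yields once, so it finishes before the slower create
    DB.setdefault(record_id, {})["status"] = status
    return "updated"


REGISTRY = {"create_record": create_record, "update_record": update_record}
STATEFUL_TOOLS = {"create_record", "update_record"}  # tools that touch shared state


def invoke(call: dict):
    return REGISTRY[call["name"]](**call["args"])


async def execute_unguarded(tool_calls: list[dict]) -> list[str]:
    """Naive executor: always runs the whole batch concurrently."""
    return list(await asyncio.gather(*(invoke(c) for c in tool_calls)))


async def execute_batch(tool_calls: list[dict]) -> list[str]:
    """Guarded executor: runs in listed order if any call touches shared state."""
    if any(c["name"] in STATEFUL_TOOLS for c in tool_calls):
        return [await invoke(c) for c in tool_calls]
    return list(await asyncio.gather(*(invoke(c) for c in tool_calls)))


if __name__ == "__main__":
    calls = [
        {"id": "call_1", "name": "create_record", "args": {"record_id": "r1"}},
        {"id": "call_2", "name": "update_record", "args": {"record_id": "r1", "status": "shipped"}},
    ]
    asyncio.run(execute_unguarded(calls))
    print("unguarded:", DB["r1"])  # the late create overwrites the update, no exception
    DB.clear()
    asyncio.run(execute_batch(calls))
    print("guarded:  ", DB["r1"])
```

In the unguarded run both tools report success, yet the update is lost because the slower create lands afterwards. That is the failure mode described above: nothing raises, and the stored state is simply wrong.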

Provider Support and Configuration

Provider	Models	Parallel support	How to disable
OpenAI	GPT-5.4, GPT-5.4 mini	Yes (default on)	parallel_tool_calls=False in bind_tools()
Anthropic	Claude Opus 4.7, Sonnet 4.6	Yes (default on)	tool_choice={"type": "auto", "disable_parallel_tool_use": True}
Google	Gemini 3.1 Pro, Gemini 3 Flash	Yes (default on)	Automatic — no disable parameter exposed
Mistral	Mistral Large 3	Yes	Automatic
OpenAI Reasoning	o3, o4-mini	Often no	Parallel tool calls frequently unsupported — test explicitly
Local (Ollama)	Depends on model	Rarely	Most open-weight models do not generate parallel calls
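
The two disable switches in the table can be captured in a small helper. This is a sketch: `parallel_kwargs` is a hypothetical helper name, and it only builds the parallel-related keyword arguments rather than calling any API. The parameter names themselves (`parallel_tool_calls` for OpenAI, `disable_parallel_tool_use` inside `tool_choice` for Anthropic) are the ones listed in the table.

```python
def parallel_kwargs(provider: str, allow_parallel: bool) -> dict:
    """Return the request kwargs that switch parallel tool calls on or off."""
    if provider == "openai":
        # OpenAI exposes a top-level boolean on the request.
        return {"parallel_tool_calls": allow_parallel}
    if provider == "anthropic":
        # Anthropic nests an inverted flag inside tool_choice.
        return {
            "tool_choice": {
                "type": "auto",
                "disable_parallel_tool_use": not allow_parallel,
            }
        }
    # Per the table, the other providers expose no switch to set here.
    raise ValueError(f"no parallel-calling switch known for provider {provider!r}")


if __name__ == "__main__":
    print(parallel_kwargs("openai", allow_parallel=False))
    print(parallel_kwargs("anthropic", allow_parallel=False))
```

The returned dict is meant to be spread into the request alongside the model, messages, and tools, for example into `client.chat.completions.create(...)` in the OpenAI SDK or `client.messages.create(...)` in the Anthropic SDK. Note the inverted polarity between the two providers, which is easy to get wrong when supporting both.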
