LangChain/Tools
Advanced12 min

Parallel Tool Calling

Parallel tool calling lets a model request multiple independent tools in one response instead of one at a time. This article covers when it saves you tokens and latency, when it causes race conditions, how to configure it across providers, and what production failure looks like.

Quick Reference

  • Parallel tool calling: the model returns multiple tool_calls in one assistant message. You execute them concurrently, return all results at once, and the model generates its final response in a single second LLM call.
  • Round-trip savings: 3 sequential tool calls = 4 LLM turns. 3 parallel tool calls = 2 LLM turns. That's 2 fewer round trips, each of which re-sends the growing message history.
  • Token savings: with 3 tools and ~150-token results each, parallel uses roughly 2000 input tokens vs 4000 sequential. With 5 tools the gap reaches ~67% savings.
  • Provider support: GPT-5.4, Claude Opus 4.7 / Sonnet 4.6, and Gemini 3.1 Pro all support parallel calls by default. Reasoning models (o3, o4-mini) often do not — test explicitly.
  • Disable for side effects: if tool A creates a record and tool B updates that same record, parallel execution causes a silent race condition. Disable or use sequential instructions.
  • LangGraph's ToolNode handles parallel execution automatically — asyncio.gather(), ToolMessage wrapping, error isolation, and tool_call_id matching.
  • Verify it's working: log len(ai_message.tool_calls) after the first LLM turn. If it's always 1, your model or configuration is forcing sequential behavior.

What Parallel Tool Calling Is (and Isn't)

In standard tool calling, the model generates one tool call, waits for the result, then generates the next call. This is sequential: every tool is its own round trip. Parallel tool calling lets the model batch multiple independent tool calls into a single assistant message. Your runtime executes all of them concurrently, returns all results at once, and the model generates its final response — two LLM turns total instead of four.

The model doesn't execute the tools — it decides which tools to call and with what arguments. Your code executes them. Parallel calling changes what the model returns (multiple tool_calls in one message), not how tools are implemented. The tool functions themselves don't change.

Model returns all three tool calls in one response — they're independent

same 3 tools — sequential vs parallel execution

sequential — 3 round trips
LLM call #1
get_weather()

wait ~1s

result ✓

returned to LLM

LLM call #2
get_time()

wait ~1s

result ✓

returned to LLM

LLM call #3
get_exchange_rate()

wait ~1s

result ✓

returned to LLM

final response

~3s total

parallel — 1 round trip
LLM call #1

asyncio.gather()

get_weather()
get_time()
get_exchange_rate()

all run concurrently — wait ~1s

all 3 results returned at once

get_weather()
get_time()
get_exchange_rate()
final response

~1s total

parallel saves 2 LLM round trips — tools must be independent (no shared state)