Writing Effective Tool Descriptions
Tool descriptions are serialized into every request as the model's only guide for tool selection. This article covers the full anatomy of a production-grade description, the token cost math, disambiguation patterns, schema enforcement with strict mode, and how to measure and debug tool selection quality.
Quick Reference
- Tool descriptions are serialized into every request as JSON schema; the model reads them at every turn, not just at setup
- The Anthropic API adds 346 tokens of base overhead per request for tool use (the exact figure varies by model and tool_choice setting); each tool schema adds ~100–300 tokens on top
- A production-grade description has 6 components: name, purpose, USE WHEN, DO NOT USE WHEN, parameters, and returns
- DO NOT USE WHEN blocks prevent more wrong calls than positive examples alone; add one to every tool that has a similar counterpart
- Add strict: true to tool definitions (Anthropic and OpenAI) to prevent hallucinated parameters at no accuracy cost
- Selection accuracy degrades at 15+ tools regardless of description quality; use deferred loading or prefix-grouped categorization
- Measure tool selection accuracy with a test suite before shipping; fix descriptions before blaming the model
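To make the six components concrete, here is a minimal sketch of one tool definition in Python. The tool names (`search_orders`, `get_order_by_id`), the `customer_email` parameter, and the `build_description` helper are hypothetical and exist only for illustration. The dict follows the `name` / `description` / `input_schema` shape used by the Anthropic Messages API; the placement of the `strict` flag is an assumption here and should be checked against your provider's current documentation.

```python
def build_description(purpose, use_when, do_not_use_when, parameters, returns):
    """Assemble one description string from the labeled components.

    Hypothetical helper: the section labels mirror the components named
    in the quick reference, not any provider-mandated format.
    """
    lines = [purpose, "", "USE WHEN:"]
    lines += [f"- {item}" for item in use_when]
    lines += ["", "DO NOT USE WHEN:"]
    lines += [f"- {item}" for item in do_not_use_when]
    lines += ["", "PARAMETERS:"]
    lines += [f"- {name}: {desc}" for name, desc in parameters.items()]
    lines += ["", f"RETURNS: {returns}"]
    return "\n".join(lines)


# Hypothetical tool with a similar counterpart (get_order_by_id),
# which is why the DO NOT USE WHEN block names that counterpart explicitly.
search_orders_tool = {
    "name": "search_orders",
    "description": build_description(
        purpose="Search a customer's order history by email address.",
        use_when=["The user asks about past orders but gives no order ID."],
        do_not_use_when=[
            "The user supplies an order ID; call get_order_by_id instead.",
        ],
        parameters={"customer_email": "Email address on the account."},
        returns="A list of order summaries (id, date, status), newest first.",
    ),
    # Assumption: strict mode is enabled with a top-level flag and
    # requires a closed schema (additionalProperties set to False).
    "strict": True,
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_email": {
                "type": "string",
                "description": "Email address on the account.",
            },
        },
        "required": ["customer_email"],
        "additionalProperties": False,
    },
}

print(search_orders_tool["description"])
```

Keeping the labeled blocks in a fixed order makes descriptions easy to diff and review across tools, and the explicit pointer to the counterpart tool is what does the disambiguation work.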
Why Tool Descriptions Are Your Highest-Leverage Investment
Most engineers treat tool descriptions as lightweight config — a docstring to fill in before moving on. They are not. Every tool definition is serialized into the system prompt and sent with every API request. The description field is the model's only signal for deciding which tool to call, when to call it, and what arguments to pass. Writing a vague description is equivalent to shipping an API without documentation — except the consumer never crashes, it just quietly uses the wrong endpoint.
Poor descriptions cost you in two ways. Directly: vague descriptions are often longer yet less useful, so you pay more tokens for worse routing. Indirectly: every wrong tool selection produces an incorrect observation that the model then tries to recover from, at full turn cost. A 5% misrouting rate at 10K turns/day means 500 extra turns per day, plus the cost of wrong answers reaching users.
Tool schemas are paid for upfront on every call, even when no tool is invoked.
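The arithmetic behind these costs can be sketched in a few lines of Python. The 346-token base overhead and the 5% / 10K figures come from this article; the 200 tokens per tool is an assumed midpoint of the ~100–300 range, the 10-tool count is illustrative, and the model assumes each misrouted turn costs exactly one extra turn.

```python
BASE_TOOL_OVERHEAD_TOKENS = 346  # base overhead figure quoted in this article


def schema_tokens_per_request(num_tools, tokens_per_tool=200):
    """Tokens spent on tool schemas in one request, whether or not a tool is called.

    tokens_per_tool=200 is an assumed midpoint of the ~100-300 range.
    """
    return BASE_TOOL_OVERHEAD_TOKENS + num_tools * tokens_per_tool


def extra_turns_per_day(turns_per_day, misrouting_rate):
    """Recovery turns per day, assuming one extra turn per misrouted call."""
    return round(turns_per_day * misrouting_rate)


turns_per_day = 10_000
per_request = schema_tokens_per_request(num_tools=10)  # 346 + 10 * 200
daily_schema_tokens = per_request * turns_per_day
extra_turns = extra_turns_per_day(turns_per_day, misrouting_rate=0.05)

print(f"schema tokens per request: {per_request}")         # 2346
print(f"schema tokens per day:     {daily_schema_tokens}")  # 23460000
print(f"extra turns per day:       {extra_turns}")          # 500
```

Under these assumptions, each of the 500 recovery turns also carries the full 2,346-token schema overhead, so misrouting compounds the upfront cost rather than adding to it separately.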