Writing Effective Tool Descriptions

Tool descriptions are serialized into every request as the model's only guide for tool selection. This article covers the full anatomy of a production-grade description, the token cost math, disambiguation patterns, schema enforcement with strict mode, and how to measure and debug tool selection quality.

Quick Reference

→Tool descriptions are serialized into every request as JSON schema — the model reads them at every turn, not just at setup
→The Anthropic API adds roughly 346 tokens of base overhead per request for tool use (the exact figure depends on the model and the tool_choice setting); each tool schema adds ~100–300 tokens on top
→A production-grade description has 6 components: name, purpose, USE WHEN, DO NOT USE WHEN, parameters, and returns
→DO NOT USE WHEN blocks prevent more wrong calls than positive examples alone — add one to every tool that has a similar counterpart
→Add strict: true to tool definitions (Anthropic and OpenAI) to prevent hallucinated parameters at no accuracy cost
→Selection accuracy degrades at 15+ tools regardless of description quality — use deferred loading or prefix-grouped categorization
→Measure tool selection accuracy with a test suite before shipping — fix descriptions before blaming the model
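
One way to act on the last point is a small labeled test suite that records which tool was selected for each request. The sketch below is an illustration only: the tool names, the test cases, and `stub_selector` are hypothetical, and the stub stands in for a real model call that you would replace with your own client code.

```python
from typing import Callable

# Hypothetical labeled cases: (user request, tool that should be selected).
CASES = [
    ("Where is order 1042?", "get_order_by_id"),
    ("Find my order for the blue kettle", "search_orders"),
    ("Show everything I bought in March", "search_orders"),
    ("Cancel order 77", "cancel_order"),
]


def selection_accuracy(select_tool: Callable[[str], str], cases):
    """Return (accuracy, failures) for a selector run over labeled cases."""
    if not cases:
        raise ValueError("need at least one test case")
    failures = []
    for prompt, expected in cases:
        actual = select_tool(prompt)
        if actual != expected:
            failures.append((prompt, expected, actual))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


def stub_selector(prompt: str) -> str:
    """Stand-in for a real model call (assumption: swap in your own client)."""
    if "cancel" in prompt.lower():
        return "cancel_order"
    return "search_orders"  # deliberately never picks get_order_by_id


accuracy, failures = selection_accuracy(stub_selector, CASES)
print(f"accuracy={accuracy:.2f}")
for prompt, expected, actual in failures:
    print(f"MISROUTED: {prompt!r} expected {expected}, got {actual}")
```

The failure list is the useful output: each misrouted request points at the pair of descriptions that need clearer boundaries.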

Why Tool Descriptions Are Your Highest-Leverage Investment

Most engineers treat tool descriptions as lightweight config — a docstring to fill in before moving on. They are not. Every tool definition is serialized into the system prompt and sent with every API request. The description field is the model's only signal for deciding which tool to call, when to call it, and what arguments to pass. Writing a vague description is equivalent to shipping an API without documentation — except the consumer never crashes, it just quietly uses the wrong endpoint.

Token overhead formula — compute this before binding tools
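
A rough sketch of that estimate, using only the figures quoted in this article (346 base tokens, ~100–300 tokens per tool schema). The function names are hypothetical, and the result is a range to budget against, not a billing figure; measure your real schemas with a tokenizer before relying on it.

```python
# Figures taken from the article; treat them as estimates.
BASE_TOOL_USE_OVERHEAD = 346   # base tokens per request when tools are bound
PER_TOOL_RANGE = (100, 300)    # approximate tokens per tool schema


def estimate_tool_overhead(num_tools: int) -> tuple[int, int]:
    """Return (low, high) estimated tool tokens added to every request."""
    if num_tools <= 0:
        return (0, 0)  # no tools bound, so no tool-use overhead
    low, high = PER_TOOL_RANGE
    return (
        BASE_TOOL_USE_OVERHEAD + num_tools * low,
        BASE_TOOL_USE_OVERHEAD + num_tools * high,
    )


def daily_overhead(num_tools: int, turns_per_day: int) -> tuple[int, int]:
    """Scale the per-request estimate by daily turn volume."""
    low, high = estimate_tool_overhead(num_tools)
    return (low * turns_per_day, high * turns_per_day)


if __name__ == "__main__":
    print(estimate_tool_overhead(10))   # (1346, 3346)
    print(daily_overhead(10, 10_000))   # (13460000, 33460000)
```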

Bad descriptions cost twice

Directly: vague descriptions are often longer yet less useful, so you pay more tokens for less signal. Indirectly: every wrong tool selection produces an incorrect observation that the model then tries to recover from, at full turn cost. A 5% misrouting rate at 10K turns/day means 500 misrouted turns per day, each costing at least one extra recovery turn, plus the cost of wrong answers reaching users.

Tool schemas are paid upfront on every call, even when no tool is invoked.

What the Model Actually Sees

When you bind tools to a model, every tool definition is serialized into the system prompt as a JSON schema. The model sees this at the start of every turn — not your Python source code, not your decorator, not your type hints. Only the JSON schema. Understanding this mechanical reality is the foundation of writing descriptions that work.
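
To make this concrete, here is a minimal sketch of the kind of translation a framework performs when it binds a function as a tool. `to_tool_schema` is a hypothetical helper written for this illustration, not any library's API, and the `name`/`description`/`input_schema` layout is an assumption modeled on a common tool-definition shape. Only the printed JSON would reach the model.

```python
import inspect
import json
from typing import get_type_hints

# Hypothetical, simplified type mapping; real frameworks handle far more types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_tool_schema(fn) -> dict:
    """Build a minimal tool definition from a function's name, docstring, and hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return annotation is not part of the input schema
    params = inspect.signature(fn).parameters
    properties = {
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
        for name in params
    }
    required = [
        name for name, p in params.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def search_orders(customer_id: str, limit: int = 10) -> list:
    """Search a customer's orders."""
    # Comments and implementation details like this never reach the model.
    return []


schema = to_tool_schema(search_orders)
print(json.dumps(schema, indent=2))
```

Note what is lost in this sketch: the function body, the inline comment, the return type, and even the default value of `limit`. Anything the model needs to know has to be written into the description or the schema itself.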

Anatomy of an Effective Tool Description

A complete tool description has exactly six components. Skip any one and you create a specific, predictable failure mode. The checklist below maps each component to its role in the model's selection process.
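
As an illustration, the six components named in the Quick Reference (name, purpose, USE WHEN, DO NOT USE WHEN, parameters, returns) might be laid out as follows. The tool, its counterpart `get_order_by_id`, the `RETURNS:` label, and the `missing_components` checker are all hypothetical; this is one possible layout under those assumptions, not a required format.

```python
# Hypothetical tool definition; every name and field is illustrative.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",  # 1. name
    "description": (
        # 2. purpose
        "Search a customer's order history by keyword or date range.\n"
        # 3. USE WHEN
        "USE WHEN: the user describes an order but does not give an order ID.\n"
        # 4. DO NOT USE WHEN (points at the similar counterpart tool)
        "DO NOT USE WHEN: the user provides an order ID; call get_order_by_id instead.\n"
        # 6. returns
        "RETURNS: a list of up to `limit` orders, each with order_id, date, status, and total."
    ),
    "input_schema": {  # 5. parameters
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords from the user's request."},
            "limit": {"type": "integer", "description": "Maximum number of orders to return."},
        },
        "required": ["query"],
    },
}

REQUIRED_MARKERS = ("USE WHEN:", "DO NOT USE WHEN:", "RETURNS:")


def missing_components(tool: dict) -> list[str]:
    """List which of the six components a tool definition appears to lack."""
    missing = []
    if not tool.get("name"):
        missing.append("name")
    description = tool.get("description", "")
    lines = [line.strip() for line in description.splitlines() if line.strip()]
    # Purpose: a first line that is not itself one of the labeled blocks.
    if not lines or lines[0].startswith(REQUIRED_MARKERS):
        missing.append("purpose")
    for marker in REQUIRED_MARKERS:
        if not any(line.startswith(marker) for line in lines):
            missing.append(marker.rstrip(":"))
    if not tool.get("input_schema", {}).get("properties"):
        missing.append("parameters")
    return missing


print(missing_components(SEARCH_ORDERS_TOOL))
print(missing_components({"name": "x", "description": "Does things."}))
```

A checker like this only confirms that the labeled blocks exist; whether they actually disambiguate the tool still has to be verified against real requests.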
