
Tool Design Patterns

How tool names, descriptions, schemas, and examples influence model selection accuracy. Covers the full production toolkit: namespacing, tool use examples, schema design, error surfaces, scaling with Tool RAG, token cost math, and a concrete eval methodology.

Quick Reference

→Tool naming: short, verb-first names (search_docs, create_ticket) outperform vague names (handle_request). The model reads the name before the description.
→Namespacing: use service+resource prefixes (asana_projects_search, jira_issues_create) to help the model distinguish tools across integrated services.
→Descriptions: 3-4 sentences — what it does, when to use it, when NOT to use it. Negative examples are load-bearing; don't cut them.
→Tool use examples: providing 2-3 concrete invocation examples with the tool definition improved accuracy from 72% to 90% on complex parameter handling (Anthropic internal benchmark).
→Schema design: use enum types instead of describing valid values in prose. An enum puts the allowed values directly in the schema the model reads; a prose description forces the model to parse the rule and then apply it correctly.
→Tool RAG: for 20+ tools, dynamically select a relevant subset per query instead of sending every tool on every turn. Anthropic's Tool Search Tool cut token usage by 85% and, on Anthropic's MCP evaluations, raised Claude Opus 4 accuracy from 49% to 74%.
→Token cost math: tool definitions are sent as input on every LLM call. 10 tools × 150 tokens × 20 turns = 30K tokens of overhead per conversation. Compute this before optimizing blindly.
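The token arithmetic and the Tool RAG idea can be checked together in a few lines of Python. This is a minimal sketch, not Anthropic's Tool Search Tool: Tool, overhead_tokens, select_tools and the sample catalog are hypothetical, per-tool token counts are assumed to be known in advance, and the retriever ranks tools by plain word overlap where a production Tool RAG layer would more likely use embeddings or a search index.

```python
import re
from dataclasses import dataclass

# Small illustrative stopword list so filler words don't drive the ranking.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "or", "the", "to"}


@dataclass
class Tool:
    name: str
    description: str
    tokens: int  # assumed known: tokens for name + description + schema


def overhead_tokens(tools: list[Tool], turns: int) -> int:
    """Input tokens spent on tool definitions across a conversation.

    Definitions are resent on every model call, so the cost is the sum of
    per-tool tokens multiplied by the number of calls.
    """
    return sum(t.tokens for t in tools) * turns


def _words(text: str) -> set[str]:
    # Lowercase alphanumeric runs; "_" acts as a separator, so names split too.
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_tools(query: str, tools: list[Tool], k: int) -> list[Tool]:
    """Toy Tool RAG: keep the k tools that share the most words with the query.

    sorted() is stable even with reverse=True, so ties keep catalog order.
    """
    query_words = _words(query)
    ranked = sorted(
        tools,
        key=lambda t: len(query_words & _words(f"{t.name} {t.description}")),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    catalog = [
        Tool("search_docs", "Search internal documentation by keyword.", 150),
        Tool("create_ticket", "Create a support ticket in the issue tracker.", 150),
        Tool("get_weather", "Get the current weather for a city.", 150),
    ] + [Tool(f"other_tool_{i}", "Unrelated placeholder tool.", 150) for i in range(7)]

    print(overhead_tokens(catalog, turns=20))  # 30000: all 10 tools on every turn
    picked = select_tools("create a ticket for the login bug", catalog, k=3)
    print([t.name for t in picked])  # ['create_ticket', 'search_docs', 'get_weather']
    print(overhead_tokens(picked, turns=20))  # 9000: only 3 tools per turn
```

The comparison leaves out two things a real decision needs: the cost of the retrieval step itself, and the accuracy lost when the selector drops a tool the model actually needed. Both belong in the eval.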

When NOT to Redesign Your Tools

Tool design is not always the bottleneck. Before spending time on descriptions and schemas, confirm that tool design is actually what's failing. Redesigning tools when the problem is the system prompt, the model, or the retrieval layer wastes time and introduces unrelated regressions.

Signals that tool design IS the problem

The model calls the wrong tool for a query where the right tool is obvious. The model calls the right tool but passes malformed arguments. The model hallucinates a call to a tool that doesn't exist. The model skips tool use entirely when it should call one. These are tool design failures. If instead the model reasons correctly but produces wrong answers, or loops indefinitely, the problem lies in the system prompt, the eval, or the application logic. Don't redesign the tools.

Match the symptom to the fix — each technique targets a different failure mode

▸Wrong tool selected on clear queries → start with naming and descriptions (sections 2–3).
▸Right tool, wrong arguments → add tool use examples and schema enums (sections 3–4).
▸Occasional misselection only on edge cases → add negative examples to descriptions.
▸Accuracy degrades as tool count grows past 20 → implement Tool RAG (section 7).
▸Context budget spiking → compute the token cost math (section 8) before trimming descriptions blindly.
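One way to apply this triage is to label every failing eval case with the signals above before touching any tool definition. The sketch below is hypothetical: EvalCase, classify, summarize and FIXES are illustrative names, and it assumes each case records the tool a correct answer would call and whether the emitted arguments validated against the schema. FIXES covers only the two mappings the list spells out; cases labelled "ok" that still produce wrong answers point to the prompt or logic layer rather than the tools.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalCase:
    expected_tool: str          # the tool a correct answer would call
    actual_tool: Optional[str]  # None when the model made no tool call
    args_valid: bool = True     # did the arguments pass schema validation?


# Only the symptom -> fix pairs stated explicitly in the list above.
FIXES = {
    "wrong_tool": "naming and descriptions",
    "malformed_args": "tool use examples and schema enums",
}


def classify(case: EvalCase, known_tools: set[str]) -> str:
    """Label one eval case with the tool-design failure it shows, if any."""
    if case.actual_tool is None:
        return "skipped_tool"
    if case.actual_tool not in known_tools:
        return "nonexistent_tool"
    if case.actual_tool != case.expected_tool:
        return "wrong_tool"
    if not case.args_valid:
        return "malformed_args"
    return "ok"


def summarize(cases: list[EvalCase], known_tools: set[str]) -> Counter:
    """Count failure labels across an eval set."""
    return Counter(classify(c, known_tools) for c in cases)


if __name__ == "__main__":
    known = {"search_docs", "create_ticket"}
    cases = [
        EvalCase("search_docs", "search_docs"),
        EvalCase("search_docs", "create_ticket"),
        EvalCase("create_ticket", "create_ticket", args_valid=False),
        EvalCase("create_ticket", "file_bug"),  # hallucinated tool name
        EvalCase("search_docs", None),
    ]
    for label, n in summarize(cases, known).most_common():
        print(label, n, FIXES.get(label, ""))
```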

Naming and Namespacing

The model reads the tool name before the description. A clear name sets the initial hypothesis; the description confirms or corrects it. Verb-first names like search_documents, create_ticket, and get_weather state the action and its object, so the model starts from the right hypothesis; vague names like process, handle, or utility give it nothing to work with.
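These naming rules, plus the service+resource namespacing from the quick reference, can be enforced with a small lint over a tool catalog. A minimal sketch: the 64-character lowercase snake_case rule, the VERBS and VAGUE word lists, and the helper names are assumptions for illustration, so check your provider's actual name constraints. The lint looks for a verb anywhere in the name rather than only at the start, so it accepts both search_docs and namespaced names like jira_issues_create, where the verb comes last.

```python
import re

# Assumption: lowercase snake_case, at most 64 characters.
# Check your provider's exact tool-name rules.
NAME_RE = re.compile(r"[a-z][a-z0-9_]{0,63}")

# Hypothetical word lists for illustration; extend them for your own catalog.
VERBS = {"search", "get", "list", "create", "update", "delete"}
VAGUE = {"process", "handle", "utility", "manage", "do", "run"}


def namespaced(service: str, resource: str, action: str) -> str:
    """Build a service+resource prefixed name, e.g. jira_issues_create."""
    return f"{service}_{resource}_{action}"


def lint_tool_name(name: str) -> list[str]:
    """Return naming problems; an empty list means the name passes."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("not lowercase snake_case of at most 64 characters")
    parts = set(name.split("_"))
    vague_hits = VAGUE & parts
    if vague_hits:
        problems.append("vague word: " + ", ".join(sorted(vague_hits)))
    if not VERBS & parts:
        problems.append("no recognizable action verb")
    return problems


if __name__ == "__main__":
    for candidate in ["search_docs", namespaced("jira", "issues", "create"), "handle_request"]:
        print(candidate, lint_tool_name(candidate) or "ok")
```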

Description Engineering

Descriptions are prompts for tool selection

The tool description tells the model when and how to use the tool. It should answer three questions: what does this tool do, when should the model use it, and what does it return? Aim for 3-4 sentences. One-line descriptions ('Searches documents') are almost always too short: they leave the model without a decision rule.
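Here is what a description that answers all three questions can look like, with a crude automated check. This is a sketch: the search_docs tool, its parameters, and the "ticketing tools" it points to are hypothetical. The dict follows the shape of Anthropic's Messages API tool definitions (name, description, and input_schema as JSON Schema), and the product parameter uses an enum rather than listing valid values in prose. The check also flags a missing "when NOT to use" sentence, per the quick reference; its sentence counter splits on terminal punctuation, so abbreviations such as "e.g." will throw it off.

```python
import re

# Hypothetical tool definition: name, description, JSON Schema input.
SEARCH_DOCS = {
    "name": "search_docs",
    "description": (
        "Searches the internal documentation index by keyword. "
        "Use it when the user asks how a product feature works or where something is documented. "
        "Do not use it for ticket status or account data; call the ticketing tools instead. "
        "Returns up to max_results pages, each with a title, URL, and snippet, "
        "or an empty list if nothing matches."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to search for."},
            "product": {
                "type": "string",
                "enum": ["web", "mobile", "api"],  # valid values live in the schema
                "description": "Product area to restrict the search to.",
            },
            "max_results": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
    },
}


def check_description(tool: dict) -> list[str]:
    """Flag descriptions outside 3-4 sentences or lacking when-NOT guidance."""
    text = tool["description"].strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    problems = []
    if not 3 <= len(sentences) <= 4:
        problems.append(f"{len(sentences)} sentence(s); aim for 3-4")
    if "do not use" not in text.lower():
        problems.append("no guidance on when NOT to use the tool")
    return problems


if __name__ == "__main__":
    print(check_description(SEARCH_DOCS))
    print(check_description({"description": "Searches documents."}))
```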
