Agent Prompt Design

Agent system prompts are operating contracts, not personality descriptions. This article covers how to structure them with XML tags, write tool usage rules that actually enforce behavior, defend against prompt injection, use adaptive thinking correctly, and build a prompt evaluation harness that gates every change.

Quick Reference

→Use three sections wrapped in XML tags: <identity>, <tool_usage_rules>, and <boundaries>
→Write tool rules in two halves: USE WHEN and DO NOT USE WHEN — ambiguous descriptions cause random tool selection
→Wrap all user input in <user_input> tags and instruct the agent to treat it as data, not executable instructions
→Claude 4.6+ uses adaptive thinking natively — set effort to 'xhigh' for multi-tool agents instead of writing manual CoT
→Split prompts into a static base (cache_control: ephemeral) and dynamic context injected per request — static base is cached ~90% of calls
→Prompt caching cuts repeated system prompt cost by ~72% — critical when the prompt is sent on every tool call in the ReAct loop
→Baseline before you change: measure tool accuracy, task completion, and cost per request before any prompt edit
→Never ship a prompt change without running the eval suite — a 'better' prompt that fails the eval is a regression

When NOT to Redesign Your Agent Prompt

Before touching the system prompt, run this diagnostic: give the agent the same input 5 times. If it selects different tools across runs, the problem is in the **tool descriptions** — not the system prompt. If it consistently picks the right tool but sends wrong parameters, it's a **tool schema** problem. Only if behavior is consistently wrong *across all tools* — wrong persona, wrong stop conditions, no escalation — is it a system prompt problem. Most 'my agent is broken' reports are tool description issues.

Symptom	Root cause	Fix
Agent selects different tools on the same input	Ambiguous tool descriptions	Rewrite tool USE WHEN / DO NOT USE WHEN
Agent sends malformed tool parameters	Missing parameter descriptions	Add type + format + example to tool schema
Agent stops too early or loops forever	Missing stop conditions	Add explicit stop rules to <boundaries>
Agent never escalates to a human	Missing escalation triggers	Add ESCALATE WHEN clause to <tool_usage_rules>
Agent's tone or persona is inconsistent	Vague <identity> section	Rewrite <identity> with concrete tone examples

Start with tool descriptions, not the system prompt

Tool descriptions are sent inside every tool definition object — they're separate from the system prompt and control per-tool behavior. Rewriting the system prompt to fix tool selection issues almost never helps. Fix the tool description first.

Anatomy of an Agent System Prompt

Five layers that compose a well-structured agent prompt

Writing Tool Usage Rules That Actually Work

The single largest reliability lever is tool usage rules. Not the identity section. Not chain-of-thought. The USE WHEN / DO NOT USE WHEN pattern works because it gives the model an explicit decision tree instead of forcing it to infer intent from a vague description. Each tool needs both halves.

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.