Prompt Anatomy
The structural components of an LLM prompt and why structure matters more than length. Covers the four message roles (including OpenAI's developer role), when to skip elaborate structure, how system prompts interact with prompt caching costs, and the production failure modes that vague prompts cause.
Quick Reference
- →System prompt: role, constraints, output format — highest-priority instructions
- →Developer role (OpenAI o1+): sits between system and user, use for business logic that differs per call
- →User message: the actual task, input data, or question
- →Assistant message: model's previous responses — use for few-shot examples and conversation history
- →Prefilling is deprecated on Claude 4.6+ — returns a 400 error; use structured outputs instead
- →Longer stable system prompts save money: prompt caching cuts repeated input cost by ~90%
- →XML-style delimiters (<document>, <instructions>) prevent data-instruction confusion
- →Structure matters most for extraction, classification, and multi-step tasks — skip it for simple lookups
In this article
The Four Message Roles
Modern LLM APIs structure conversations as ordered message arrays. Each message has a role that determines how the model treats its content. Understanding the roles — and their priority differences — is the single most leveraged skill in prompt engineering.
Five layers that compose a well-structured agent prompt
| Role | Provider | Purpose | Priority |
|---|---|---|---|
| system | Anthropic, OpenAI (legacy) | Persona, constraints, output format, guardrails | Highest — architecturally separate from conversation |
| developer | OpenAI o1+ models | Business logic and instructions that vary per deployment but not per user | High — replaces system in o1+ models |
| user | All providers | The actual task, question, or input data | Standard |
| assistant | All providers | Model's previous responses; few-shot examples | Context only — model treats these as its own prior outputs |
In o1 and newer OpenAI models, the 'system' role is replaced by 'developer'. The developer message sits above user messages in the trust hierarchy and should contain instructions the operator sets, not end-user input. If you're building on gpt-5.4-mini or gpt-5.4 and your system message is silently ignored, switch to 'developer'.
Prefilling — placing a partial assistant message as the last array item to steer output format — returns a 400 error on Claude Opus 4.6 and newer. Migrate to: (1) structured outputs via output_config.format, (2) explicit format instructions in the system prompt, or (3) JSON mode. Existing messages mid-conversation are not affected.
When Prompt Structure Matters (and When It Doesn't)
Elaborate prompt structure has a setup cost: more tokens, more maintenance, more ways to introduce contradictions. It's worth paying when the task is complex enough that a vague prompt produces inconsistent output. It's not worth paying for simple queries.
| Task type | Structure needed? | Why |
|---|---|---|
| 'What is 2+2?' | No | Model behavior is deterministic regardless of framing |
| 'Summarize this article' | Minimal | One or two format constraints in user message is enough |
| 'Classify support tickets into 8 categories' | Yes — system + few-shot | Without examples, category boundaries are ambiguous |
| 'Extract 12 fields from legal contracts' | Yes — system + XML delimiters + output schema | Without structure, field names vary and fields get hallucinated |
| Multi-turn customer support agent | Yes — full production template | Persona, guardrails, escalation rules, and format must be locked |
Run your prompt on 10 different inputs and look for variance in the output format, not just content. If format varies — response in JSON sometimes, prose sometimes — you need more structure. If format is consistent but content quality varies, more examples or constraints are the fix.
Designing System Prompts
The system prompt is your primary behavioral lever. It defines who the model is, what constraints it operates under, and what its output must look like. A poorly structured system prompt makes every downstream call unpredictable.
- ▸Lead with role definition: who is the model and what is its expertise?
- ▸Specify constraints explicitly: what must the model NOT do?
- ▸Define output format: structure, length, style — don't assume the model will infer it
- ▸Include guardrails: safety boundaries and redirect behavior for off-topic inputs
- ▸Use markdown headers (##) to organise long system prompts — models respect structure in their own instructions
Structuring User Messages
The user message contains the actual task. Its most common failure mode is mixing instructions with data — the model can't tell where the instructions end and the input begins, so it applies instructions to the wrong thing or ignores them.
Wrapping sections in <document>, <instructions>, <examples> tags helps the model distinguish instructions from input data. This is critical when user-provided data might contain text that looks like instructions — a basic prompt injection defence. Claude has particularly strong adherence to XML-delimited sections.
Few-Shot Examples That Work
Few-shot examples in assistant messages teach the output format by demonstration. This is more reliable than format instructions alone because you're showing the exact JSON structure, edge-case handling, and reasoning style you want — not describing it.
- ▸3-5 examples cover most classification and extraction tasks — more gives diminishing returns
- ▸Include at least one example per output category and one edge case
- ▸Example quality matters more than quantity — use your best, cleanest outputs
- ▸For multi-turn conversation history, summarise older messages rather than including the full transcript
Prompt Structure Meets Prompt Caching
Prompt caching changes the cost calculus for system prompts. With caching, long stable prefixes — system prompt plus few-shot examples — are stored after the first call. Subsequent calls pay ~0.1× the normal input price for the cached portion. This means a longer, more detailed system prompt can be cheaper in production than a short one, once you amortise the write cost.
Static layers cache at 0.1× cost · Dynamic layers always pay full price
The cache key is the exact token sequence up to the cache_control boundary. Any change — even whitespace — busts the cache. Keep your system prompt and few-shot examples in a versioned constant; don't assemble them dynamically per-call. Dynamic context (RAG chunks, user profile) should always come after the cache boundary.
How Prompts Fail in Production
The most expensive prompt bugs aren't syntax errors — they're semantic failures that look correct in testing but degrade quietly in production. These are the patterns that account for the majority of prompt-related production incidents.
| Failure | Symptom | Root cause | Fix |
|---|---|---|---|
| Data-instruction confusion | Model treats input text as a command | No delimiter between instructions and data | Wrap data in <document> or similar XML tags |
| Contradictory constraints | Model behaves inconsistently across calls | System prompt says 'be concise' and 'be thorough' simultaneously | Audit for contradictions; pick one or specify when each applies |
| Role priority violation | User can override system constraints via prompt injection | Critical guardrails placed in user message, not system prompt | Guardrails belong in system/developer role only |
| Silent format drift | Output format gradually shifts across conversation turns | Format instructions only in system prompt, not reinforced | Add brief format reminder to complex user messages |
| Context boundary leak | Model cites information from previous conversations | Assistant messages from different sessions injected as context | Scope conversation history to the current session only |
A customer support bot was returning billing information to the wrong users. Root cause: the system prompt said 'only share account info for the authenticated user', but the user message template included a <context> block with the previous conversation from a different session — injected by a bug in session management. The model followed its instructions correctly; the context it was given was wrong. Prompt structure created a false sense of security that masked the data isolation bug for three days.
System prompt instructions reduce prompt injection risk but do not eliminate it. A sufficiently crafted user input can cause the model to ignore system instructions in some models under some conditions. Never use the system prompt as a substitute for access control, authentication, or data isolation in your application layer.
Production Prompt Template
A prompt builder that handles model tiering, caching, and the correct role for each provider. Copy and adapt this as a starting point.
Best Practices
Do
- ✓Put all behavioral constraints in the system/developer prompt — it has the highest trust level
- ✓Use XML-style tags (<document>, <instructions>, <examples>) to separate data from instructions
- ✓Include 3-5 few-shot examples for classification and extraction tasks — cover edge cases
- ✓Structure system prompts with clear sections: role, constraints, output format, guardrails
- ✓Enable prompt caching on stable system prompts — you'll cut repeated input costs by ~90%
- ✓Keep the cache prefix (system prompt + static examples) identical across calls — any change busts the cache
- ✓Test prompts on at least 10 diverse inputs before shipping — check for format variance, not just accuracy
- ✓Use the developer role (not system) on OpenAI o1+ models — system messages are silently deprioritised
- ✓Specify the exact output schema in the prompt — don't describe the format, show it
- ✓Put dynamic context (RAG chunks, user data) after the cache boundary so the static prefix stays cacheable
Don’t
- ✗Don't put security guardrails in user messages — they can be overridden by adversarial input
- ✗Don't write contradictory constraints (e.g., 'be concise' and 'be thorough') — pick one or scope each
- ✗Don't rely on prefilling to steer output format on Claude 4.6+ — it returns a 400 error
- ✗Don't mix instructions and input data without delimiters — the model can't tell where one ends
- ✗Don't assemble the system prompt dynamically per-call if you rely on caching — dynamic construction busts the cache
- ✗Don't treat system prompt instructions as a security boundary — they reduce injection risk, not eliminate it
- ✗Don't use more than 10 few-shot examples without testing whether fine-tuning is cheaper
- ✗Don't add format instructions only once and assume they'll hold across a long multi-turn conversation
- ✗Don't include irrelevant conversation history — scope context to the current session only
- ✗Don't write unstructured wall-of-text system prompts — use headers, bullets, and labelled sections
Key Takeaways
- ✓System and developer prompts are architecturally privileged — all behavioral constraints belong there, not in user messages.
- ✓OpenAI o1+ models use a 'developer' role instead of 'system' — mixing them up causes instructions to be silently deprioritised.
- ✓Prefilling is deprecated on Claude 4.6+ and returns a 400 error — use structured outputs or system prompt format instructions instead.
- ✓XML delimiters (<document>, <instructions>) are the primary defence against data-instruction confusion and basic prompt injection.
- ✓Prompt caching makes longer stable system prompts cheaper in production — 1,000 tokens × 1,000 calls drops from $3.00 to ~$0.30 at Sonnet 4.6 pricing.
- ✓Few-shot examples teach format by demonstration — 3-5 examples covering edge cases outperform verbose format descriptions.
Video on this topic
The anatomy of a perfect AI prompt
tiktok