LLM Foundations/Prompt Engineering as a Discipline

★ OverviewBeginner14 min

Prompt Anatomy

The structural components of an LLM prompt and why structure matters more than length. Covers the four message roles (including OpenAI's developer role), when to skip elaborate structure, how system prompts interact with prompt caching costs, and the production failure modes that vague prompts cause.

Quick Reference

→System prompt: role, constraints, output format — highest-priority instructions
→Developer role (OpenAI o1+): sits between system and user, use for business logic that differs per call
→User message: the actual task, input data, or question
→Assistant message: model's previous responses — use for few-shot examples and conversation history
→Prefilling is deprecated on Claude 4.6+ — returns a 400 error; use structured outputs instead
→Longer stable system prompts save money: prompt caching cuts repeated input cost by ~90%
→XML-style delimiters (<document>, <instructions>) prevent data-instruction confusion
→Structure matters most for extraction, classification, and multi-step tasks — skip it for simple lookups

In this article

1.The Four Message Roles
2.When Prompt Structure Matters (and When It Doesn't)
3.Designing System Prompts
4.Structuring User Messages
5.Few-Shot Examples That Work
6.Prompt Structure Meets Prompt Caching
7.How Prompts Fail in Production
8.Production Prompt Template
★Best Practices
✓Key Takeaways

The Four Message Roles

Modern LLM APIs structure conversations as ordered message arrays. Each message has a role that determines how the model treats its content. Understanding the roles — and their priority differences — is the single most leveraged skill in prompt engineering.

Five layers that compose a well-structured agent prompt

Role	Provider	Purpose	Priority
system	Anthropic, OpenAI (legacy)	Persona, constraints, output format, guardrails	Highest — architecturally separate from conversation
developer	OpenAI o1+ models	Business logic and instructions that vary per deployment but not per user	High — replaces system in o1+ models
user	All providers	The actual task, question, or input data	Standard
assistant	All providers	Model's previous responses; few-shot examples	Context only — model treats these as its own prior outputs

OpenAI's developer role

In o1 and newer OpenAI models, the 'system' role is replaced by 'developer'. The developer message sits above user messages in the trust hierarchy and should contain instructions the operator sets, not end-user input. If you're building on gpt-5.4-mini or gpt-5.4 and your system message is silently ignored, switch to 'developer'.

Prefilling is gone on Claude 4.6+

Prefilling — placing a partial assistant message as the last array item to steer output format — returns a 400 error on Claude Opus 4.6 and newer. Migrate to: (1) structured outputs via output_config.format, (2) explicit format instructions in the system prompt, or (3) JSON mode. Existing messages mid-conversation are not affected.

When Prompt Structure Matters (and When It Doesn't)

Elaborate prompt structure has a setup cost: more tokens, more maintenance, more ways to introduce contradictions. It's worth paying when the task is complex enough that a vague prompt produces inconsistent output. It's not worth paying for simple queries.

Task type	Structure needed?	Why
'What is 2+2?'	No	Model behavior is deterministic regardless of framing
'Summarize this article'	Minimal	One or two format constraints in user message is enough
'Classify support tickets into 8 categories'	Yes — system + few-shot	Without examples, category boundaries are ambiguous
'Extract 12 fields from legal contracts'	Yes — system + XML delimiters + output schema	Without structure, field names vary and fields get hallucinated
Multi-turn customer support agent	Yes — full production template	Persona, guardrails, escalation rules, and format must be locked

The consistency test

Run your prompt on 10 different inputs and look for variance in the output format, not just content. If format varies — response in JSON sometimes, prose sometimes — you need more structure. If format is consistent but content quality varies, more examples or constraints are the fix.

Designing System Prompts

The system prompt is your primary behavioral lever. It defines who the model is, what constraints it operates under, and what its output must look like. A poorly structured system prompt makes every downstream call unpredictable.

Production system prompt — Anthropic SDK

Same pattern — OpenAI SDK (gpt-5.4-mini, developer role)

▸Lead with role definition: who is the model and what is its expertise?
▸Specify constraints explicitly: what must the model NOT do?
▸Define output format: structure, length, style — don't assume the model will infer it
▸Include guardrails: safety boundaries and redirect behavior for off-topic inputs
▸Use markdown headers (##) to organise long system prompts — models respect structure in their own instructions

Structuring User Messages

The user message contains the actual task. Its most common failure mode is mixing instructions with data — the model can't tell where the instructions end and the input begins, so it applies instructions to the wrong thing or ignores them.

Data-instruction separation with XML delimiters

Use XML-style delimiters

Wrapping sections in <document>, <instructions>, <examples> tags helps the model distinguish instructions from input data. This is critical when user-provided data might contain text that looks like instructions — a basic prompt injection defence. Claude has particularly strong adherence to XML-delimited sections.

Few-Shot Examples That Work

Few-shot examples in assistant messages teach the output format by demonstration. This is more reliable than format instructions alone because you're showing the exact JSON structure, edge-case handling, and reasoning style you want — not describing it.

Few-shot classification — OpenAI SDK

Same few-shot pattern — Anthropic SDK

▸3-5 examples cover most classification and extraction tasks — more gives diminishing returns
▸Include at least one example per output category and one edge case
▸Example quality matters more than quantity — use your best, cleanest outputs
▸For multi-turn conversation history, summarise older messages rather than including the full transcript

Prompt Structure Meets Prompt Caching

Prompt caching changes the cost calculus for system prompts. With caching, long stable prefixes — system prompt plus few-shot examples — are stored after the first call. Subsequent calls pay ~0.1× the normal input price for the cached portion. This means a longer, more detailed system prompt can be cheaper in production than a short one, once you amortise the write cost.

Static layers cache at 0.1× cost · Dynamic layers always pay full price

Enabling prompt caching on Claude — mark the cache boundary

Cache hit requires an identical prefix

The cache key is the exact token sequence up to the cache_control boundary. Any change — even whitespace — busts the cache. Keep your system prompt and few-shot examples in a versioned constant; don't assemble them dynamically per-call. Dynamic context (RAG chunks, user profile) should always come after the cache boundary.

How Prompts Fail in Production

The most expensive prompt bugs aren't syntax errors — they're semantic failures that look correct in testing but degrade quietly in production. These are the patterns that account for the majority of prompt-related production incidents.

Failure	Symptom	Root cause	Fix
Data-instruction confusion	Model treats input text as a command	No delimiter between instructions and data	Wrap data in <document> or similar XML tags
Contradictory constraints	Model behaves inconsistently across calls	System prompt says 'be concise' and 'be thorough' simultaneously	Audit for contradictions; pick one or specify when each applies
Role priority violation	User can override system constraints via prompt injection	Critical guardrails placed in user message, not system prompt	Guardrails belong in system/developer role only
Silent format drift	Output format gradually shifts across conversation turns	Format instructions only in system prompt, not reinforced	Add brief format reminder to complex user messages
Context boundary leak	Model cites information from previous conversations	Assistant messages from different sessions injected as context	Scope conversation history to the current session only

Real project

A customer support bot was returning billing information to the wrong users. Root cause: the system prompt said 'only share account info for the authenticated user', but the user message template included a <context> block with the previous conversation from a different session — injected by a bug in session management. The model followed its instructions correctly; the context it was given was wrong. Prompt structure created a false sense of security that masked the data isolation bug for three days.

System prompt is not a security boundary

System prompt instructions reduce prompt injection risk but do not eliminate it. A sufficiently crafted user input can cause the model to ignore system instructions in some models under some conditions. Never use the system prompt as a substitute for access control, authentication, or data isolation in your application layer.

Production Prompt Template

A prompt builder that handles model tiering, caching, and the correct role for each provider. Copy and adapt this as a starting point.

Runnable production prompt builder — Anthropic + OpenAI

Best Practices

✓Put all behavioral constraints in the system/developer prompt — it has the highest trust level
✓Use XML-style tags (<document>, <instructions>, <examples>) to separate data from instructions
✓Include 3-5 few-shot examples for classification and extraction tasks — cover edge cases
✓Structure system prompts with clear sections: role, constraints, output format, guardrails
✓Enable prompt caching on stable system prompts — you'll cut repeated input costs by ~90%
✓Keep the cache prefix (system prompt + static examples) identical across calls — any change busts the cache
✓Test prompts on at least 10 diverse inputs before shipping — check for format variance, not just accuracy
✓Use the developer role (not system) on OpenAI o1+ models — system messages are silently deprioritised
✓Specify the exact output schema in the prompt — don't describe the format, show it
✓Put dynamic context (RAG chunks, user data) after the cache boundary so the static prefix stays cacheable

Don’t

✗Don't put security guardrails in user messages — they can be overridden by adversarial input
✗Don't write contradictory constraints (e.g., 'be concise' and 'be thorough') — pick one or scope each
✗Don't rely on prefilling to steer output format on Claude 4.6+ — it returns a 400 error
✗Don't mix instructions and input data without delimiters — the model can't tell where one ends
✗Don't assemble the system prompt dynamically per-call if you rely on caching — dynamic construction busts the cache
✗Don't treat system prompt instructions as a security boundary — they reduce injection risk, not eliminate it
✗Don't use more than 10 few-shot examples without testing whether fine-tuning is cheaper
✗Don't add format instructions only once and assume they'll hold across a long multi-turn conversation
✗Don't include irrelevant conversation history — scope context to the current session only
✗Don't write unstructured wall-of-text system prompts — use headers, bullets, and labelled sections

Key Takeaways

✓System and developer prompts are architecturally privileged — all behavioral constraints belong there, not in user messages.
✓OpenAI o1+ models use a 'developer' role instead of 'system' — mixing them up causes instructions to be silently deprioritised.
✓Prefilling is deprecated on Claude 4.6+ and returns a 400 error — use structured outputs or system prompt format instructions instead.
✓XML delimiters (<document>, <instructions>) are the primary defence against data-instruction confusion and basic prompt injection.
✓Prompt caching makes longer stable system prompts cheaper in production — 1,000 tokens × 1,000 calls drops from $3.00 to ~$0.30 at Sonnet 4.6 pricing.
✓Few-shot examples teach format by demonstration — 3-5 examples covering edge cases outperform verbose format descriptions.

Video on this topic

The anatomy of a perfect AI prompt

tiktok

←

Multimodal Models

Techniques That Work

→