LLM Foundations/Prompt Engineering as a Discipline
★ OverviewBeginner14 min

Prompt Anatomy

The structural components of an LLM prompt and why structure matters more than length. Covers the four message roles (including OpenAI's developer role), when to skip elaborate structure, how system prompts interact with prompt caching costs, and the production failure modes that vague prompts cause.

Quick Reference

  • System prompt: role, constraints, output format — highest-priority instructions
  • Developer role (OpenAI o1+): sits between system and user, use for business logic that differs per call
  • User message: the actual task, input data, or question
  • Assistant message: model's previous responses — use for few-shot examples and conversation history
  • Prefilling is deprecated on Claude 4.6+ — returns a 400 error; use structured outputs instead
  • Longer stable system prompts save money: prompt caching cuts repeated input cost by ~90%
  • XML-style delimiters (<document>, <instructions>) prevent data-instruction confusion
  • Structure matters most for extraction, classification, and multi-step tasks — skip it for simple lookups

The Four Message Roles

Modern LLM APIs structure conversations as ordered message arrays. Each message has a role that determines how the model treats its content. Understanding the roles — and their priority differences — is the single most leveraged skill in prompt engineering.

System InstructionsYou are a helpful assistant that...ContextRetrieved docs, user profile, stateExamples (Few-Shot)Input: ... → Output: ...User Message"Summarize these quarterly results"Output FormatRespond as JSON with keys: ...Prompt12345

Five layers that compose a well-structured agent prompt

RoleProviderPurposePriority
systemAnthropic, OpenAI (legacy)Persona, constraints, output format, guardrailsHighest — architecturally separate from conversation
developerOpenAI o1+ modelsBusiness logic and instructions that vary per deployment but not per userHigh — replaces system in o1+ models
userAll providersThe actual task, question, or input dataStandard
assistantAll providersModel's previous responses; few-shot examplesContext only — model treats these as its own prior outputs
OpenAI's developer role

In o1 and newer OpenAI models, the 'system' role is replaced by 'developer'. The developer message sits above user messages in the trust hierarchy and should contain instructions the operator sets, not end-user input. If you're building on gpt-5.4-mini or gpt-5.4 and your system message is silently ignored, switch to 'developer'.

Prefilling is gone on Claude 4.6+

Prefilling — placing a partial assistant message as the last array item to steer output format — returns a 400 error on Claude Opus 4.6 and newer. Migrate to: (1) structured outputs via output_config.format, (2) explicit format instructions in the system prompt, or (3) JSON mode. Existing messages mid-conversation are not affected.

When Prompt Structure Matters (and When It Doesn't)

Elaborate prompt structure has a setup cost: more tokens, more maintenance, more ways to introduce contradictions. It's worth paying when the task is complex enough that a vague prompt produces inconsistent output. It's not worth paying for simple queries.

Task typeStructure needed?Why
'What is 2+2?'NoModel behavior is deterministic regardless of framing
'Summarize this article'MinimalOne or two format constraints in user message is enough
'Classify support tickets into 8 categories'Yes — system + few-shotWithout examples, category boundaries are ambiguous
'Extract 12 fields from legal contracts'Yes — system + XML delimiters + output schemaWithout structure, field names vary and fields get hallucinated
Multi-turn customer support agentYes — full production templatePersona, guardrails, escalation rules, and format must be locked
The consistency test

Run your prompt on 10 different inputs and look for variance in the output format, not just content. If format varies — response in JSON sometimes, prose sometimes — you need more structure. If format is consistent but content quality varies, more examples or constraints are the fix.

Designing System Prompts

The system prompt is your primary behavioral lever. It defines who the model is, what constraints it operates under, and what its output must look like. A poorly structured system prompt makes every downstream call unpredictable.

Production system prompt — Anthropic SDK
Same pattern — OpenAI SDK (gpt-5.4-mini, developer role)
  • Lead with role definition: who is the model and what is its expertise?
  • Specify constraints explicitly: what must the model NOT do?
  • Define output format: structure, length, style — don't assume the model will infer it
  • Include guardrails: safety boundaries and redirect behavior for off-topic inputs
  • Use markdown headers (##) to organise long system prompts — models respect structure in their own instructions

Structuring User Messages

The user message contains the actual task. Its most common failure mode is mixing instructions with data — the model can't tell where the instructions end and the input begins, so it applies instructions to the wrong thing or ignores them.

Data-instruction separation with XML delimiters
Use XML-style delimiters

Wrapping sections in <document>, <instructions>, <examples> tags helps the model distinguish instructions from input data. This is critical when user-provided data might contain text that looks like instructions — a basic prompt injection defence. Claude has particularly strong adherence to XML-delimited sections.

Few-Shot Examples That Work

Few-shot examples in assistant messages teach the output format by demonstration. This is more reliable than format instructions alone because you're showing the exact JSON structure, edge-case handling, and reasoning style you want — not describing it.

Few-shot classification — OpenAI SDK
Same few-shot pattern — Anthropic SDK
  • 3-5 examples cover most classification and extraction tasks — more gives diminishing returns
  • Include at least one example per output category and one edge case
  • Example quality matters more than quantity — use your best, cleanest outputs
  • For multi-turn conversation history, summarise older messages rather than including the full transcript

Prompt Structure Meets Prompt Caching

Prompt caching changes the cost calculus for system prompts. With caching, long stable prefixes — system prompt plus few-shot examples — are stored after the first call. Subsequent calls pay ~0.1× the normal input price for the cached portion. This means a longer, more detailed system prompt can be cheaper in production than a short one, once you amortise the write cost.

PROMPT LAYERSTATUSCOST/CALL1System PromptRole · constraints · output formatCACHED0.1× on cache hit2Few-Shot ExamplesStatic input/output demonstrationsCACHED0.1× on cache hit3Retrieved ContextRAG chunks · tool resultsVARIES1× (changes per call)4User MessageThe actual task or questionDYNAMIC1× always── CACHE BOUNDARY ──Example: 1,000-token system prompt × 1,000 calls at Sonnet 4.6 pricingNo cache: $3.00 · With cache: ~$0.30 (90% savings)

Static layers cache at 0.1× cost · Dynamic layers always pay full price

Enabling prompt caching on Claude — mark the cache boundary
Cache hit requires an identical prefix

The cache key is the exact token sequence up to the cache_control boundary. Any change — even whitespace — busts the cache. Keep your system prompt and few-shot examples in a versioned constant; don't assemble them dynamically per-call. Dynamic context (RAG chunks, user profile) should always come after the cache boundary.

How Prompts Fail in Production

The most expensive prompt bugs aren't syntax errors — they're semantic failures that look correct in testing but degrade quietly in production. These are the patterns that account for the majority of prompt-related production incidents.

FailureSymptomRoot causeFix
Data-instruction confusionModel treats input text as a commandNo delimiter between instructions and dataWrap data in <document> or similar XML tags
Contradictory constraintsModel behaves inconsistently across callsSystem prompt says 'be concise' and 'be thorough' simultaneouslyAudit for contradictions; pick one or specify when each applies
Role priority violationUser can override system constraints via prompt injectionCritical guardrails placed in user message, not system promptGuardrails belong in system/developer role only
Silent format driftOutput format gradually shifts across conversation turnsFormat instructions only in system prompt, not reinforcedAdd brief format reminder to complex user messages
Context boundary leakModel cites information from previous conversationsAssistant messages from different sessions injected as contextScope conversation history to the current session only
Real project

A customer support bot was returning billing information to the wrong users. Root cause: the system prompt said 'only share account info for the authenticated user', but the user message template included a <context> block with the previous conversation from a different session — injected by a bug in session management. The model followed its instructions correctly; the context it was given was wrong. Prompt structure created a false sense of security that masked the data isolation bug for three days.

System prompt is not a security boundary

System prompt instructions reduce prompt injection risk but do not eliminate it. A sufficiently crafted user input can cause the model to ignore system instructions in some models under some conditions. Never use the system prompt as a substitute for access control, authentication, or data isolation in your application layer.

Production Prompt Template

A prompt builder that handles model tiering, caching, and the correct role for each provider. Copy and adapt this as a starting point.

Runnable production prompt builder — Anthropic + OpenAI

Best Practices

Best Practices

Do

  • Put all behavioral constraints in the system/developer prompt — it has the highest trust level
  • Use XML-style tags (<document>, <instructions>, <examples>) to separate data from instructions
  • Include 3-5 few-shot examples for classification and extraction tasks — cover edge cases
  • Structure system prompts with clear sections: role, constraints, output format, guardrails
  • Enable prompt caching on stable system prompts — you'll cut repeated input costs by ~90%
  • Keep the cache prefix (system prompt + static examples) identical across calls — any change busts the cache
  • Test prompts on at least 10 diverse inputs before shipping — check for format variance, not just accuracy
  • Use the developer role (not system) on OpenAI o1+ models — system messages are silently deprioritised
  • Specify the exact output schema in the prompt — don't describe the format, show it
  • Put dynamic context (RAG chunks, user data) after the cache boundary so the static prefix stays cacheable

Don’t

  • Don't put security guardrails in user messages — they can be overridden by adversarial input
  • Don't write contradictory constraints (e.g., 'be concise' and 'be thorough') — pick one or scope each
  • Don't rely on prefilling to steer output format on Claude 4.6+ — it returns a 400 error
  • Don't mix instructions and input data without delimiters — the model can't tell where one ends
  • Don't assemble the system prompt dynamically per-call if you rely on caching — dynamic construction busts the cache
  • Don't treat system prompt instructions as a security boundary — they reduce injection risk, not eliminate it
  • Don't use more than 10 few-shot examples without testing whether fine-tuning is cheaper
  • Don't add format instructions only once and assume they'll hold across a long multi-turn conversation
  • Don't include irrelevant conversation history — scope context to the current session only
  • Don't write unstructured wall-of-text system prompts — use headers, bullets, and labelled sections

Key Takeaways

  • System and developer prompts are architecturally privileged — all behavioral constraints belong there, not in user messages.
  • OpenAI o1+ models use a 'developer' role instead of 'system' — mixing them up causes instructions to be silently deprioritised.
  • Prefilling is deprecated on Claude 4.6+ and returns a 400 error — use structured outputs or system prompt format instructions instead.
  • XML delimiters (<document>, <instructions>) are the primary defence against data-instruction confusion and basic prompt injection.
  • Prompt caching makes longer stable system prompts cheaper in production — 1,000 tokens × 1,000 calls drops from $3.00 to ~$0.30 at Sonnet 4.6 pricing.
  • Few-shot examples teach format by demonstration — 3-5 examples covering edge cases outperform verbose format descriptions.

Video on this topic

The anatomy of a perfect AI prompt

tiktok