Intermediate · 9 min

Why LLMs Hallucinate

LLMs hallucinate because they are statistical pattern matchers, not knowledge databases. Understand the types of hallucination, when they are most likely, practical mitigation strategies, and why designing around hallucination is more realistic than eliminating it.

Quick Reference

  • Hallucination is not a bug -- it is inherent to how next-token prediction works
  • Three types: factual (wrong facts), faithful (contradicts provided context), instruction (ignores instructions)
  • Most likely with: rare topics, specific numbers/dates, recent events, confident-sounding assertions
  • Mitigation: grounding with retrieval, self-consistency checks, asking for citations, confidence calibration
  • You cannot fully eliminate hallucination -- design your system to detect and handle it

The Statistical Root Cause

LLMs do not store facts in a database and look them up. They learn statistical patterns from training data -- which tokens are likely to follow which other tokens in which contexts. When the model generates 'Paris is the capital of France,' it is not retrieving a fact. It is producing the statistically most likely continuation given the pattern. This distinction matters because it explains why hallucination is not a fixable bug but a fundamental property of the architecture.

  • The model learns P(next_token | previous_tokens) -- the probability of each token given context
  • For well-represented facts (frequently in training data), the most likely tokens happen to be correct
  • For rare or absent facts, the model generates plausible-sounding but potentially wrong continuations
  • The model has no internal mechanism for distinguishing 'I know this' from 'this sounds right'
  • Confidence in output (assertive phrasing) does not correlate with correctness -- models are confidently wrong
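The bullets above can be sketched with a toy next-token step. The token names and logit values below are made up for illustration, not taken from a real model; the point is that the "fact" is emitted only because the right token happens to dominate the distribution, with no fact store consulted anywhere.

```python
import math

def softmax(logits: dict) -> dict:
    """Turn raw scores into a probability distribution over tokens."""
    m = max(logits.values())  # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the next token after "Paris is the capital of"
logits = {"France": 9.1, "Europe": 5.3, "fashion": 4.8, "Texas": 1.2}
probs = softmax(logits)

# Greedy decoding: emit the most likely continuation. Correctness is a
# side effect of the pattern being well represented in training data.
best = max(probs, key=probs.get)
print(best)  # France
```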
Distributed storage, approximate retrieval

Even when a model has 'learned' a fact during training, the information is distributed across billions of parameters. Retrieval requires the right attention pattern to activate the right parameters. Sometimes the activation path leads to a nearby but incorrect pattern instead -- like how humans sometimes confidently misremember facts.

Types of Hallucination

| Type | Description | Example | Severity |
| --- | --- | --- | --- |
| Factual | States something factually incorrect | "Python was created by Guido van Rossum in 1995" (it was 1991) | High -- user may trust and propagate |
| Faithful | Contradicts information provided in context | Given a document saying revenue was $5M, model says $8M | Critical -- defeats the purpose of RAG |
| Instruction | Ignores or misinterprets explicit instructions | Asked for JSON, outputs markdown instead | Medium -- usually caught by validation |
| Attribution | Fabricates sources, citations, or URLs | "According to Smith et al. (2023) in Nature..." (paper doesn't exist) | High -- creates false authority |
| Reasoning | Reaches wrong conclusion despite correct premises | Correct math steps but wrong final answer | High -- hard to detect without verification |
Fabricated citations are extremely common

When asked for sources, LLMs will confidently generate realistic-looking but completely fake academic paper titles, author names, journal names, and DOIs. Never trust an LLM-generated citation without verification. This is one of the most dangerous hallucination types because it creates false authority that users may not question.
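A format check is the weakest possible first gate for LLM-supplied citations, and it is worth being explicit that it catches only the sloppiest fakes: a syntactically valid DOI can still be fabricated, so anything that passes must still be resolved against an external service such as doi.org or the Crossref API. The `triage_citation` helper below is a hypothetical sketch, not a library function.

```python
import re

# Well-formed DOIs start with "10.", a 4-9 digit registrant, and a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def triage_citation(citation: dict) -> str:
    """'reject' malformed DOIs; everything else still needs a real lookup."""
    doi = citation.get("doi", "")
    # "verify" means: format is plausible, now resolve it externally.
    return "verify" if DOI_RE.match(doi) else "reject"

print(triage_citation({"doi": "10.1038/s41586-023-00001-1"}))  # verify
print(triage_citation({"doi": "Nature, vol. 12"}))             # reject
```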

When Hallucinations Are Most Likely

Hallucinations are not random -- they follow predictable patterns. Understanding when the model is most likely to hallucinate helps you build guardrails in the right places.

  • Rare or niche topics: the model has less training data to draw from, so pattern completion is less reliable
  • Specific numbers and dates: precise quantitative facts are poorly stored in neural network weights
  • Recent events: anything after the training data cutoff is unknown to the model, but it will still generate plausible answers
  • Long, complex reasoning chains: errors compound across multiple steps
  • When the model is forced to answer: if there is no 'I don't know' option, the model will fabricate
  • Ambiguous prompts: when the task is unclear, the model fills in gaps with plausible but potentially wrong assumptions
  • Low-resource languages: hallucination rates are significantly higher in languages with less training data
Prompt that encourages vs discourages hallucination
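The contrast can be made concrete with two hypothetical prompts for the same question. The first stacks several risk factors from the list above: a niche fact, a precise number, and no permission to abstain. The second grounds the question in provided context and gives the model an explicit way out.

```python
# Illustrative prompt pair -- company name and quarter are invented.

ENCOURAGES_HALLUCINATION = (
    "What was the exact revenue of Acme Corp in Q3 2021? "
    "Answer with a specific number."
)

DISCOURAGES_HALLUCINATION = (
    "Using only the provided context, what was Acme Corp's revenue "
    "in Q3 2021? If the context does not say, reply 'I don't know' "
    "rather than guessing."
)

print(ENCOURAGES_HALLUCINATION)
print(DISCOURAGES_HALLUCINATION)
```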

Mitigation Strategies

| Strategy | How it works | Effectiveness | Cost |
| --- | --- | --- | --- |
| Grounding (RAG) | Provide source documents for the model to reference | High for factual tasks | Moderate (retrieval pipeline) |
| Self-consistency | Generate N answers, take majority vote | Moderate (catches random errors) | N× inference cost |
| Chain-of-thought verification | Ask model to verify its own reasoning step by step | Low-moderate (models can validate wrong logic) | 2× inference cost |
| Citation requirement | Force model to quote source text for each claim | High for faithful hallucination | Slight increase in output tokens |
| Confidence calibration | Ask model to rate its confidence per claim | Low (models are poorly calibrated) | Minimal |
| External verification | Check facts against a database or API | Very high (ground truth) | High (requires verification infrastructure) |
Self-consistency check: generate multiple answers and compare
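One way to sketch the self-consistency strategy: sample the same prompt several times and take the majority answer. The `generate` callable below is a canned stub standing in for a real LLM call; a real setup would sample with temperature > 0 so that independent runs can actually disagree.

```python
from collections import Counter

def self_consistency(generate, prompt: str, n: int = 5):
    """Sample n answers; return the majority answer and agreement rate."""
    answers = [generate(prompt) for _ in range(n)]
    (top, count), = Counter(answers).most_common(1)
    return top, count / n

# Canned answers simulating five sampled completions, one of them wrong.
_canned = iter(["1991", "1991", "1995", "1991", "1991"])
answer, agreement = self_consistency(lambda p: next(_canned),
                                     "When was Python first released?")
print(answer, agreement)  # 1991 0.8
```

A low agreement rate is itself a useful signal: it flags answers worth routing to a stronger check or a human.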
Layer your defenses

No single mitigation eliminates hallucination. Use defense in depth: (1) Ground with retrieved context, (2) Require citations from the context, (3) Validate structured output with Pydantic, (4) Run post-hoc checks for critical claims. The depth of your defense should match the cost of a hallucination in your use case.
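A minimal sketch of layers (1) and (2) of that checklist, under two stated assumptions: `llm` is a hypothetical callable standing in for a real model API, and the model wraps its supporting quote in double quotes. Layers (3) and (4) would bolt on after this.

```python
def extract_quote(answer: str) -> str:
    # Assumption: the model wraps its supporting quote in double quotes.
    start, end = answer.find('"'), answer.rfind('"')
    return answer[start + 1:end] if 0 <= start < end else ""

def grounded_answer(question: str, context: str, llm) -> dict:
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        'Answer only from the context and quote one supporting sentence in "".'
    )
    answer = llm(prompt)                     # (1) grounding
    quoted = extract_quote(answer)           # (2) citation requirement
    if not quoted or quoted not in context:  # quote must appear verbatim
        return {"answer": None, "reason": "citation missing from context"}
    return {"answer": answer, "reason": "passed checks"}

context = 'The filing says: "revenue was $5M in 2023."'
fake_llm = lambda p: 'Revenue was $5M, per "revenue was $5M in 2023."'
print(grounded_answer("What was revenue?", context, fake_llm)["reason"])
```

The verbatim-quote check is deliberately strict: a faithful hallucination that paraphrases the context fails it, which is the safe direction to fail in.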

Designing Systems That Handle Hallucination

The most important insight about hallucination is accepting that you cannot eliminate it. Instead, design your system to detect, contain, and recover from hallucination. The approach depends entirely on the stakes involved.

| Stakes level | Example use case | Appropriate design |
| --- | --- | --- |
| Low | Content brainstorming, creative writing | Accept hallucination; it is a feature (creativity) |
| Medium | Customer support, code suggestions | Flag uncertain answers, offer to escalate to a human |
| High | Medical information, legal advice | Require source citations, human review for all outputs |
| Critical | Financial transactions, safety systems | LLM proposes, deterministic system verifies and executes |
  • Always separate LLM reasoning from action execution -- never let an LLM directly execute irreversible actions
  • For high-stakes domains, use LLMs as classifiers/routers rather than generators (classify into known-good options)
  • Build feedback loops: when users correct hallucinations, log them and use for evaluation/fine-tuning
  • Monitor hallucination rates in production: track user corrections, confidence scores, citation accuracy
  • Consider 'I don't know' as a feature, not a failure -- a system that says 'I don't know' when appropriate is more trustworthy
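The first bullet, separating reasoning from execution, can be sketched as a propose/verify split. The refund workflow, field names, and threshold below are all hypothetical; the important property is that the LLM only produces a structured proposal, and hard-coded rules own the execution path.

```python
MAX_AUTO_REFUND = 100.00  # hard business rule, not model-controlled

def verify_and_execute(proposal: dict, order_db: dict) -> str:
    """Deterministic gate between an LLM proposal and any real action."""
    order = order_db.get(proposal.get("order_id"))
    if order is None:
        return "rejected: unknown order"
    amount = proposal.get("amount", 0.0)
    if amount > order["paid"]:
        return "rejected: refund exceeds amount paid"
    if amount > MAX_AUTO_REFUND:
        return "escalated: needs human approval"
    return f"executed: refunded {amount:.2f}"

orders = {"A100": {"paid": 40.0}}
# Imagine this dict was parsed from the LLM's structured output:
llm_proposal = {"order_id": "A100", "amount": 40.0, "action": "refund"}
print(verify_and_execute(llm_proposal, orders))  # executed: refunded 40.00
```

Even if the model hallucinates an order ID or an inflated amount, the worst case is a rejection or an escalation, never an irreversible action.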
The classification trick

LLMs hallucinate most when generating free-form text. They hallucinate least when choosing from a fixed set of options. Whenever possible, frame your task as classification (pick from these 5 options) rather than generation (write the answer). This dramatically reduces hallucination risk.
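A sketch of that framing: the model's reply is accepted only if it exactly matches one of a fixed set of labels. The intent labels and the `llm` callable are hypothetical stand-ins.

```python
INTENTS = ["billing", "shipping", "returns", "technical", "other"]

def classify(message: str, llm) -> str:
    prompt = (
        f"Classify this support message into exactly one of {INTENTS}. "
        f"Reply with the label only.\n\n{message}"
    )
    label = llm(prompt).strip().lower()
    # Anything outside the fixed set falls back to a safe default, so
    # the worst case is a wrong-but-valid label, never an invented one.
    return label if label in INTENTS else "other"

fake_llm = lambda p: "Shipping"  # stub for a real model call
print(classify("Where is my package?", fake_llm))  # shipping
```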

Best Practices


Do

  • Ground model responses with retrieved context (RAG) for any factual task
  • Give the model explicit permission to say 'I don't know' or express uncertainty
  • Require source citations from provided context and verify they exist
  • Match your hallucination defense depth to the stakes of your use case
  • Frame tasks as classification (choose from options) rather than generation when possible

Don't

  • Don't trust LLM-generated citations, URLs, or references without external verification
  • Don't force the model to always provide an answer -- allow uncertainty
  • Don't assume that asking 'Are you sure?' catches hallucination -- the model will just say 'Yes'
  • Don't rely on confidence scores or model self-assessment -- they are poorly calibrated
  • Don't use LLMs for precise numerical calculations, date lookups, or other tasks requiring exact recall

Key Takeaways

  • Hallucination is inherent to next-token prediction -- LLMs generate plausible continuations, not verified facts.
  • Five hallucination types: factual, faithful, instruction, attribution, and reasoning -- each requires different mitigation.
  • Hallucination risk increases with topic rarity, numerical precision, recency, and forced answering.
  • Layer defenses: grounding + citations + validation + external verification, proportional to stakes.
  • Design around hallucination rather than trying to eliminate it -- accept, detect, contain, and recover.

Video on this topic

Why ChatGPT makes things up (and always will)
