Why LLMs Hallucinate
LLMs hallucinate because they are statistical pattern matchers, not knowledge databases. Understand the types of hallucination, when they are most likely, practical mitigation strategies, and why designing around hallucination is more realistic than eliminating it.
Quick Reference
- Hallucination is not a bug -- it is inherent to how next-token prediction works
- Five types: factual (wrong facts), faithful (contradicts provided context), instruction (ignores instructions), attribution (fabricated sources), reasoning (wrong conclusions from correct premises)
- Most likely with: rare topics, specific numbers/dates, recent events, confident-sounding assertions
- Mitigation: grounding with retrieval, self-consistency checks, asking for citations, confidence calibration
- You cannot fully eliminate hallucination -- design your system to detect and handle it
The Statistical Root Cause
LLMs do not store facts in a database and look them up. They learn statistical patterns from training data -- which tokens are likely to follow which other tokens in which contexts. When the model generates 'Paris is the capital of France,' it is not retrieving a fact. It is producing the statistically most likely continuation given the pattern. This distinction matters because it explains why hallucination is not a fixable bug but a fundamental property of the architecture.
- The model learns P(next_token | previous_tokens) -- the probability of each token given the context
- For well-represented facts (those frequent in the training data), the most likely tokens happen to be correct
- For rare or absent facts, the model generates plausible-sounding but potentially wrong continuations
- The model has no internal mechanism for distinguishing 'I know this' from 'this sounds right'
- Confidence in output (assertive phrasing) does not correlate with correctness -- models are confidently wrong
Even when a model has 'learned' a fact during training, the information is distributed across billions of parameters. Retrieval requires the right attention pattern to activate the right parameters. Sometimes the activation path leads to a nearby but incorrect pattern instead -- like how humans sometimes confidently misremember facts.
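The mechanics above can be sketched with a toy next-token distribution. The probabilities here are invented for illustration, not taken from any real model:

```python
# Toy illustration: an LLM scores every candidate next token and picks
# from that distribution -- it never "looks up" a fact.

# Well-represented fact: the correct token dominates the distribution.
p_next_common = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}

# Rare fact: probability mass spreads over several plausible tokens,
# so the top choice can easily be a fluent wrong answer.
p_next_rare = {"1989": 0.31, "1991": 0.28, "1993": 0.22, "1987": 0.19}

def greedy(dist):
    """Pick the most probable token, as greedy decoding would."""
    return max(dist, key=dist.get)

print(greedy(p_next_common))  # "Paris" -- correct, the pattern is strong
print(greedy(p_next_rare))    # "1989" -- fluent, confidently wrong
```

In both cases the model runs exactly the same procedure; correctness is a property of the training distribution, not of the mechanism.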
Types of Hallucination
| Type | Description | Example | Severity |
|---|---|---|---|
| Factual | States something factually incorrect | "Python was created by Guido van Rossum in 1995" (it was 1991) | High -- user may trust and propagate |
| Faithful | Contradicts information provided in context | Given a document saying revenue was $5M, model says $8M | Critical -- defeats the purpose of RAG |
| Instruction | Ignores or misinterprets explicit instructions | Asked for JSON, outputs markdown instead | Medium -- usually caught by validation |
| Attribution | Fabricates sources, citations, or URLs | "According to Smith et al. (2023) in Nature..." (paper doesn't exist) | High -- creates false authority |
| Reasoning | Reaches wrong conclusion despite correct premises | Correct math steps but wrong final answer | High -- hard to detect without verification |
When asked for sources, LLMs will confidently generate realistic-looking but completely fake academic paper titles, author names, journal names, and DOIs. Never trust an LLM-generated citation without verification. This is one of the most dangerous hallucination types because it creates false authority that users may not question.
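A first line of defense is to mechanically extract citation-like strings from model output so they can be checked against an external source (a library catalog, a DOI resolver) before anyone trusts them. This is a minimal sketch; the regexes and sample text are illustrative, not exhaustive:

```python
import re

# Pull DOI-like and "Author et al. (year)"-like strings out of model
# output for external verification. Patterns are deliberately simple.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
CITE_RE = re.compile(r"[A-Z][a-z]+ et al\. \(\d{4}\)")

def extract_unverified_citations(text):
    """Return every citation-like string found, for human or API review."""
    dois = [m.rstrip(".,;") for m in DOI_RE.findall(text)]
    return dois + CITE_RE.findall(text)

output = "According to Smith et al. (2023) in Nature, doi:10.1234/fake.5678."
print(extract_unverified_citations(output))
# ['10.1234/fake.5678', 'Smith et al. (2023)']
```

Extraction alone proves nothing -- every string returned must still be resolved against a real database before it is shown to users as a source.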
When Hallucinations Are Most Likely
Hallucinations are not random -- they follow predictable patterns. Understanding when the model is most likely to hallucinate helps you build guardrails in the right places.
- Rare or niche topics: the model has less training data to draw from, so pattern completion is less reliable
- Specific numbers and dates: precise quantitative facts are not reliably encoded in model weights
- Recent events: anything after the training data cutoff is unknown to the model, but it will still generate plausible answers
- Long, complex reasoning chains: errors compound across multiple steps
- When the model is forced to answer: if there is no 'I don't know' option, the model will fabricate
- Ambiguous prompts: when the task is unclear, the model fills in gaps with plausible but potentially wrong assumptions
- Low-resource languages: hallucination rates are significantly higher in languages with less training data
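Because these risk factors are predictable, they can be turned into a cheap pre-flight check. The sketch below flags prompts that hit a few of the patterns above so a system can route them to retrieval or attach a warning; the keyword lists are illustrative placeholders, not a vetted taxonomy:

```python
import re

# Rough heuristic: flag prompts matching known high-risk patterns.
RISK_PATTERNS = {
    "specific_date": re.compile(r"\b(19|20)\d{2}\b"),
    "exact_number":  re.compile(r"\bexact(ly)?\b|\bprecise(ly)?\b", re.I),
    "recency":       re.compile(r"\b(latest|recent|current|today)\b", re.I),
}

def hallucination_risk_flags(prompt):
    """Return the names of every risk pattern the prompt triggers."""
    return [name for name, pat in RISK_PATTERNS.items() if pat.search(prompt)]

print(hallucination_risk_flags("What is the latest exact revenue for 2024?"))
# ['specific_date', 'exact_number', 'recency']
```

A prompt that trips multiple flags is a good candidate for mandatory grounding rather than free-form generation.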
Mitigation Strategies
| Strategy | How it works | Effectiveness | Cost |
|---|---|---|---|
| Grounding (RAG) | Provide source documents for the model to reference | High for factual tasks | Moderate (retrieval pipeline) |
| Self-consistency | Generate N answers, take majority vote | Moderate (catches random errors) | Nx inference cost |
| Chain-of-thought verification | Ask model to verify its own reasoning step by step | Low-moderate (models can validate wrong logic) | 2x inference cost |
| Citation requirement | Force model to quote source text for each claim | High for faithful hallucination | Slight increase in output tokens |
| Confidence calibration | Ask model to rate its confidence per claim | Low (models are poorly calibrated) | Minimal |
| External verification | Check facts against a database or API | Very high (ground truth) | High (requires verification infrastructure) |
No single mitigation eliminates hallucination. Use defense in depth: (1) Ground with retrieved context, (2) Require citations from the context, (3) Validate structured output with Pydantic, (4) Run post-hoc checks for critical claims. The depth of your defense should match the cost of a hallucination in your use case.
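Layer (2) above can be enforced mechanically: require the model to quote its source verbatim, then check that each quote actually appears in the retrieved context. The context and claims below are invented examples:

```python
# Check each quoted claim against the retrieved context; a quote that is
# missing from the context signals faithful hallucination.
def verify_quotes(context, quoted_claims):
    """Return (claim, supported?) pairs after whitespace/case normalization."""
    normalized = " ".join(context.split()).lower()
    return [(q, " ".join(q.split()).lower() in normalized)
            for q in quoted_claims]

context = "Q3 revenue was $5M, up 12% year over year."
claims = ["revenue was $5M", "revenue was $8M"]
print(verify_quotes(context, claims))
# [('revenue was $5M', True), ('revenue was $8M', False)]
```

A substring check only catches verbatim mismatches; paraphrased claims need a stronger method (e.g. an entailment model), but even this simple layer blocks the most blatant faithful hallucinations cheaply.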
Designing Systems That Handle Hallucination
The most important insight is that you cannot eliminate hallucination. Instead, design your system to detect, contain, and recover from it. The right design depends entirely on the stakes involved.
| Stakes level | Example use case | Appropriate design |
|---|---|---|
| Low | Content brainstorming, creative writing | Accept hallucination; here it can be a feature (creativity) |
| Medium | Customer support, code suggestions | Flag uncertain answers, offer to escalate to human |
| High | Medical information, legal advice | Require source citations, human review for all outputs |
| Critical | Financial transactions, safety systems | LLM proposes, deterministic system verifies and executes |
- Always separate LLM reasoning from action execution -- never let an LLM directly execute irreversible actions
- For high-stakes domains, use LLMs as classifiers/routers rather than generators (classify into known-good options)
- Build feedback loops: when users correct hallucinations, log them and use for evaluation/fine-tuning
- Monitor hallucination rates in production: track user corrections, confidence scores, citation accuracy
- Consider 'I don't know' as a feature, not a failure -- a system that says 'I don't know' when appropriate is more trustworthy
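The "LLM proposes, deterministic system verifies" pattern for critical stakes can be sketched as follows. The action schema, allow-list, and limits here are invented for illustration:

```python
# The LLM only proposes a structured action; deterministic code decides
# whether that action is allowed to execute.
ALLOWED_ACTIONS = {"refund"}   # known-safe, reversible operations only
MAX_REFUND = 100.00            # hard limit enforced outside the model

def verify_and_execute(proposed):
    """Execute the proposal only if it passes every deterministic check."""
    if proposed.get("action") not in ALLOWED_ACTIONS:
        return "rejected: unknown action"
    if proposed.get("amount", 0) > MAX_REFUND:
        return "rejected: amount over limit, escalate to human"
    return f"executed refund of ${proposed['amount']:.2f}"

print(verify_and_execute({"action": "refund", "amount": 25.0}))
print(verify_and_execute({"action": "wire_transfer", "amount": 9999}))
```

The key property is that a hallucinated proposal can never reach execution: the worst case is a rejection, not an irreversible action.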
LLMs hallucinate most when generating free-form text. They hallucinate least when choosing from a fixed set of options. Whenever possible, frame your task as classification (pick from these 5 options) rather than generation (write the answer). This dramatically reduces hallucination risk.
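Classification framing can be enforced with a validation wrapper: constrain the prompt to a fixed label set and reject anything outside it. `call_llm` below is a hypothetical stand-in for whatever model client you use:

```python
# Constrain the model to a fixed label set and validate its reply,
# falling back to "unknown" instead of accepting free-form text.
LABELS = {"billing", "shipping", "returns", "technical", "other"}

def classify(user_message, call_llm):
    prompt = (f"Classify into exactly one of {sorted(LABELS)}.\n"
              f"Message: {user_message}\nLabel:")
    reply = call_llm(prompt).strip().lower()
    return reply if reply in LABELS else "unknown"  # never trust free text

# Stubbed model responses for demonstration:
print(classify("Where is my package?", lambda p: " Shipping "))  # 'shipping'
print(classify("asdf", lambda p: "maybe billing?"))              # 'unknown'
```

Even if the model hallucinates, the output space is capped at six known values, so downstream code never handles unexpected text.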
Best Practices
Do
- ✓ Ground model responses with retrieved context (RAG) for any factual task
- ✓ Give the model explicit permission to say 'I don't know' or express uncertainty
- ✓ Require source citations from provided context and verify they exist
- ✓ Match your hallucination defense depth to the stakes of your use case
- ✓ Frame tasks as classification (choose from options) rather than generation when possible
Don’t
- ✗ Don't trust LLM-generated citations, URLs, or references without external verification
- ✗ Don't force the model to always provide an answer -- allow uncertainty
- ✗ Don't assume that asking 'Are you sure?' catches hallucination -- the model will just say 'Yes'
- ✗ Don't rely on confidence scores or model self-assessment -- they are poorly calibrated
- ✗ Don't use LLMs for precise numerical calculations, date lookups, or other tasks requiring exact recall
Key Takeaways
- ✓ Hallucination is inherent to next-token prediction -- LLMs generate plausible continuations, not verified facts.
- ✓ Five hallucination types: factual, faithful, instruction, attribution, and reasoning -- each requires different mitigation.
- ✓ Hallucination risk increases with topic rarity, numerical precision, recency, and forced answering.
- ✓ Layer defenses: grounding + citations + validation + external verification, proportional to stakes.
- ✓ Design around hallucination rather than trying to eliminate it -- accept, detect, contain, and recover.