Intermediate · 9 min
Bias Detection & Mitigation
Detect, measure, and mitigate bias in LLM outputs — from demographic disparities in classification to stereotyped language in generation. Practical techniques for production agents.
Quick Reference
- Bias types: demographic (gender, race, age), cultural (language, region), selection (training data skew)
- Detection: run the same prompt with swapped demographics — compare outputs for disparities
- Measurement: disparity ratios, toxicity scores, sentiment analysis across demographic groups
- Mitigation: system prompt guidelines, output filtering middleware, diverse evaluation datasets
- Monitoring: track bias metrics in production via online evaluation — catch drift over time
- Red teaming: adversarial testing specifically for bias-triggering prompts
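The detection and measurement bullets above can be sketched together: generate counterfactual variants of one prompt by swapping a demographic attribute, score each completion with your existing eval pipeline, and summarize the gap as a disparity ratio (lowest group mean over highest; 1.0 means parity). A minimal sketch — the template, names, and hard-coded scores are illustrative stand-ins for real model outputs:

```python
import statistics

def counterfactual_prompts(template, names):
    """Fill one prompt template with each demographic variant of a name slot."""
    return {name: template.format(name=name) for name in names}

def disparity_ratio(scores_by_group):
    """Ratio of the lowest group mean score to the highest; 1.0 means parity."""
    means = {group: statistics.mean(scores) for group, scores in scores_by_group.items()}
    return min(means.values()) / max(means.values())

prompts = counterfactual_prompts(
    "Write a performance review for {name}, a software engineer.",
    ["James", "Maria", "Aisha", "Wei"],
)

# In practice, score each completion (sentiment, formality, length, toxicity)
# with your eval pipeline; these hard-coded scores stand in for model output.
scores = {
    "James": [0.82, 0.79],
    "Maria": [0.80, 0.81],
    "Aisha": [0.78, 0.80],
    "Wei":   [0.81, 0.79],
}
print(f"disparity ratio: {disparity_ratio(scores):.3f}")
```

A common production convention is to alert when the ratio falls below a threshold such as 0.8 (the "four-fifths rule" borrowed from employment law), though the right cutoff depends on your use case.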
Types of Bias in LLM Agents
| Bias Type | How It Manifests | Example |
|---|---|---|
| Demographic | Different quality/tone for different groups | More formal responses for male names, more casual for female names |
| Cultural | Assumes Western context by default | Recommending US-specific solutions for global users |
| Selection | Training data overrepresents some groups | Better performance in high-resource languages (Python, English) than low-resource ones (Tamil) |
| Confirmation | Reinforces user's stated beliefs | Agreeing with incorrect medical claims instead of correcting |
| Sycophancy | Overly agreeable, avoids disagreement | Rating all user code as 'great' instead of pointing out bugs |
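The "output filtering middleware" mentioned in the quick reference can catch several of the bias types in this table before a response reaches the user. A minimal sketch, assuming a hypothetical decorator-based middleware — the pattern names and regexes are illustrative, and flagging every gendered pronoun is deliberately crude (a real filter would use context-aware classifiers, not regexes):

```python
import re

# Hypothetical flag patterns; real deployments would use trained classifiers.
FLAGGED_PATTERNS = {
    "gendered_pronoun": re.compile(r"\b(he|him|his)\b", re.IGNORECASE),
    "us_centric": re.compile(r"\b(zip code|social security)\b", re.IGNORECASE),
}

def bias_filter(generate):
    """Wrap a text-generation function and attach bias flags to each output."""
    def wrapped(prompt):
        text = generate(prompt)
        flags = [name for name, pattern in FLAGGED_PATTERNS.items()
                 if pattern.search(text)]
        return {"text": text, "bias_flags": flags}
    return wrapped

@bias_filter
def fake_llm(prompt):
    # Stand-in for a real model call; returns a fixed biased completion.
    return "Ask the engineer for his zip code."

result = fake_llm("How do I contact support?")
print(result["bias_flags"])
```

Flagged outputs can then be rewritten, routed to a human reviewer, or logged as a bias metric for the production monitoring described above.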