Intermediate9 min

Bias Detection & Mitigation

Detect, measure, and mitigate bias in LLM outputs — from demographic disparities in classification to stereotyped language in generation. Practical techniques for production agents.

Quick Reference

  • Bias types: demographic (gender, race, age), cultural (language, region), selection (training data skew)
  • Detection: run the same prompt with swapped demographics — compare outputs for disparities
  • Measurement: disparity ratios, toxicity scores, sentiment analysis across demographic groups
  • Mitigation: system prompt guidelines, output filtering middleware, diverse evaluation datasets
  • Monitoring: track bias metrics in production via online evaluation — catch drift over time
  • Red teaming: adversarial testing specifically for bias-triggering prompts

Types of Bias in LLM Agents

Bias TypeHow It ManifestsExample
DemographicDifferent quality/tone for different groupsMore formal responses for male names, more casual for female
CulturalAssumes Western context by defaultRecommending US-specific solutions for global users
SelectionTraining data overrepresents some groupsBetter code completion for Python than Tamil-script languages
ConfirmationReinforces user's stated beliefsAgreeing with incorrect medical claims instead of correcting
SycophancyOverly agreeable, avoids disagreementRating all user code as 'great' instead of pointing out bugs