
Audit Trails & Explainability

Regulated industries require audit trails for every AI decision. Learn what to log (inputs, outputs, model version, latency, cost, tool calls), how to structure traces for querying, how to provide explainability to end users, and retention policies that balance compliance with cost.

Quick Reference

  • Log every LLM interaction: input, output, model version, latency, token count, cost, trace ID
  • Use structured logging (JSON) so traces are queryable — not unstructured text logs
  • Trace correlation: link every LLM call to the user request, session, and user ID
  • Explainability: store the chain of reasoning so you can explain any AI decision after the fact
  • Retention policies vary by regulation: GDPR requires deletion on request, while SOC 2 audits typically expect at least a year of log retention
  • Separate audit logs from application logs — audit logs are immutable and have different access controls
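The points above can be sketched as a single structured-logging helper. This is a minimal illustration, not a standard schema — the field names, the `log=print` sink, and the model name used in examples are all assumptions to adapt to your own pipeline:

```python
import json
import time
import uuid

def log_llm_interaction(user_id, session_id, model, prompt, response,
                        latency_ms, input_tokens, output_tokens,
                        parent_id=None, log=print):
    """Emit one structured JSON audit record per LLM call.

    Each record carries its own trace_id and an optional parent_id,
    so calls can be correlated back to the originating user request.
    """
    record = {
        "trace_id": str(uuid.uuid4()),
        "parent_id": parent_id,
        "timestamp": time.time(),
        "user_id": user_id,
        "session_id": session_id,
        "model": model,
        "prompt": prompt,          # sensitive — see retention notes below
        "response": response,
        "latency_ms": latency_ms,
        "tokens": {"input": input_tokens, "output": output_tokens},
    }
    log(json.dumps(record))        # one JSON object per line: queryable
    return record["trace_id"]
```

Because each record is a single JSON line, audit logs can be shipped to a separate, access-controlled store and queried without parsing free-form text.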

What to Log for AI Systems

AI audit trails capture more than traditional application logs. Beyond the request/response, you need to log the model version (which can change without notice), the full prompt (including system prompt and context), tool calls and their results, and the reasoning chain. This data serves three purposes: regulatory compliance, debugging production issues, and demonstrating fairness.
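Tool calls are the easiest of these to lose track of. One way to keep them is to log each invocation as a child record linked to the LLM call that triggered it — a sketch, with hypothetical field names and an arbitrary truncation limit:

```python
import json
import uuid

def log_tool_call(parent_trace_id, tool_name, arguments, result, log=print):
    """Record a tool invocation as a child of the LLM call that caused it.

    parent_trace_id ties the tool call back to the original interaction,
    so the full chain (request -> LLM -> tools) can be reconstructed.
    """
    record = {
        "trace_id": str(uuid.uuid4()),
        "parent_id": parent_trace_id,
        "kind": "tool_call",
        "tool": tool_name,
        "arguments": arguments,               # filter sensitive values first
        "result_preview": str(result)[:500],  # truncate large tool outputs
    }
    log(json.dumps(record))
    return record["trace_id"]
```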

| Data Point | Why It Matters | Retention Consideration |
| --- | --- | --- |
| Input (user message) | Reproduce and investigate any interaction | May contain PII — subject to deletion requests |
| Full prompt (system + context + user) | Understand what the model actually saw | May contain PII — redact before long-term storage |
| Output (model response) | Verify what was shown to the user | Keep for dispute resolution |
| Model name + version | Identify if a model update caused a regression | Small, keep indefinitely |
| System fingerprint | Detect silent model updates from providers | Small, keep indefinitely |
| Latency (TTFT, total) | SLA compliance, performance monitoring | Aggregate after 90 days |
| Token count (input + output) | Cost attribution, budget tracking | Aggregate after 90 days |
| Tool calls + results | Trace which tools were invoked and what they returned | May contain sensitive data — filter |
| Trace ID + parent ID | Correlate all calls in a single user interaction | Essential for debugging |
| User ID + session ID | Attribute interactions to users and sessions | Subject to deletion requests |
| Confidence score | Assess reliability of the AI's decision | Keep for fairness audits |
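The retention column maps naturally onto a tiered policy: decide per field, per record age, whether to keep, aggregate, or mark for deletion. A sketch — the thresholds and field names are illustrative assumptions, not regulatory requirements:

```python
from datetime import timedelta

# Illustrative tiers; adjust thresholds to your compliance requirements.
AGGREGATE_AFTER = timedelta(days=90)   # latency and token counts
PII_FIELDS = {"input", "full_prompt", "user_id", "session_id"}

def retention_action(record_age: timedelta, field: str) -> str:
    """Decide what to do with one logged field at a given record age."""
    if field in {"model_version", "system_fingerprint", "trace_id"}:
        return "keep"                  # small, keep indefinitely
    if field in {"latency_ms", "token_count"} and record_age > AGGREGATE_AFTER:
        return "aggregate"             # roll up into daily summaries
    if field in PII_FIELDS:
        return "deletable"             # must honor GDPR erasure requests
    return "keep"
```

Running this kind of policy as a scheduled job keeps storage costs bounded while the small, non-sensitive fields stay available for long-term audits.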
Prompts Are Sensitive Data

Full prompts often contain user data, proprietary system instructions, and business logic. Treat prompt logs with the same security as database backups: encrypted at rest, access-controlled, and subject to data retention policies. Do not store full prompts in general application logs.
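Redaction before long-term storage can be sketched with simple pattern matching. These two patterns are hypothetical examples only — production systems should use a dedicated PII-detection service rather than relying on regexes alone:

```python
import re

# Illustrative patterns; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_prompt(prompt: str) -> str:
    """Replace likely PII with labeled placeholders before archiving."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt
```

Redacted copies can then be kept for long-term debugging, while the unredacted originals live in the short-retention, access-controlled audit store.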