Intermediate · 9 min

Logging, Metrics & Alerting

Building production dashboards for AI agents: structured logging, custom metrics (latency, cost, completion rate), and alerting on anomalies.

Quick Reference

  • Structured logging: emit JSON logs with correlation_id, user_id, agent_version, node_name, and duration for every graph step
  • Key agent metrics: p50/p95 latency per node, tokens per conversation, cost per conversation, task completion rate, tool error rate
  • Keep metric cardinality low: label by agent_version and node_name, never by user_id or conversation_id (unbounded label values explode the number of time series)
  • Alert on rate-of-change: a sudden spike in tool errors or token usage per request often signals a prompt or model regression
  • Build a single-pane dashboard showing agent health: request rate, error rate, latency distribution, and cost trend
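The cardinality rule above can be sketched in plain Python. This is a hypothetical in-process metrics store, not a real client library (in production you would use something like prometheus_client); the point is that label keys are restricted to the low-cardinality dimensions agent_version and node_name, while high-cardinality IDs stay in logs, not metrics.

```python
from collections import defaultdict

class Metrics:
    """Minimal illustrative metrics store keyed by (name, agent_version, node_name).

    Labels are deliberately limited to bounded-value dimensions; a user_id or
    conversation_id label would create one time series per user/conversation.
    """

    def __init__(self):
        self.counters = defaultdict(int)      # monotonically increasing counts
        self.durations = defaultdict(list)    # raw latency samples per label set

    def inc(self, name, agent_version, node_name, value=1):
        self.counters[(name, agent_version, node_name)] += value

    def observe(self, name, agent_version, node_name, seconds):
        self.durations[(name, agent_version, node_name)].append(seconds)

    def p95(self, name, agent_version, node_name):
        samples = sorted(self.durations[(name, agent_version, node_name)])
        if not samples:
            return None
        # Nearest-rank percentile over the recorded samples.
        idx = min(len(samples) - 1, int(0.95 * len(samples)))
        return samples[idx]

metrics = Metrics()
metrics.inc("tool_errors_total", "v1.4.0", "search_tool")
metrics.observe("node_duration_seconds", "v1.4.0", "plan", 0.42)
metrics.observe("node_duration_seconds", "v1.4.0", "plan", 1.10)
print(metrics.p95("node_duration_seconds", "v1.4.0", "plan"))
```

With two label dimensions of, say, 10 versions and 20 nodes, you get at most 200 series per metric; adding user_id would multiply that by your entire user base.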

Structured Logging for Agents

JSON logs, not print statements

Emit structured JSON logs with correlation_id, user_id, agent_version, node_name, and duration at every graph step. Structured logs are queryable in log aggregation tools (CloudWatch, Datadog Logs, Loki); unstructured print output is not.

Structured logging with structlog at every graph node boundary
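As a stdlib-only sketch of the same pattern (structlog's `bind()` gives you these bound fields more ergonomically), the wrapper below is hypothetical: it runs one graph node, times it, and emits a single JSON log line with the fields listed above, including status on both success and failure.

```python
import json
import time

def log_step(node_name, agent_version, user_id, correlation_id, fn, *args):
    """Run one graph node and emit a structured JSON log line.

    Illustrative wrapper: field names match the section's conventions;
    fn stands in for the node's actual callable.
    """
    start = time.monotonic()
    status = "ok"
    try:
        return fn(*args)
    except Exception:
        status = "error"
        raise  # re-raise after logging so callers still see the failure
    finally:
        record = {
            "event": "node_completed",
            "correlation_id": correlation_id,
            "user_id": user_id,
            "agent_version": agent_version,
            "node_name": node_name,
            "duration_ms": round((time.monotonic() - start) * 1000, 2),
            "status": status,
        }
        print(json.dumps(record))  # in production, route through your log handler

result = log_step("plan", "v1.4.0", "user-123", "corr-abc",
                  lambda text: text.upper(), "hello")
```

Because every line is one JSON object with a stable schema, a Loki or CloudWatch query can filter by node_name and aggregate duration_ms without any regex parsing.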