Intermediate · 13 min

Logging, Metrics & Alerting

Learn when to build custom observability versus use a managed platform, then build it right: structured logging with correlation IDs, Prometheus metrics with cardinality discipline, and rate-of-change alerts that catch regressions before your users do.

Quick Reference

  • Build custom observability (Prometheus + structlog) when you have existing Grafana infrastructure or need custom business metrics; use LangSmith or Langfuse for LLM-native observability with near-zero config
  • Structured logging: emit JSON logs with correlation_id, node_name, agent_version, and duration_ms at every graph boundary — never log full prompt text in production
  • The six metrics that matter: node p50/p95 latency, tokens per conversation, cost per conversation, task completion rate, tool error rate, retry count
  • Cardinality rule: only label Prometheus metrics by values with fewer than 100 possible options. node_name and agent_version are safe; user_id and conversation_id create one time series per distinct value and can exhaust Prometheus memory at scale
  • Alert on rate-of-change: a spike from 1% to 5% error rate in 5 minutes is an incident; a steady 3% error rate is normal non-determinism — alerting on the level trains your team to ignore pages
  • Cost-per-conversation is the metric that catches prompt regressions invisible to error rate and latency: an extra reasoning loop doubles cost before it degrades user-visible quality
  • Write a runbook for every alert before it fires in production — an alert without next steps is noise that erodes on-call trust
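
The rate-of-change rule in the list above can be sketched in plain Python. This is an illustration of the idea only, not a Prometheus alert rule: the `Window` type, the `should_page` function, and the thresholds (`min_ratio`, `floor_rate`, `min_requests`) are hypothetical names and values chosen for the example, and would need tuning against real traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Error and request counts observed over one time window."""
    errors: int
    total: int

    @property
    def error_rate(self) -> float:
        # Treat an empty window as 0% errors rather than dividing by zero.
        return self.errors / self.total if self.total else 0.0


def should_page(
    baseline: Window,
    recent: Window,
    *,
    min_ratio: float = 3.0,     # assumed: page when the rate at least triples
    floor_rate: float = 0.01,   # assumed: avoids paging on 0% -> 0.1% blips
    min_requests: int = 50,     # assumed: ignore windows with too little traffic
) -> bool:
    """Page on a jump relative to the baseline, not on the absolute level."""
    if recent.total < min_requests:
        return False
    reference = max(baseline.error_rate, floor_rate)
    return recent.error_rate >= min_ratio * reference


# A jump from 1% to 5% pages; a steady 3% does not.
spike = should_page(Window(errors=10, total=1000), Window(errors=5, total=100))
steady = should_page(Window(errors=30, total=1000), Window(errors=3, total=100))
```

Comparing against a baseline window rather than a fixed threshold is what keeps a steady 3% error rate from paging anyone, while a sudden jump from 1% to 5% still does.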

Should You Build Custom Observability?

Start with a managed platform

LangSmith gives you full traces with two environment variables and zero code changes. Langfuse is open-source and self-hostable. Braintrust adds eval-first observability on top. These platforms are LLM-native — they understand token counts, prompt inspection, and run comparison out of the box. Build custom Prometheus instrumentation only when you have a specific reason.

[Diagram: Agent Graph (LangGraph nodes) | structlog (JSON events), Prometheus (histograms + counters), OTel spans (gen_ai.* attributes) | Log Aggregator (CloudWatch / Loki), Grafana (dashboards + alerts), Trace Backend (Jaeger / Tempo) | Production Dashboard (Grafana · CloudWatch · Datadog)]

Logs answer "what happened," metrics answer "how much," traces answer "why"
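
To make the "what happened" side concrete, here is a minimal sketch of a structured log event carrying the fields named in the Quick Reference (correlation_id, node_name, agent_version, duration_ms). It deliberately uses only the Python standard library as a stand-in for structlog, so it shows the shape of the event rather than structlog's API. `JsonFormatter`, `build_logger`, `run_node`, and `AGENT_VERSION` are hypothetical names introduced for this example.

```python
import json
import logging
import time
import uuid
from contextvars import ContextVar
from typing import Any, Callable

# Holds the ID that ties together every event from one conversation.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="unset")

AGENT_VERSION = "0.0.0-example"  # assumption: normally injected at deploy time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "correlation_id": correlation_id.get(),
            "agent_version": AGENT_VERSION,
        }
        # Per-call fields arrive via logger.info(..., extra={"fields": {...}}).
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def build_logger(stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines (to stderr when stream is None)."""
    logger = logging.getLogger("agent")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def run_node(logger: logging.Logger, node_name: str,
             fn: Callable[[Any], Any], state: Any) -> Any:
    """Run one graph node and log its duration, never its prompt text."""
    start = time.perf_counter()
    try:
        return fn(state)
    finally:
        duration_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info("node_finished",
                    extra={"fields": {"node_name": node_name,
                                      "duration_ms": duration_ms}})


if __name__ == "__main__":
    correlation_id.set(str(uuid.uuid4()))  # one ID per conversation
    run_node(build_logger(), "plan", lambda state: state, {"step": 1})
```

Storing the correlation ID in a `ContextVar` means node code does not have to pass it around explicitly, and the log line records timing and identifiers only, in keeping with the rule against logging full prompt text in production.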

Three questions determine your stack:

  • Do you already have Prometheus and Grafana in production for your other services? If yes, building custom metrics means agents appear in the same dashboards as your APIs and databases, with no new vendor and no extra context switching.
  • Do you need custom business metrics tied to agent behavior, such as revenue per conversation, lead quality score, or document processing cost? Managed platforms don't expose hooks for arbitrary business logic.
  • Do you need data sovereignty, with all telemetry on your own infrastructure and no data leaving your VPC?

If none of these apply, start with LangSmith and come back to this article when you outgrow it.
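
The decision above reduces to a simple rule: any "yes" is a reason to build custom instrumentation, and three "no" answers point to a managed platform. A minimal sketch, where the function name `recommend_stack` and its parameter names are hypothetical and exist only to restate the three questions:

```python
def recommend_stack(
    *,
    has_prometheus_grafana: bool,
    needs_custom_business_metrics: bool,
    needs_data_sovereignty: bool,
) -> str:
    """Return "custom" if any of the three questions is answered yes."""
    reasons_to_build = (
        has_prometheus_grafana,
        needs_custom_business_metrics,
        needs_data_sovereignty,
    )
    return "custom" if any(reasons_to_build) else "managed"


# A team with no existing Grafana stack and no special requirements.
choice = recommend_stack(
    has_prometheus_grafana=False,
    needs_custom_business_metrics=False,
    needs_data_sovereignty=False,
)
```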