LangSmith: Production Observability for Agents

LangSmith gives you full observability into every LLM call, tool invocation, and state transition in your agent. For LangChain and LangGraph apps, tracing is automatic once a few environment variables are set, with no code changes. This article covers how to use it, how much it costs, how to protect PII, and how to turn production traces into evaluation datasets.

Quick Reference

  • Use LANGSMITH_TRACING=true and LANGSMITH_API_KEY (not the older LANGCHAIN_* names, which are deprecated)
  • Every LLM call, tool invocation, and LangGraph node is captured as a hierarchical span with tokens, latency, and exact I/O
  • Tag runs with metadata (user_id, agent_version, feature_flag) to filter and aggregate traces in dashboards
  • LangSmith free tier: 5K traces/month; Plus: $39/seat + $2.50–$5.00 per 1K traces — budget before you ship to prod
  • Scrub PII before it leaves your process: pass hide_inputs/hide_outputs to the LangSmith Client, or use process_inputs/process_outputs on the @traceable decorator for per-function control
  • Production traces are your best source of eval data — annotate, export to datasets, and gate deploys on eval scores
  • LangSmith now supports OpenTelemetry export (pip install langsmith[otel]) — vendor-neutral escape hatch if you need it
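As a sketch of the PII bullet, here is a redaction function of the kind you could hand to the LangSmith client. The regex patterns and the `scrub_pii` name are illustrative assumptions, and they are deliberately crude (the phone pattern will also catch some dates and IDs); production redaction usually needs a vetted pattern set or a dedicated PII library.

```python
import re
from typing import Any

# Illustrative patterns only (assumption): crude on purpose, not a vetted PII set.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scrub_value(value: Any) -> Any:
    """Recursively redact PII-looking substrings inside strings, dicts, lists, and tuples."""
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[REDACTED_{label}]", value)
        return value
    if isinstance(value, dict):
        return {key: scrub_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub_value(item) for item in value]
    if isinstance(value, tuple):
        return tuple(scrub_value(item) for item in value)
    return value  # numbers, booleans, None, etc. pass through unchanged


def scrub_pii(payload: dict) -> dict:
    """Return a redacted copy of a run's inputs or outputs dict; the original is not mutated."""
    return scrub_value(payload)
```

To wire it in, you would construct the client as `Client(hide_inputs=scrub_pii, hide_outputs=scrub_pii)` and pass that client to `@traceable(client=...)`, so the function runs on each payload before upload. Check the masking documentation for your SDK version before relying on it.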
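The last bullet's deploy gate can be as simple as comparing mean evaluator scores. This sketch assumes you already have per-example scores in [0, 1] from running a candidate and the current production version against the same dataset built from annotated traces; the function name and both thresholds are hypothetical defaults to tune, not LangSmith settings.

```python
from statistics import mean


def should_deploy(
    candidate_scores: list[float],
    baseline_scores: list[float],
    min_score: float = 0.8,        # hypothetical absolute quality floor
    max_regression: float = 0.02,  # hypothetical tolerated drop vs. production
) -> bool:
    """True if the candidate clears the floor and does not regress too far below baseline."""
    if not candidate_scores:
        return False  # no eval evidence, no deploy
    candidate = mean(candidate_scores)
    baseline = mean(baseline_scores) if baseline_scores else 0.0
    return candidate >= min_score and candidate >= baseline - max_regression
```

In CI you would fail the pipeline when `should_deploy` returns False. Checking both conditions catches two different failures: a version that is bad outright, and one that clears the floor but is worse than what is already shipping.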

Should You Use LangSmith?

Pick a tracing tool:
  • Using a LangChain or LangGraph stack? Yes: LangSmith (tightest LangGraph integration)
  • No, and self-hosting or data residency is required: Arize Phoenix (OSS) or self-hosted Langfuse
  • No, and neither is required: Langfuse Cloud (framework-agnostic)
  • All three support OpenTelemetry, so you can switch tools without re-instrumenting

LangSmith is the default for LangChain/LangGraph; for other stacks, start with Langfuse.

LangSmith is the right default if you are using LangChain or LangGraph. Its auto-instrumentation is the tightest available for those frameworks — every node, edge, and state transition in a LangGraph is automatically captured as a span. The UI is built around the mental model of a trace tree, which maps directly to how LangGraph execution actually works.
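To make the trace-tree model concrete, here is a toy version of the structure: each span records its own latency and tokens, and children nest under the node that invoked them. This is not LangSmith's run schema; the `Span` class and the node names (`call_model`, `tools`, `search`) are hypothetical, though the run_type labels mirror kinds LangSmith uses ("chain", "llm", "tool").

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Toy stand-in for a traced run: one node in the trace tree."""
    name: str
    run_type: str  # e.g. "chain", "llm", "tool"
    latency_ms: float
    tokens: int = 0
    children: list["Span"] = field(default_factory=list)


def total_tokens(span: Span) -> int:
    """Tokens consumed by this span plus every descendant."""
    return span.tokens + sum(total_tokens(child) for child in span.children)


def render(span: Span, depth: int = 0) -> list[str]:
    """Indented outline of the tree, similar in spirit to a trace view."""
    lines = [f"{'  ' * depth}{span.name} [{span.run_type}] {span.latency_ms:.0f} ms"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical agent run: model call, tool node with one tool, second model call.
    root = Span("agent", "chain", 1800, children=[
        Span("call_model", "llm", 900, tokens=420),
        Span("tools", "chain", 300, children=[Span("search", "tool", 280)]),
        Span("call_model", "llm", 600, tokens=310),
    ])
    print("\n".join(render(root)))
    print("total tokens:", total_tokens(root))
```

Rolling tokens up the tree is what lets a root span report the aggregate for a whole agent run while each leaf keeps its exact per-call figures.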

When to use something else

If your agent does not use LangChain or LangGraph (say you are calling the Anthropic API directly or using Pydantic AI), Langfuse is a stronger default. It is open source (MIT), framework-agnostic, self-hostable, and at 1M traces/month costs roughly a third of LangSmith Cloud. If data residency is a hard requirement, self-host Langfuse or run Arize Phoenix, an open-source alternative.
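To sanity-check a comparison like this against your own volume, the Quick Reference pricing reduces to simple arithmetic. This sketch assumes usage is billed linearly per 1K traces above an included quota; `included_traces` defaults to 0 as a placeholder and the function name is hypothetical, so check the current pricing page for your plan's actual quota and rates.

```python
def estimate_monthly_cost(
    traces_per_month: int,
    seats: int,
    seat_price: float = 39.0,           # Plus seat price from the Quick Reference
    price_per_1k_traces: float = 2.50,  # low end of the quoted $2.50-$5.00 range
    included_traces: int = 0,           # placeholder (assumption): plan quota not modeled
) -> float:
    """Rough LangSmith Plus bill: seat cost plus linear usage above any included quota."""
    billable = max(0, traces_per_month - included_traces)
    return seats * seat_price + billable / 1000 * price_per_1k_traces
```

For example, 5 seats at 1M traces/month works out to $195 in seats plus $2,500 to $5,000 in trace usage, so at this scale the per-trace rate, not the seat count, drives the bill.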

If you need vendor-neutral tracing that works across tools, both LangSmith and Langfuse now support OpenTelemetry. Enable LangSmith's OTEL exporter with pip install langsmith[otel] and configure your OTEL endpoint. You can migrate between backends without re-instrumenting your code.