Distributed Tracing
Cross-service trace correlation — propagate trace context from your API gateway through microservices to LLM calls, and visualize the full request lifecycle in one trace.
Quick Reference
- Distributed tracing connects agent spans with backend service spans into a single end-to-end trace
- Use RunTree to manually propagate LangSmith trace context across service boundaries
- OpenTelemetry exporters can send LangSmith spans alongside your existing infrastructure traces
- Multi-agent systems need explicit parent-child linking to correlate traces across agent boundaries
- Sampling strategies balance observability cost against trace completeness in high-traffic systems
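The propagation and parent-child linking mentioned above can be sketched without any tracing library. The example below is a minimal illustration, assuming the W3C Trace Context `traceparent` header format (`00-<trace id>-<parent span id>-<flags>`); it is not LangSmith's own header scheme or the RunTree API. The helper names `start_trace`, `inject`, and `extract_child` are hypothetical, and the sketch assumes header keys are already lowercase.

```python
import re
import secrets

# W3C traceparent, version 00: 32 hex chars of trace id, 16 of parent span id, 2 of flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span_id() -> str:
    """Return a random 16-hex-character span id."""
    return secrets.token_hex(8)


def start_trace() -> dict:
    """Create a root span context at the edge (hypothetical: the API gateway)."""
    return {"trace_id": secrets.token_hex(16), "span_id": new_span_id(), "parent_id": None}


def inject(span: dict, headers: dict) -> dict:
    """Return a copy of the outgoing headers that carries the span's context."""
    out = dict(headers)
    out["traceparent"] = f"00-{span['trace_id']}-{span['span_id']}-01"  # 01 = sampled
    return out


def extract_child(headers: dict) -> dict:
    """Start a child span from incoming headers, or a new root if none are valid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return start_trace()
    trace_id, parent_id, _flags = match.groups()
    return {"trace_id": trace_id, "span_id": new_span_id(), "parent_id": parent_id}


# Gateway starts the trace and calls a downstream agent service.
gateway_span = start_trace()
outgoing_headers = inject(gateway_span, {"content-type": "application/json"})

# The agent service continues the same trace as a child of the gateway span.
agent_span = extract_child(outgoing_headers)
```

Because the downstream span reuses the trace id and records the caller's span id as its parent, a trace backend can stitch both spans into one tree. A production setup would normally rely on the tracing SDK's own propagation helpers rather than hand-rolled parsing.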
Why Distributed Tracing for Agents
Production agents rarely live in isolation. A typical request flows through an API gateway, hits an orchestration service, fans out to multiple tool-calling agents, queries vector databases, and returns through the same chain. Without distributed tracing, you see each service's logs in isolation — you cannot answer 'why was this request slow?' because the bottleneck might be three services deep.
LangSmith traces your LLM calls, and your APM traces your HTTP services, but neither shows the full picture on its own. Distributed tracing bridges the gap by linking LangSmith spans and backend spans into one correlated trace.
- API gateway latency is invisible to LangSmith; distributed tracing makes it visible
- Tool calls that hit external APIs leave gaps in LangSmith traces; backend spans fill those gaps
- Multi-agent orchestration without trace correlation makes cross-agent failures very hard to debug
- Attributing LLM spend to specific user requests requires end-to-end traces
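Once spans from every service share a trace ID, questions such as "why was this request slow?" and "what did this request cost?" reduce to a group-by. The sketch below is illustrative only: the `Span` record, its field names, and the sample values are assumptions, not a LangSmith or APM schema, and `self_ms` is assumed to be each span's own time excluding its children.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical flattened span record collected from any service."""
    trace_id: str
    service: str
    name: str
    self_ms: float            # time spent in this span, excluding child spans
    llm_cost_usd: float = 0.0  # zero for spans that make no LLM call


def summarize(spans):
    """Group spans by trace id and report the slowest span and total LLM cost."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)

    summary = {}
    for trace_id, group in by_trace.items():
        slowest = max(group, key=lambda s: s.self_ms)
        summary[trace_id] = {
            "slowest": f"{slowest.service}/{slowest.name}",
            "llm_cost_usd": sum(s.llm_cost_usd for s in group),
        }
    return summary


# Made-up spans from two requests, emitted by four different services.
spans = [
    Span("t1", "gateway", "POST /chat", 12.0),
    Span("t1", "orchestrator", "plan", 40.0, 0.002),
    Span("t1", "vector-db", "search", 950.0),
    Span("t1", "agent", "llm_call", 300.0, 0.011),
    Span("t2", "gateway", "POST /chat", 10.0),
    Span("t2", "agent", "llm_call", 280.0, 0.004),
]
report = summarize(spans)
```

In this made-up data the bottleneck for request `t1` is the vector database search, a span that an LLM-only trace would not contain, while the LLM cost of each request is the sum over its own spans only.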