Integrations
LangSmith for observability, OpenTelemetry for tracing, MCP for extensible tool access, voice and multimodal agents, and real-time streaming patterns.
LangSmith gives you full observability into every LLM call, tool invocation, and state transition in your agent — automatically, with no code changes. This article covers how to use it, how much it costs, how to protect PII, and how to turn production traces into evaluation datasets.
Event-driven triggers for production traces — filter by error, latency, metadata, or feedback scores, and route matches to annotation queues, datasets, webhooks, or online evaluators. Everything is configured in the LangSmith UI.
Build versioned eval datasets from production traces, write evaluators that actually measure correctness, run experiments to prove prompt changes work, and gate deploys on regression — the full LangSmith evaluation workflow.
Cross-service trace propagation for multi-service agents — decide if you need it, choose between LangSmith-native and OTel approaches, link traces across HTTP boundaries, and control costs with sampling.
The LangSmith MCP Server exposes your entire observability workspace as callable tools via Model Context Protocol — query traces with FQL, manage datasets, push prompts, and check billing from Claude Desktop, Cursor, or any custom agent without leaving your editor.
How to instrument LangGraph agents with OpenTelemetry: the Collector architecture you actually need in production, updated GenAI semantic conventions, cost math for sampling decisions, and the failure modes that will bite you before you notice.
Learn when to build custom observability versus use a managed platform, then build it right: structured logging with correlation IDs, Prometheus metrics with cardinality discipline, and rate-of-change alerts that catch regressions before your users do.
The first RAG decision is whether to use RAG at all — with 200K+ token context windows, it's a choice, not a given. This article covers the RAG-vs-long-context decision framework with cost math, building an indexing and retrieval pipeline, evaluation with concrete thresholds, production failure modes, monitoring, and a production-shaped LangGraph reference implementation.
When MCP earns its overhead over inline tools, how to connect local and remote servers in LangChain, how to build your own server with FastMCP, and the four failure modes that trip up production deployments.
MCP interceptors are async middleware for tool calls — wrapping every MCP invocation with auth, retry, logging, and access control. This article covers the real API (MCPToolCallRequest, handler, override), when to use interceptors vs alternatives, core patterns with correct imports, multi-server routing via server_name, failure modes when interceptors break, and testing strategies.
Resources, Prompts, and Elicitation are the three MCP primitives engineers most often skip. Here's what they're actually for, when to reach for each, and what breaks in production when you ignore them.
For HTTP servers that implement authorization, the MCP spec requires OAuth 2.1 with PKCE; it is not a suggestion. Learn the discovery flow, per-user delegated auth via interceptors, what the spec forbids (token passthrough, audience-skipping), and when you need auth at all.
Build, version, deploy, and monitor production MCP servers with both the TypeScript SDK and FastMCP. Covers the build-vs-buy decision, schema versioning, deployment cost math, the gateway pattern for multi-server architectures, and three-tier health monitoring — because only 9% of remote MCP endpoints are fully healthy in the wild.
Voice agents cost 10–50x more per interaction than text agents and introduce failure modes that don't exist in chat. This article helps you decide whether voice is worth the complexity, choose the right architecture, understand the real costs, anticipate production failures, and evaluate whether your voice agent actually works.
When to use vision models vs. dedicated parsers, real cost math using Anthropic's actual token formula, how vision fails on financial docs, model tiering for 50–90% cost savings, image generation with gpt-image-1.5, and a 30-day deployment runbook.
Deep dive into production voice agent pipelines in 2026. Covers the pipeline-vs-Realtime-API architecture decision, updated STT and TTS provider choices (Deepgram Nova-3, ElevenLabs Flash v2.5, Cartesia Sonic 3), production-grade barge-in handling, cost modeling, and when to use LiveKit Agents Framework instead of rolling your own pipeline.
How to choose between Server-Sent Events and WebSocket for AI agent communication, with production-ready FastAPI code using the native EventSourceResponse API, authentication patterns, backpressure handling, and scaling strategies.
Multimodal pipelines add genuine value when layout, speaker identity, or visual content cannot be captured by text extraction alone — and add cost and hallucination risk when they can. This article covers the coordinator pattern, how to compute real costs, and how to defend against the specific failures that take these systems down in production.