
Production & Scale

Ship agents to production: Agent Server deployment, error handling, scaling, cost optimization, security, data engineering, and inference optimization.

Shipping to Production

The complete checklist for taking an agent from prototype to production: infrastructure, monitoring, rollout strategies, and incident response.

advanced · 12 min
Agent Server: Architecture & Deployment

Agent Server (formerly LangGraph Platform) is the production runtime for LangGraph agents — with Assistants, Threads, Runs, Cron jobs, and three deployment modes from single host to distributed.

advanced · 12 min
Error Handling & Retry

Production-grade error handling: retry strategies, fallback chains, dead letter queues, and graceful degradation patterns.

advanced · 10 min
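The retry and fallback-chain patterns named above can be sketched in plain Python. This is a minimal illustration, not a LangChain API; the function names are invented for the example:

```python
import random
import time

def with_retries(fn, *, attempts=3, base_delay=0.5):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Backoff doubles each attempt: base, 2*base, 4*base, ... plus jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10))

def with_fallbacks(callables, **retry_kwargs):
    """Try each callable in order (e.g. primary model, then a cheaper fallback)."""
    last_error = None
    for fn in callables:
        try:
            return with_retries(fn, **retry_kwargs)
        except Exception as exc:
            last_error = exc
    raise last_error
```

In practice each callable would wrap an LLM invocation; a dead letter queue would receive the request when the whole chain is exhausted.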
Evaluation & Testing

How to evaluate agent quality: LangSmith datasets, LLM-as-judge scoring, regression testing, and CI/CD integration for agents.

advanced · 12 min
Cost Optimization & Caching

Reducing LLM costs by 60-90%: prompt caching, model tiering, semantic caching, and token budget management.

advanced · 10 min
Scaling to 1M Users

Architecture for high-scale agent systems: horizontal scaling, queue-based execution, state partitioning, and managing LLM rate limits across a fleet.

advanced · 14 min
Double Texting & Concurrency

Handling concurrent user messages: reject, rollback, interrupt, or enqueue strategies, with built-in LangGraph Platform support.

advanced · 9 min
Guardrails & Content Safety

NeMo Guardrails integration, input/output filtering, PII detection, topic rails, jailbreak prevention, and custom policy enforcement.

advanced · 10 min
Testing Agents in CI

Unit testing tools, integration testing full graphs, snapshot testing outputs, mocking LLM responses, and building CI pipelines for agent systems.

advanced · 11 min
Cron Jobs & Webhooks

Schedule recurring agent runs with cron jobs and receive real-time notifications with webhooks — essential infrastructure for production agent systems.

intermediate · 9 min
Sandbox Execution: Isolated Agent Environments

Run agents or their tools in isolated sandboxes — preventing unauthorized file access, network calls, and credential theft. Providers: Modal, Daytona, Deno, and LangSmith sandboxes.

advanced · 10 min
Migration & Graph Versioning

Updating agents in production without breaking active sessions: graph versioning, state migration strategies, and backward-compatible deploys.

advanced · 10 min
Debugging Production Agents

Debug production AI agents systematically: trace analysis through the full pipeline, log correlation from user complaint to specific LLM call, handling common production failures (timeouts, context overflow, tool errors), and structured post-mortems for AI incidents.

advanced · 11 min
Prompt Management

Treat prompts as versioned configuration: separate them from code, store in a registry with performance metadata, A/B test changes, and roll back instantly when a prompt change degrades quality.

advanced · 10 min
LangSmith Deployment (LangGraph Platform)

LangGraph Platform is now LangSmith Deployment — a managed hosting platform for long-running, stateful agents. Cloud, self-hosted, standalone server, and hybrid options.

intermediate · 12 min
RemoteGraph

Running graphs as remote services. Client-server model, SDK integration, authentication, and streaming over HTTP.

intermediate · 9 min
Custom Checkpointer & Store

Building your own persistence backends. BaseCheckpointSaver interface, custom Store implementations, and migration strategies.

advanced · 10 min
AWS Bedrock

Running LangChain agents on AWS Bedrock: setup, model access, IAM configuration, and production deployment with provisioned throughput.

intermediate · 11 min
Deployment Architectures

Choosing between serverless, containerized, and long-running deployment models for AI agents. Load balancing stateful agents, WebSocket vs SSE for streaming, and self-hosted infrastructure patterns.

advanced · 12 min
Rate Limiting & Quota Management

Managing LLM API rate limits across a fleet of agents: request queuing, token bucket algorithms, graceful degradation, and model fallback chains.

advanced · 10 min
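The token bucket algorithm mentioned above fits in a few lines. A minimal single-process sketch (a fleet-wide limiter would back this with Redis or similar):

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, tokens=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

A rejected `try_acquire` is where request queuing or a fallback model would kick in rather than hammering the provider.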
Database & State Storage Patterns

Choosing and configuring storage backends for agent state: PostgreSQL for checkpoints, Redis for short-term state, and the tradeoffs between them.

advanced · 11 min
Caching Strategies at Scale

Reducing LLM API calls through caching: prompt caching, semantic caching, tool result caching, and cache invalidation patterns for agents.

advanced · 10 min
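The idea behind semantic caching is to reuse an answer when a new query is similar enough to a cached one. A toy sketch: `toy_embed` here is a bag-of-words stand-in for a real embedding model, and the 0.85 threshold is an arbitrary example value:

```python
import math
from collections import Counter

def toy_embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached answer when a query is similar enough to a stored one."""

    def __init__(self, threshold=0.85, embed=toy_embed):
        self.threshold = threshold
        self.embed = embed
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query):
        qv = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best and cosine(qv, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: call the LLM, then put() the result

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))
```

Production systems swap the linear scan for a vector index and pair this with invalidation (TTLs, versioned keys) so stale answers age out.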
Authentication & Multi-Tenancy

Securing multi-tenant agent systems: user authentication, per-user tool permissions, session isolation, API key management, and tenant-scoped data access.

advanced · 11 min
Prompt Injection Defense

Defending agents against prompt injection attacks: input sanitization, instruction hierarchy, output validation, and monitoring for exploitation attempts.

advanced · 10 min
Input & Output Validation

Validating what goes into and comes out of your agent: schema validation, PII detection, content filtering, and ensuring agent outputs meet business rules.

advanced · 9 min
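Output validation combines a schema check with business rules. A small sketch; the field names and the $100 refund cap are invented for illustration, not a real policy:

```python
def validate_output(output):
    """Validate an agent's structured output: schema first, then business rules."""
    errors = []
    # Schema check: required fields and their expected types.
    for field, typ in (("intent", str), ("refund_amount", (int, float))):
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], typ):
            errors.append(f"wrong type for {field}")
    # Business rule: agents may not approve refunds above a cap.
    amount = output.get("refund_amount")
    if isinstance(amount, (int, float)) and amount > 100:
        errors.append("refund_amount exceeds $100 cap; escalate to a human")
    return errors
```

A failing validation would typically trigger a retry with the errors fed back to the model, or a handoff to a human.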
Agent Server Authentication & Authorization

Full auth system for Agent Server: @auth.authenticate for identity verification, @auth.on for resource-specific access control, and agent authentication for delegated MCP access.

advanced · 10 min
Knowledge Base Lifecycle

The full lifecycle of a production knowledge base: ingestion from diverse sources, transformation and chunking, indexing for retrieval, serving under load, incremental refresh strategies, and version management for reproducible agent behavior.

advanced · 12 min
Data Quality for AI Systems

Garbage in, garbage out is amplified with LLMs. Learn to build automated data quality pipelines that detect near-duplicates, track freshness, measure coverage gaps, and score completeness — so your agent never confidently serves stale or incorrect information.

advanced · 11 min
Feedback Pipelines

Build closed-loop feedback systems that capture user signals (thumbs up/down, corrections, regenerations), process them into actionable data, and drive measurable improvements to prompts, retrieval, and model selection.

advanced · 11 min
Model Management

Build a production model registry that tracks which models, prompts, and configs are in use, supports A/B deployment and shadow testing, and enables instant rollback when a new model underperforms.

advanced · 11 min
Self-Hosting LLMs

When and how to self-host LLMs for production: comparing vLLM, TGI, and Ollama, understanding hardware requirements, calculating the break-even point against API costs, and deploying a high-throughput serving stack.

advanced · 12 min
Quantization

Reduce LLM memory requirements by 2-4x with quantization: understand the tradeoffs between GPTQ, GGUF, and AWQ, measure quality impact at different precision levels, and choose the right approach for your hardware and latency requirements.

advanced · 11 min
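The 2-4x memory reduction comes straight from bytes per parameter. A back-of-envelope calculator; the 20% overhead factor is a rough allowance for activations and KV cache, not a measurement:

```python
def model_memory_gb(params_billion, bits, overhead=1.2):
    """Rough weight-memory estimate: params x bytes-per-param x overhead factor."""
    bytes_per_param = bits / 8
    return params_billion * 1e9 * bytes_per_param * overhead / 1e9

# A 7B model: ~16.8 GB at fp16 (16-bit) vs ~4.2 GB at int4 (4-bit),
# under the assumed 20% overhead -- a 4x reduction on the weights.
```

This is why a 7B model that needs an A100 at fp16 can fit a consumer GPU at int4; the sessions on GPTQ, GGUF, and AWQ cover what that costs in quality.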
Model Routing

Route queries to the right model based on complexity: send simple questions to cheap, fast models and complex reasoning tasks to expensive, capable models. Achieve 40-60% cost reduction with intelligent routing while maintaining quality on hard queries.

advanced · 11 min
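Routing can start with a crude heuristic before graduating to a learned classifier. A sketch, assuming invented model names and an arbitrary 0.4 threshold; real routers often use an embedding classifier or a small LLM as the judge:

```python
def complexity_score(query):
    """Crude heuristic: long queries and reasoning keywords look 'hard'."""
    reasoning_markers = ("why", "explain", "compare", "step by step", "prove", "design")
    score = min(len(query.split()) / 50, 1.0)  # length contributes up to 1.0
    if any(marker in query.lower() for marker in reasoning_markers):
        score += 0.5
    return min(score, 1.0)

def route(query, cheap_model="small-fast-model",
          capable_model="large-capable-model", threshold=0.4):
    # Model names are placeholders; swap in your provider's identifiers.
    return capable_model if complexity_score(query) >= threshold else cheap_model
```

The threshold is the cost/quality dial: lower it and more traffic pays for the capable model; raise it and hard queries risk a weak answer.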
Batching & Throughput

Maximize LLM serving throughput with continuous batching, dynamic request grouping, and request coalescing. Understand the prefill vs decode bottleneck and tune the throughput-latency tradeoff for your workload.

advanced · 11 min
GPU Cost Modeling

Understand the GPU landscape for LLM inference: compare A100, H100, L40S, and A10G on specs and pricing, calculate actual $/token for self-hosted models, model the break-even point against API providers, and optimize with spot instances.

advanced · 12 min
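The core $/token arithmetic is simple once you know sustained throughput. A sketch; the $4/hr and 2,000 tok/s figures in the comment are hypothetical illustration, not quoted prices or benchmarks:

```python
def cost_per_million_tokens(gpu_dollars_per_hour, tokens_per_second):
    """$/1M tokens for a self-hosted GPU at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1e6

# Example with hypothetical numbers: a GPU at $4/hr sustaining 2,000 tok/s
# works out to roughly $0.56 per 1M tokens; compare that against your
# API provider's price at your actual utilization, not peak throughput.
```

The trap is the throughput number: idle hours still bill, so the break-even against an API depends on sustained utilization, which is where spot instances and batching come in.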