Advanced13 min

RemoteGraph

When and how to run LangGraph agents as remote services. Decision framework for when not to use RemoteGraph, direct subgraph embedding, production-shaped supervisor with error handling, thread-based state persistence, and the failure modes that reliably bite in production.

Quick Reference

  • Initialize with positional name: RemoteGraph('agent', url=..., api_key=...) — the graph_id keyword does not exist
  • Embed directly as a subgraph node: builder.add_node('child', remote_graph) — no manual wrapper function needed
  • NEVER call a RemoteGraph that targets the same deployment — deadlocks and resource exhaustion
  • Thread persistence: pass {'configurable': {'thread_id': '...'}} to maintain conversation state across remote calls
  • Enable distributed tracing: RemoteGraph('agent', url=..., distributed_tracing=True) for end-to-end LangSmith traces
  • Stream modes over HTTP: 'messages' (token-by-token), 'updates' (after each node), 'values' (full state snapshot)
  • LangGraph Platform is now called LangSmith Deployment — same infrastructure, naming changed Oct 2025

When to Use RemoteGraph (and When Not To)

RemoteGraph adds a network boundary to your agent architecture. That boundary has real costs: per-call HTTP latency, serialization overhead, network failure modes, and debugging complexity. The question is never 'can I use RemoteGraph?' — you always can. The question is 'does the benefit justify the operational cost?'

FactorUse RemoteGraphKeep Local
Team ownershipDifferent teams own different agents, need independent deploysSingle team owns the entire graph
Scaling needsSub-agents need different compute (GPU vs CPU, memory-optimized)All nodes run on the same instance
Latency budgetUser-facing latency > 500ms is acceptable; streaming mitigates perceptionNeed sub-100ms p50; HTTP overhead is unacceptable
Fault isolationA failing worker should not crash the supervisorSingle-process failure modes are acceptable
Deployment cadenceSub-agents deploy on different release schedulesEverything deploys together
The deadlock you will not see coming

Do NOT use RemoteGraph to call itself or another graph on the same deployment. This causes deadlocks and resource exhaustion. Each incoming request consumes a worker slot; a call back to the same deployment waits for a worker that is already occupied. In a pool of 4 workers, 4 concurrent requests can deadlock the entire deployment. This is the single most common production incident with RemoteGraph.

Should you use RemoteGraph?four checks — stop at the first YESNOSame deployment as caller?YESDO NOT USEdeadlock + resource exhaustionNONeed sub-100ms p50 latency?YESKeep LocalHTTP adds per-call overheadNOOne team owns all agents?YESUse Local Subgraphsimpler, no network boundaryNOUse RemoteGraph ✓separate infra, independent scaling, team isolation

Stop at the first YES — only reach "Use RemoteGraph" if all three checks pass