RemoteGraph
When and how to run LangGraph agents as remote services. Decision framework for when not to use RemoteGraph, direct subgraph embedding, production-shaped supervisor with error handling, thread-based state persistence, and the failure modes that reliably bite in production.
Quick Reference
- →Initialize with positional name: RemoteGraph('agent', url=..., api_key=...) — the graph_id keyword does not exist
- →Embed directly as a subgraph node: builder.add_node('child', remote_graph) — no manual wrapper function needed
- →NEVER call a RemoteGraph that targets the same deployment — deadlocks and resource exhaustion
- →Thread persistence: pass {'configurable': {'thread_id': '...'}} to maintain conversation state across remote calls
- →Enable distributed tracing: RemoteGraph('agent', url=..., distributed_tracing=True) for end-to-end LangSmith traces
- →Stream modes over HTTP: 'messages' (token-by-token), 'updates' (after each node), 'values' (full state snapshot)
- →LangGraph Platform is now called LangSmith Deployment — same infrastructure, naming changed Oct 2025
When to Use RemoteGraph (and When Not To)
RemoteGraph adds a network boundary to your agent architecture. That boundary has real costs: per-call HTTP latency, serialization overhead, network failure modes, and debugging complexity. The question is never 'can I use RemoteGraph?' — you always can. The question is 'does the benefit justify the operational cost?'
| Factor | Use RemoteGraph | Keep Local |
|---|---|---|
| Team ownership | Different teams own different agents, need independent deploys | Single team owns the entire graph |
| Scaling needs | Sub-agents need different compute (GPU vs CPU, memory-optimized) | All nodes run on the same instance |
| Latency budget | User-facing latency > 500ms is acceptable; streaming mitigates perception | Need sub-100ms p50; HTTP overhead is unacceptable |
| Fault isolation | A failing worker should not crash the supervisor | Single-process failure modes are acceptable |
| Deployment cadence | Sub-agents deploy on different release schedules | Everything deploys together |
Do NOT use RemoteGraph to call itself or another graph on the same deployment. This causes deadlocks and resource exhaustion. Each incoming request consumes a worker slot; a call back to the same deployment waits for a worker that is already occupied. In a pool of 4 workers, 4 concurrent requests can deadlock the entire deployment. This is the single most common production incident with RemoteGraph.
Stop at the first YES — only reach "Use RemoteGraph" if all three checks pass