Authentication & Multi-Tenancy
A production decision guide for multi-tenant agent systems: when to build isolation, which strategy fits your scale, how the request lifecycle works, and where it silently fails.
Quick Reference
- Skip multi-tenancy for internal tools, stateless agents, or prototypes with fewer than 5 users; add it when 2+ distinct organizations share one deployment
- Use LangGraph Platform's @auth.authenticate + @auth.on for cloud/managed deployments, and custom JWT middleware for self-hosted FastAPI
- Namespace every thread ID, checkpoint, and memory key as {tenant_id}::{resource_id}: application-layer isolation comes before database-layer row-level security (RLS)
- Use set_config('app.current_tenant', %s, true) with parameterized values; never f-string tenant IDs into SQL
- Reset app.current_tenant on every connection pool checkout: a stale context left over from the previous request is a cross-tenant data breach
- Bind only the tools a user is allowed to call with model.bind_tools(filtered_list); never rely on the LLM declining a tool it can see
- Log every tool invocation with user_id, tenant_id, and thread_id: required for SOC 2, HIPAA, and GDPR incident response
- Test cross-tenant isolation in CI: create two tenants and verify that neither can read the other's state
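Three of the rules above (namespaced keys, parameterized tenant context, and filtered tool binding) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names scoped_key, set_tenant_context, and filter_tools are hypothetical, the cursor is assumed to be a DB-API cursor that uses %s placeholders (as psycopg does), and the tool registry shape is invented for the example.

```python
from typing import Any

SEP = "::"  # separator from the {tenant_id}::{resource_id} convention


def scoped_key(tenant_id: str, resource_id: str) -> str:
    """Build a '{tenant_id}::{resource_id}' key.

    Empty parts and parts containing the separator are rejected, so two
    different (tenant, resource) pairs can never produce the same key.
    """
    for part in (tenant_id, resource_id):
        if not part or SEP in part:
            raise ValueError(f"invalid key part: {part!r}")
    return f"{tenant_id}{SEP}{resource_id}"


def set_tenant_context(cursor: Any, tenant_id: str) -> None:
    """Set app.current_tenant for the current transaction.

    Assumption: `cursor` is a DB-API cursor with %s placeholders.
    The tenant ID travels as a bound parameter, never via string formatting.
    """
    cursor.execute(
        "SELECT set_config('app.current_tenant', %s, true)",
        (tenant_id,),
    )


def filter_tools(registry: dict[str, set[str]], role: str) -> list[str]:
    """Return the names of tools whose allowed-role set contains `role`.

    Assumption: `registry` maps tool name -> roles allowed to call it.
    """
    return sorted(name for name, roles in registry.items() if role in roles)


if __name__ == "__main__":
    print(scoped_key("acme", "thread-42"))  # acme::thread-42
    registry = {"search_docs": {"viewer", "admin"}, "delete_record": {"admin"}}
    print(filter_tools(registry, "viewer"))  # ['search_docs']
```

In a real agent, the tool objects matching the filtered names would be what gets passed to model.bind_tools(filtered_list), so a tool the user may not call is never visible to the model.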
When NOT to Build Multi-Tenancy
Multi-tenancy means shared infrastructure with isolated data: one deployment serving multiple organizations, with guarantees that Tenant A can never read Tenant B's state. It solves a real problem, but it adds meaningful complexity — namespaced state, connection pool hygiene, RLS policies, per-role tool binding, and an audit trail. That complexity pays off when you need it. It costs you when you don't.
Build it when 2+ distinct organizations (or 50+ users who must never see each other's data) share one deployment. Before that threshold, a single-tenant deployment with one auth check is simpler, faster to ship, and easier to debug when something breaks.
- Internal tools with a single company as the only customer: one auth check + one namespace is enough
- Stateless agents with no checkpoints, no memory, and no stored tool results: there is no state to isolate
- Prototypes and closed betas with fewer than 5 users: requirements will change and you will rebuild auth anyway
- Batch-processing agents that run offline with no interactive sessions: no concurrent state means no isolation risk
- Single-LLM-call microservices that return results and store nothing: row-level security adds zero value