Sandbox Execution: Isolated Agent Environments
Run agents or their tools in isolated sandboxes to prevent unauthorized file access, network calls, and credential theft. Covers a decision framework for choosing between the agent-in-sandbox and tool-in-sandbox patterns, a provider comparison as of April 2026 (E2B, Modal, Daytona, Deno Sandbox, LangSmith), isolation technology differences (Firecracker vs gVisor vs OCI containers), sandbox lifecycle scoping, failure modes, and production implementation with Deep Agents imports.
Quick Reference
- Two patterns: agent-in-sandbox (the entire agent runs isolated; use for coding agents) vs tool-in-sandbox (individual tool calls run isolated; use for mixed workloads)
- E2B: Firecracker microVMs, ~80-200ms cold start, Python and JS/TS SDKs, open source, free tier available
- Modal: gVisor containers, ~1s cold start / sub-second warm start, GPU support (A100, H100), `from langchain_modal import ModalSandbox`
- Daytona: OCI containers, 27-90ms cold start, native Deep Agents support via `from deepagents.backends.daytona import DaytonaSandbox`
- Deno Sandbox: Firecracker microVMs (NOT V8 isolates), <200ms startup, launched Feb 2026
- Thread-scoped sandboxes (one per conversation) are the recommended default: files persist across turns and stay isolated per conversation
- Isolation strength for untrusted code: Firecracker (separate guest kernel) > gVisor (userspace kernel) > OCI containers (shared host kernel)
- OpenAI Agents SDK added native sandbox support in April 2026, with Daytona, E2B, Modal, and Runloop backends
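A thread-scoped lifecycle amounts to a get-or-create map keyed by conversation ID, plus teardown when the conversation ends. The sketch below is a minimal, provider-agnostic illustration. `Sandbox`, `create_sandbox`, and `ThreadSandboxRegistry` are hypothetical names standing in for whatever your provider SDK exposes. The in-memory file dict only simulates an isolated filesystem.

```python
"""Sketch of a thread-scoped sandbox lifecycle (one sandbox per conversation).

Assumption: Sandbox and create_sandbox are hypothetical stand-ins for a real
provider SDK (E2B, Modal, Daytona, ...); real SDKs use their own names.
"""
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)


@dataclass
class Sandbox:
    """Hypothetical sandbox handle with a simulated, isolated filesystem."""
    sandbox_id: int
    files: dict[str, str] = field(default_factory=dict)
    closed: bool = False

    def write(self, path: str, content: str) -> None:
        self.files[path] = content

    def read(self, path: str) -> str:
        return self.files[path]

    def close(self) -> None:
        self.closed = True


def create_sandbox() -> Sandbox:
    """Hypothetical factory; a real provider call pays cold-start latency here."""
    return Sandbox(sandbox_id=next(_ids))


class ThreadSandboxRegistry:
    """Maps each conversation thread to exactly one sandbox."""

    def __init__(self) -> None:
        self._by_thread: dict[str, Sandbox] = {}

    def get(self, thread_id: str) -> Sandbox:
        # Create lazily on the first turn; later turns in the same thread reuse it.
        if thread_id not in self._by_thread:
            self._by_thread[thread_id] = create_sandbox()
        return self._by_thread[thread_id]

    def end_thread(self, thread_id: str) -> None:
        # Tear down when the conversation ends so sandboxes don't leak.
        sandbox = self._by_thread.pop(thread_id, None)
        if sandbox is not None:
            sandbox.close()


if __name__ == "__main__":
    registry = ThreadSandboxRegistry()
    registry.get("thread-a").write("/work/notes.txt", "turn 1")
    print(registry.get("thread-a").read("/work/notes.txt"))  # turn 1: persists across turns
    print("/work/notes.txt" in registry.get("thread-b").files)  # False: isolated per thread
```

The trade-off is that a long-lived thread holds its sandbox until `end_thread` runs. In practice you would pair this with an idle timeout, which this sketch omits.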
When to Sandbox (and When Not To)
Prompt injection can cause an agent to execute arbitrary shell commands. Without a sandbox, a compromised agent has the same access as the process running it. There is no safe way to run LLM-generated code on your host — if you're doing it, you have a vulnerability.
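In practice, the agent's shell tool should have no code path to the host at all. If no sandbox is available, it should fail closed rather than fall back to the host. Below is a minimal sketch built on a hypothetical `SandboxBackend` interface. `RecordingBackend` and `make_shell_tool` are illustrative names, and a real backend would call your provider's exec API instead of recording the command.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class ExecResult:
    exit_code: int
    output: str


class SandboxBackend(Protocol):
    """Hypothetical interface; map it onto your provider's exec call."""

    def execute(self, command: str) -> ExecResult: ...


class RecordingBackend:
    """Illustrative stand-in: records commands instead of running them anywhere."""

    def __init__(self) -> None:
        self.commands: list[str] = []

    def execute(self, command: str) -> ExecResult:
        self.commands.append(command)
        return ExecResult(exit_code=0, output=f"[sandbox] ran: {command}")


class SandboxUnavailable(RuntimeError):
    """Raised instead of silently executing on the host."""


def make_shell_tool(backend: Optional[SandboxBackend]):
    """Build the agent's shell tool; it never imports or calls a host shell."""

    def shell(command: str) -> str:
        if backend is None:
            # Fail closed: no sandbox means no execution, never a host fallback.
            raise SandboxUnavailable("no sandbox configured; refusing to run on host")
        result = backend.execute(command)
        return f"exit={result.exit_code}\n{result.output}"

    return shell
```

Even an injected command such as `shell("cat /etc/passwd")` only reaches the backend, so the worst case is bounded by what the sandbox can see.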
| Scenario | Sandbox? | Pattern | Why |
|---|---|---|---|
| Agent generates and runs code | Yes | Agent-in-sandbox | Code execution on host = RCE vulnerability |
| Agent searches web, calls APIs | No | — | No code execution; sandbox adds latency with no benefit |
| Agent reads and writes files on the user's behalf | Yes | Tool-in-sandbox | Filesystem access needs blast-radius containment |
| RAG-only agent (retrieval + generation) | No | — | No side effects; sandbox is unnecessary overhead |
| Multi-tenant: users submit code for execution | Yes | Agent-in-sandbox | Tenant isolation — one user's code must not reach another's files |
| Agent calls external APIs with stored credentials | No; use a credential proxy | Tool outside sandbox | A sandbox doesn't stop an agent from leaking credentials it can read; a proxy keeps them out of the agent's reach |
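The table reduces to a small decision function. This is a minimal sketch. The `Workload` field names are hypothetical, and the precedence (code execution outranks file access, and credentials are handled separately) is one reading of the table rather than a rule the table states.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    NONE = "no sandbox"
    AGENT_IN_SANDBOX = "agent-in-sandbox"
    TOOL_IN_SANDBOX = "tool-in-sandbox"


@dataclass(frozen=True)
class Workload:
    runs_generated_code: bool = False
    multi_tenant_code: bool = False
    touches_files: bool = False
    uses_stored_credentials: bool = False


@dataclass(frozen=True)
class Decision:
    pattern: Pattern
    credential_proxy: bool


def choose(w: Workload) -> Decision:
    if w.runs_generated_code or w.multi_tenant_code:
        # Any code execution (including tenant-submitted code) isolates the whole agent.
        pattern = Pattern.AGENT_IN_SANDBOX
    elif w.touches_files:
        # File access without code execution isolates only the file tools.
        pattern = Pattern.TOOL_IN_SANDBOX
    else:
        # Web search, API calls, RAG: no side effects worth isolating.
        pattern = Pattern.NONE
    # Stored credentials go through a proxy outside the sandbox, whatever the pattern.
    return Decision(pattern, credential_proxy=w.uses_stored_credentials)
```

For example, `choose(Workload(touches_files=True, uses_stored_credentials=True))` yields tool-in-sandbox plus a credential proxy. The two concerns are independent.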
Sandboxes add cold-start latency each time a sandbox is created, and the cost varies by provider: roughly 27-90ms for Daytona, ~80-200ms for E2B, and around a second for a cold Modal sandbox. For a code-execution agent, that's acceptable overhead for the security you get. For a text-only Q&A agent with no file or code operations, it's wasted latency with no benefit. Sandbox the right things, not everything.
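Lifecycle scope determines how often that cold start is paid. The arithmetic below compares a fresh sandbox per tool call with one sandbox per conversation. It assumes cold start is the only cost that differs between the two scopes and ignores warm pools and per-command execution time. The 200ms and 1000ms inputs are taken from the upper end of the E2B range and the Modal cold-start figure in the Quick Reference.

```python
def cold_start_overhead_ms(cold_start_ms: float, tool_calls: int, scope: str) -> float:
    """Total cold-start overhead for one conversation.

    scope="per_call": a fresh sandbox for every tool call.
    scope="thread":   one sandbox per conversation, created on first use.
    Execution time inside the sandbox is excluded; both scopes pay it equally.
    """
    if tool_calls <= 0:
        return 0.0  # no tool calls, no sandbox ever created
    if scope == "per_call":
        return cold_start_ms * tool_calls
    if scope == "thread":
        return cold_start_ms
    raise ValueError(f"unknown scope: {scope}")


if __name__ == "__main__":
    for label, cold in [("E2B upper bound", 200.0), ("Modal cold", 1000.0)]:
        per_call = cold_start_overhead_ms(cold, 10, "per_call")
        thread = cold_start_overhead_ms(cold, 10, "thread")
        print(f"{label}: 10 tool calls -> {per_call} ms per-call vs {thread} ms thread-scoped")
```

With ten tool calls, per-call scoping multiplies the cold start tenfold (2000.0 ms vs 200.0 ms at 200ms). This is one reason thread scoping is the default.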