Sandbox Execution: Isolated Agent Environments

Run agents or their tools in isolated sandboxes to prevent unauthorized file access, network calls, and credential theft. This lesson covers:

  • a decision framework for choosing between the agent-in-sandbox and tool-in-sandbox patterns
  • a provider comparison as of April 2026 (E2B, Modal, Daytona, Deno Sandbox, LangSmith)
  • differences between isolation technologies (Firecracker vs gVisor vs OCI containers)
  • sandbox lifecycle scoping and failure modes
  • a production implementation using Deep Agents imports

Quick Reference

  • Two patterns: agent-in-sandbox (entire agent runs isolated, use for coding agents) vs tool-in-sandbox (individual tool calls isolated, use for mixed workloads)
  • E2B: Firecracker microVMs, ~80-200ms cold start, Python + JS/TS SDKs, open source, free tier available
  • Modal: gVisor containers, ~1s cold / sub-second warm, GPU support (A100, H100), from langchain_modal import ModalSandbox
  • Daytona: OCI containers, 27-90ms cold start, Deep Agents native via from deepagents.backends.daytona import DaytonaSandbox
  • Deno Sandbox: Firecracker microVMs (NOT V8 isolates), <200ms startup, launched Feb 2026
  • Thread-scoped sandboxes (one per conversation) are the recommended default — files persist across turns, isolated per conversation
  • Isolation posture: Firecracker (separate kernel) > gVisor (userspace kernel) > OCI containers (shared kernel) for untrusted code
  • OpenAI Agents SDK added native sandbox support April 2026 — Daytona, E2B, Modal, Runloop backends
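The thread-scoped default can be sketched as a small registry keyed by conversation ID. The first request on a thread creates its sandbox, later turns reuse it, and ending the thread tears it down.

This is a minimal illustration under stated assumptions:

  • `Sandbox`, `InMemorySandbox`, `ThreadSandboxRegistry`, and `write_file_tool` are hypothetical names. They are not the API of E2B, Modal, Daytona, or Deep Agents.
  • The in-memory stand-in provides no isolation at all. In production, the factory would create a real provider sandbox.

The tool function illustrates the tool-in-sandbox pattern: the agent process stays on the host, and only the tool call is routed into the thread's sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class Sandbox(Protocol):
    """Hypothetical minimal sandbox interface; real providers expose their own APIs."""

    def write_file(self, path: str, content: str) -> None: ...
    def read_file(self, path: str) -> str: ...
    def close(self) -> None: ...


@dataclass
class InMemorySandbox:
    """Stand-in used only to illustrate lifecycle scoping. It provides NO isolation."""

    files: Dict[str, str] = field(default_factory=dict)
    closed: bool = False

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def read_file(self, path: str) -> str:
        return self.files[path]  # raises KeyError if the file was never written here

    def close(self) -> None:
        self.closed = True


class ThreadSandboxRegistry:
    """One sandbox per conversation thread: created lazily, reused across turns."""

    def __init__(self, factory: Callable[[], Sandbox]) -> None:
        self._factory = factory  # hypothetical hook where a real provider sandbox would be created
        self._sandboxes: Dict[str, Sandbox] = {}

    def get(self, thread_id: str) -> Sandbox:
        # Cold start is paid only the first time a thread needs a sandbox.
        if thread_id not in self._sandboxes:
            self._sandboxes[thread_id] = self._factory()
        return self._sandboxes[thread_id]

    def release(self, thread_id: str) -> None:
        # Tear down the thread's sandbox when the conversation ends.
        sandbox = self._sandboxes.pop(thread_id, None)
        if sandbox is not None:
            sandbox.close()


def write_file_tool(registry: ThreadSandboxRegistry, thread_id: str, path: str, content: str) -> str:
    """Hypothetical agent tool: the write happens in the thread's sandbox, never on the host."""
    registry.get(thread_id).write_file(path, content)
    return f"wrote {len(content)} chars to {path}"


if __name__ == "__main__":
    registry = ThreadSandboxRegistry(InMemorySandbox)
    write_file_tool(registry, "thread-a", "notes.txt", "turn 1")
    # A later turn on the same thread still sees the file.
    print(registry.get("thread-a").read_file("notes.txt"))
    registry.release("thread-a")
```

Keying by thread ID is what gives both properties listed above. Files persist because the same sandbox is returned on every turn. Conversations stay isolated because each thread ID maps to its own sandbox.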

When to Sandbox (and When Not To)

If your agent executes code, you need a sandbox

Prompt injection can cause an agent to execute arbitrary shell commands. Without a sandbox, a compromised agent has the same access as the process running it. There is no safe way to run LLM-generated code on your host — if you're doing it, you have a vulnerability.

| Scenario | Sandbox? | Pattern | Why |
|---|---|---|---|
| Agent generates and runs code | Yes | Agent-in-sandbox | Code execution on host = RCE vulnerability |
| Agent searches web, calls APIs | No | n/a | No code execution; sandbox adds latency with no benefit |
| Agent writes and reads files on behalf of user | Yes | Tool-in-sandbox | Filesystem access needs blast radius containment |
| RAG-only agent (retrieval + generation) | No | n/a | No side effects; sandbox is unnecessary overhead |
| Multi-tenant: users submit code for execution | Yes | Agent-in-sandbox | Tenant isolation: one user's code must not reach another's files |
| Agent calls external APIs with stored credentials | Credential proxy instead | Tool outside sandbox | Sandboxing credentials solves the wrong problem; use a proxy |
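The table can be encoded as a small decision helper, for example as a checklist during deployment review.

This is a hypothetical sketch. `Workload`, `Pattern`, and `choose_pattern` are illustrative names. The table treats each row on its own, so the precedence used when several rows apply is an assumption: code execution first, then file access, then credentials. A workload that both runs code and holds credentials would likely still want a credential proxy alongside its sandbox.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    AGENT_IN_SANDBOX = "agent-in-sandbox"
    TOOL_IN_SANDBOX = "tool-in-sandbox"
    CREDENTIAL_PROXY = "credential proxy (tool outside sandbox)"
    NONE = "no sandbox"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what an agent does."""

    runs_generated_code: bool = False
    multi_tenant_code: bool = False
    touches_files: bool = False
    uses_stored_credentials: bool = False


def choose_pattern(w: Workload) -> Pattern:
    # Any execution of generated or user-submitted code: isolate the whole agent.
    if w.runs_generated_code or w.multi_tenant_code:
        return Pattern.AGENT_IN_SANDBOX
    # File I/O without code execution: isolate only the file tools.
    if w.touches_files:
        return Pattern.TOOL_IN_SANDBOX
    # Credentialed API calls: keep secrets away from the agent via a proxy.
    if w.uses_stored_credentials:
        return Pattern.CREDENTIAL_PROXY
    # Web search, plain API calls, RAG-only: nothing worth isolating.
    return Pattern.NONE


if __name__ == "__main__":
    print(choose_pattern(Workload(touches_files=True)).value)
```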

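Sandboxes add cold-start latency each time one is created: about 27ms at the low end (Daytona) and up to roughly 1s for a Modal cold start. With thread-scoped sandboxes, that cost is paid once per conversation rather than on every execution. For a code-execution agent, that's acceptable overhead for the security you get. For a text-only Q&A agent with no file or code operations, it's latency with no benefit. Sandbox the right things, not everything.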
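Because cold start is paid when a sandbox boots, how often you create sandboxes determines the total overhead. The back-of-the-envelope sketch below compares two strategies:

  • a fresh sandbox for every code execution
  • one thread-scoped sandbox reused across a conversation

It makes two assumptions. First, warm executions inside an existing sandbox add no boot cost. Second, the cold-start values are illustrative, taken from the provider figures in this section rather than from measurements. `total_cold_start_ms` is a hypothetical helper name.

```python
def total_cold_start_ms(cold_start_ms: float, executions: int, scope: str) -> float:
    """Total cold-start latency paid over one conversation.

    scope="per_execution": a fresh sandbox for every code execution.
    scope="thread": one sandbox per conversation, reused across turns.
    Assumes warm executions inside an existing sandbox add no boot cost.
    """
    if executions < 0:
        raise ValueError("executions must be non-negative")
    if executions == 0:
        return 0.0
    if scope == "per_execution":
        return float(cold_start_ms * executions)
    if scope == "thread":
        return float(cold_start_ms)
    raise ValueError(f"unknown scope: {scope!r}")


if __name__ == "__main__":
    # Illustrative cold-start values spanning the provider figures quoted above.
    for cold_start in (27, 200, 1000):
        per_exec = total_cold_start_ms(cold_start, executions=10, scope="per_execution")
        thread = total_cold_start_ms(cold_start, executions=10, scope="thread")
        print(f"{cold_start} ms cold start, 10 runs: "
              f"per-execution {per_exec:.0f} ms, thread-scoped {thread:.0f} ms")
```

The gap grows linearly with the number of executions. This is one reason one sandbox per conversation is a sensible default for code-execution agents.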