Context Engineering in Deep Agents

Context engineering is the discipline of keeping the right information in the agent's context window at every step. This article breaks down the four strategies Deep Agents provides — filesystem offloading, auto-summarization, subagent isolation, and progressive disclosure — explains the real APIs behind each, and covers the failure modes you will hit in production.

Quick Reference

→Only build context engineering into agents that will run for many turns or call many large tools — short agents don't need it
→Deep Agents assembles a 9-layer system prompt; resting context (always-on layers) typically costs 1,000–8,000 tokens before any conversation starts
→FilesystemMiddleware auto-offloads tool results exceeding 20K tokens to disk; a 10-line preview stays in context
→SummarizationMiddleware triggers at 85% of the model's context window; it keeps 10% of tokens as recent context and also runs as a fallback when a ContextOverflowError is raised
→SummarizationToolMiddleware (deepagents>=1.6.0) adds a compact_conversation tool for manual compaction; enabling it does NOT disable auto-summarization
→Subagents via the task() tool get a fresh context window; they return 1–2K token summaries to the main agent regardless of how much intermediate work they did
→@dynamic_prompt injects context-dependent system prompt content at runtime — user roles, permissions, or preferences — without hardcoding them in the static system prompt
→The four strategies compose: offloading + summarization run automatically; subagents + @dynamic_prompt require explicit wiring

When Context Engineering Matters (and When It Doesn't)

Not every agent needs context engineering. A chatbot that handles 5-turn conversations and one tool call will never approach a context limit. The investment makes sense when your agent will: run for 20+ turns, call tools that return large datasets (search results, file contents, API responses), spawn subagents that do their own extensive work, or maintain state across multiple sessions. The telltale signs you need it: agents that degrade after many turns, agents that 'forget' earlier instructions mid-session, or agents that crash with context overflow errors on large inputs.

The 50% rule

Check your LangSmith traces for context usage per turn. If a typical session uses more than 50% of the model's window, invest in context engineering before you hit the ceiling. At 70%, you are in the danger zone, with little headroom left before the ceiling. At 85%, Deep Agents' auto-summarization fires automatically.
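
The thresholds above can be turned into a quick check to run against per-turn token counts pulled from your traces. This is a minimal sketch, not part of Deep Agents or LangSmith: the function name context_zone and the zone labels are hypothetical, and it assumes you already have the used-token count and the model's window size as integers.

```python
def context_zone(used_tokens: int, window_tokens: int) -> str:
    """Classify one turn's context usage against the 50% / 70% / 85% marks.

    Hypothetical helper for illustration; the labels are not Deep Agents terms.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if used_tokens < 0:
        raise ValueError("used_tokens must be non-negative")

    # Integer cross-multiplication avoids floating-point edge cases at the
    # exact threshold values.
    used_x100 = used_tokens * 100
    if used_x100 >= 85 * window_tokens:
        return "auto-summarize"  # the point where auto-summarization fires
    if used_x100 >= 70 * window_tokens:
        return "danger"
    if used_x100 > 50 * window_tokens:
        return "invest"  # more than half the window in a typical session
    return "ok"


if __name__ == "__main__":
    window = 200_000  # assumed window size, for illustration only
    for used in (60_000, 120_000, 150_000, 175_000):
        print(used, context_zone(used, window))
```

Running the check over a whole session, rather than a single turn, shows how quickly usage climbs and whether the session is trending toward the 85% mark.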

What's In the Context Window: The 9-Layer Prompt Assembly

Before any user message arrives, Deep Agents has already assembled a system prompt from up to 9 ordered layers. Understanding this assembly is the foundation of context engineering — you can't manage a budget you can't see.
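
One way to make that budget visible is to estimate the token cost of each layer before the first user message. The sketch below is an assumption-heavy illustration: it does not call any Deep Agents API, the layer names are hypothetical placeholders rather than the framework's actual nine layers, and the four-characters-per-token ratio is a rough heuristic, not a real tokenizer.

```python
from typing import List, Tuple

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def resting_context(layers: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, int]], int]:
    """Return per-layer token estimates, in order, plus their total."""
    per_layer = [(name, estimate_tokens(text)) for name, text in layers]
    total = sum(tokens for _, tokens in per_layer)
    return per_layer, total


if __name__ == "__main__":
    # Hypothetical layer names and placeholder contents, for illustration only.
    example_layers = [
        ("base_instructions", "x" * 4_000),
        ("tool_descriptions", "x" * 6_000),
        ("project_memory", "x" * 2_000),
    ]
    breakdown, total = resting_context(example_layers)
    for name, tokens in breakdown:
        print(f"{name}: ~{tokens} tokens")
    print(f"resting context: ~{total} tokens")
```

For real numbers, swap the heuristic for your model's tokenizer and feed in the actual prompt layers your agent assembles.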

Strategy 1 — Offloading Large Results to Filesystem

Large tool results → filesystem storage → tiny reference in context (99%+ token savings)
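
The flow above can be sketched in a few lines of plain Python. This is not FilesystemMiddleware's implementation: the function name offload_if_large, the reference-line format, and the characters-per-token heuristic are all assumptions made for illustration. Only the 20K-token threshold and the 10-line preview come from the description in this article.

```python
import tempfile
from pathlib import Path

TOKEN_THRESHOLD = 20_000  # offload results exceeding this many tokens
PREVIEW_LINES = 10        # lines of the result kept in context
CHARS_PER_TOKEN = 4       # rough heuristic (assumption), not a real tokenizer


def offload_if_large(result: str, directory: Path, name: str) -> str:
    """Return the result unchanged if small; otherwise write it to disk and
    return a one-line file reference followed by a short preview.

    Hypothetical helper; not the Deep Agents API.
    """
    estimated_tokens = len(result) // CHARS_PER_TOKEN
    if estimated_tokens <= TOKEN_THRESHOLD:
        return result

    path = directory / f"{name}.txt"
    path.write_text(result, encoding="utf-8")

    preview = "\n".join(result.splitlines()[:PREVIEW_LINES])
    return f"[Full result saved to {path}]\n{preview}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        big = "\n".join(f"row {i}: " + "x" * 100 for i in range(1000))
        in_context = offload_if_large(big, Path(tmp), "search_results")
        print(f"original: {len(big)} chars, kept in context: {len(in_context)} chars")
```

The agent can later read the saved file in pieces when it needs the details, so the full result never has to sit in the context window at once.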
