Advanced16 min

Context Engineering in Agents

Context engineering is the discipline of curating the smallest set of high-signal tokens that maximize the probability of a good outcome. In LangChain, this means deciding what goes into every model call, how tools read and write state, and what happens between steps — using State, Store, and Runtime Context as your three levers.

Quick Reference

→Context engineering = smallest possible set of high-signal tokens that maximize the probability of a good outcome
→Four strategies: Write (inject), Select (retrieve), Compress (summarize), Isolate (sub-agents)
→Three context types: Model Context (transient, per-call), Tool Context (persistent), Life-cycle Context (between steps)
→Three data sources: State (conversation-scoped), Store (cross-session), Runtime Context (deploy-time config, read-only)
→Middleware hook order: before_model runs top→bottom, after_model runs bottom→top
→SummarizationMiddleware uses message_threshold (not token triggers) — verify before deploying
→Context rot is real: model accuracy degrades as context grows even within the window limit

What Is Context Engineering

Context engineering is the set of strategies for curating and maintaining the optimal set of tokens during LLM inference. It's broader than prompt engineering, which focuses on how to write a system prompt. Context engineering includes everything that reaches the model: the system prompt, conversation history, tool schemas, retrieved documents, injected data, and what gets compressed or discarded. The goal is not the biggest context — it's the most useful context.

Why agents fail

Most agent failures are not model failures — they're context failures. The model saw the wrong information, too much irrelevant information, or information formatted in a way it couldn't act on. Engineering the context fixes these failures; upgrading the model usually doesn't.

Anthropic's engineering team distills context engineering into four moves, each targeting a different root cause of context bloat or context rot. The existing `context-strategies` diagram covers these well:

4 strategies for managing the context window

Strategy	What It Does	When to Use It
Write	Inject prompts, tools, few-shot examples into context	Every call — this is the baseline
Select	Retrieve only what's relevant via vector search or reranking	When the knowledge base is larger than what fits in context
Compress	Summarize or trim conversation history to reduce token count	Long-running agents, conversations > 20 turns
Isolate	Delegate subtasks to sub-agents with clean context windows	Parallel workstreams, tasks requiring fresh perspective

When Not to Engineer Context

Every middleware hook adds latency, a failure surface, and maintenance cost. The right answer is often a static system prompt with no middleware at all.

The Context Budget

A context budget is the allocation of your model's context window across its competing inputs. Without an explicit budget, conversation history expands to fill whatever space is left, crowding out retrieved documents and leaving the model no room to respond. The `context-budget` diagram shows a typical allocation for a RAG-heavy agent:

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.