Few-Shot Examples for Agent Tasks

When to use few-shot examples for agents, how to build trajectory examples that teach reasoning patterns, static vs dynamic selection with real cost math, and measuring whether examples actually help.

Quick Reference

→Agent few-shot examples are full trajectories: user query → reasoning → tool call → observation → response, not just input/output pairs
→Start with 2–3 static examples covering your most common and most error-prone scenarios before building dynamic selection
→Use VoyageAIEmbeddings (voyage-4) for dynamic retrieval — Anthropic has no embeddings API; AnthropicEmbeddings does not exist
→Example ordering matters: place the most representative example last (recency bias) and the most diverse example first
→Three trajectory examples at ~600 tokens each add ~1,800 input tokens — small in a 200K window, meaningful in a 16K one
→Curate from production traces: filter for successful completion, 3+ tool calls, no errors, diverse tool sequences
→Measure impact before committing: A/B test with and without examples on an eval dataset; if task completion doesn't improve by more than 5 percentage points, the examples aren't targeting the right failure modes
→Few-shot examples can narrow agent behavior — if the agent copies patterns verbatim instead of reasoning, reduce examples or switch to zero-shot
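The token arithmetic in the Quick Reference is easy to check in code. This is a minimal sketch: the ~600-tokens-per-example figure is the article's estimate, and `example_overhead` is a hypothetical helper, not a library function.

```python
def example_overhead(num_examples: int, tokens_per_example: int,
                     context_window: int) -> tuple[int, float]:
    """Return (extra input tokens per request, fraction of the context window used)."""
    extra_tokens = num_examples * tokens_per_example
    return extra_tokens, extra_tokens / context_window


# Three ~600-token trajectories, compared across two window sizes.
for window in (200_000, 16_000):
    extra, share = example_overhead(3, 600, window)
    print(f"{window:,}-token window: +{extra:,} input tokens ({share:.1%} of the window)")
```

The same 1,800 tokens use under 1% of a 200K window but roughly 11% of a 16K one. Because static examples ride along on every request, the overhead also scales linearly with volume: 1.8M extra input tokens per 1,000 requests.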
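The curation filter from the Quick Reference can be sketched as a single pass over production traces. The `Trace`/`ToolCall` schema below is hypothetical (adapt it to whatever your tracing tool exports), and keeping one trace per distinct tool sequence is an assumed, deliberately crude proxy for "diverse tool sequences".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    name: str
    error: Optional[str] = None  # set when the tool call failed


@dataclass
class Trace:
    trace_id: str
    succeeded: bool  # did the agent complete the task?
    tool_calls: list[ToolCall] = field(default_factory=list)


def curate(traces: list[Trace], min_tool_calls: int = 3) -> list[Trace]:
    """Keep successful, error-free traces with enough tool calls,
    at most one per distinct tool sequence."""
    seen_sequences: set[tuple[str, ...]] = set()
    kept = []
    for trace in traces:
        if not trace.succeeded or len(trace.tool_calls) < min_tool_calls:
            continue
        if any(call.error for call in trace.tool_calls):
            continue
        sequence = tuple(call.name for call in trace.tool_calls)
        if sequence in seen_sequences:
            continue  # this tool sequence is already represented
        seen_sequences.add(sequence)
        kept.append(trace)
    return kept


# Hypothetical traces exercising each filter.
traces = [
    Trace("a", True, [ToolCall("search"), ToolCall("fetch"), ToolCall("summarize")]),
    Trace("b", True, [ToolCall("search"), ToolCall("fetch"), ToolCall("summarize")]),  # duplicate sequence
    Trace("c", False, [ToolCall("search"), ToolCall("fetch"), ToolCall("summarize")]),  # failed task
    Trace("d", True, [ToolCall("search"), ToolCall("fetch")]),  # too few tool calls
    Trace("e", True, [ToolCall("search"), ToolCall("fetch", error="timeout"), ToolCall("fetch")]),  # tool error
    Trace("f", True, [ToolCall("lookup_account"), ToolCall("search"), ToolCall("draft_reply")]),
]
print([t.trace_id for t in curate(traces)])
```

Only traces `a` and `f` survive; the survivors are candidates for human review, not finished examples.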
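A minimal sketch of the A/B comparison, assuming each eval task yields a pass/fail completion flag and reading the 5% threshold as an absolute difference in completion rate (an interpretation, since the rule could also mean relative lift). `compare_arms` is hypothetical; the pooled two-proportion z-test is a standard addition here for judging whether the lift exceeds sampling noise, not something the article prescribes.

```python
import math


def compare_arms(control: list[bool], treatment: list[bool],
                 min_lift: float = 0.05) -> dict:
    """Compare task completion without (control) and with (treatment) few-shot examples."""
    p_control = sum(control) / len(control)
    p_treatment = sum(treatment) / len(treatment)
    lift = p_treatment - p_control  # absolute difference in completion rate

    # Pooled two-proportion z-test: how large is the lift relative to sampling noise?
    pooled = (sum(control) + sum(treatment)) / (len(control) + len(treatment))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(control) + 1 / len(treatment)))
    z = lift / se if se > 0 else 0.0

    return {
        "control": p_control,
        "treatment": p_treatment,
        "lift": lift,
        "z": z,
        "keep_examples": lift > min_lift,
    }


# Hypothetical eval run: 100 tasks per arm.
control = [True] * 60 + [False] * 40    # zero-shot: 60% completion
treatment = [True] * 72 + [False] * 28  # with examples: 72% completion
result = compare_arms(control, treatment)
print(f"lift={result['lift']:.2f}, z={result['z']:.2f}, keep={result['keep_examples']}")
```

In this made-up run the 12-point lift clears the 5-point bar, but z is about 1.79, below the conventional 1.96 cutoff for two-sided 95% confidence. With only 100 tasks per arm the gain could still be noise, so a larger eval set would be the safer basis for the decision.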

Should You Use Few-Shot Examples?

Start zero-shot → add static → graduate to dynamic → add eval when stakes demand it

Few-shot examples are not the default tool for every agent. They help when the task involves complex multi-step reasoning, specific tool sequencing, or output formatting that instructions alone struggle to convey. They hurt when they narrow the agent's behavior to patterns it shouldn't generalize from, causing it to copy reasoning verbatim or fail on anything outside the demonstrated distribution.

Scenario	Few-shot value	Recommendation
Single-tool lookup, no reasoning chain	Low	Zero-shot — instructions suffice
Multi-step tool chains with domain logic	High	2–3 static trajectory examples
Formatting-critical outputs (structured reports, legal clauses)	High	1–2 static format-focused examples
Agent handles 5+ distinct task categories	High	Dynamic selection (voyage-4 embed)
High-stakes: financial, medical, legal decisions	Very high	Dynamic + A/B eval harness before shipping
Simple conversational agent, low error rate	Low	Zero-shot — examples add complexity without gain
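The dynamic-selection row can be sketched as nearest-neighbor retrieval over pre-embedded trajectories. This sketch assumes the example vectors were computed once offline (for instance with LangChain's `VoyageAIEmbeddings` and its `embed_documents` method) and that the query is embedded per request (`embed_query`). The toy 3-dimensional vectors and example names are hypothetical stand-ins so the code runs without an API key. It also applies the Quick Reference ordering rule by placing the closest match last.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query_vec: list[float],
                    pool: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k most similar examples, least similar first and closest match last."""
    top_k = sorted(pool, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
    return [text for text, _ in reversed(top_k)]


# Toy pool: in practice each vector would come from an embedding model.
pool = [
    ("refund trajectory", [1.0, 0.0, 0.0]),
    ("billing dispute trajectory", [0.8, 0.6, 0.0]),
    ("password reset trajectory", [0.0, 0.0, 1.0]),
    ("plan upgrade trajectory", [0.6, 0.8, 0.0]),
]
query_vec = [0.9, 0.1, 0.0]  # stands in for a refund-like user query
print(select_examples(query_vec, pool, k=3))
```

Sorting the whole pool is fine for a few hundred examples; a much larger library would typically move to a vector index.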

When few-shot examples hurt

Examples can narrow agent behavior through a skewed example distribution: if all your examples involve a single tool or scenario type, the agent tends to anchor on that pattern even for unrelated queries. They can also create instruction-example conflicts when an example shows behavior that contradicts the system prompt. And they can cause copy-paste behavior, where the agent reproduces example text verbatim instead of adapting its reasoning to the actual query.
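Copy-paste behavior can be spot-checked automatically. One heuristic, an assumption of this sketch rather than a metric from the article, is the share of the agent's word 5-grams that also appear in any example; a high ratio on queries unlike the examples suggests the agent is reproducing rather than reasoning. `copy_ratio` is hypothetical, and any alerting threshold is yours to calibrate.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams of `text` (punctuation kept as-is)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copy_ratio(response: str, examples: list[str], n: int = 5) -> float:
    """Fraction of the response's n-grams that also occur in any example."""
    response_grams = word_ngrams(response, n)
    if not response_grams:
        return 0.0  # response shorter than n words
    example_grams = set().union(*(word_ngrams(e, n) for e in examples))
    return len(response_grams & example_grams) / len(response_grams)


# Hypothetical example text and two agent responses.
examples = ["I will call search_orders with the customer id and then check the refund policy"]
copied = "I will call search_orders with the customer id and then check the refund policy"
fresh = "Looking up shipping status needs the tracking number from the user"
print(copy_ratio(copied, examples), copy_ratio(fresh, examples))
```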

Anatomy of a Trajectory Example

A trajectory example shows all five stages (user query, reasoning, tool call, observation, response); the middle three are what make it work for agents.
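One way to make the five stages concrete is to store each example as structured data and render it to text for the prompt. This is a sketch under assumptions: the `Trajectory` and `Step` classes, the plain-text labels, and the `get_order` tool are hypothetical, and a multi-step example simply repeats the middle three stages once per tool call.

```python
import json
from dataclasses import dataclass


@dataclass
class Step:
    reasoning: str    # why the agent picks this tool
    tool: str         # tool name
    tool_input: dict  # arguments passed to the tool
    observation: str  # what the tool returned


@dataclass
class Trajectory:
    query: str
    steps: list[Step]  # reasoning -> tool call -> observation, repeated per tool call
    response: str

    def render(self) -> str:
        """Render the trajectory as plain text for a prompt."""
        lines = [f"User: {self.query}"]
        for step in self.steps:
            lines.append(f"Reasoning: {step.reasoning}")
            lines.append(f"Tool call: {step.tool}({json.dumps(step.tool_input)})")
            lines.append(f"Observation: {step.observation}")
        lines.append(f"Response: {self.response}")
        return "\n".join(lines)


example = Trajectory(
    query="Where is order 1042?",
    steps=[Step(
        reasoning="The user wants shipping status, so I should look the order up by ID first.",
        tool="get_order",
        tool_input={"order_id": 1042},
        observation='{"status": "shipped", "carrier": "UPS"}',
    )],
    response="Order 1042 has shipped via UPS.",
)
print(example.render())
```

Keeping examples as data rather than hand-written prose makes it easier to validate them (for instance, checking that every `tool` name still exists in the current tool set) before they reach the prompt.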

Static Examples: The 80/20 Starting Point

Start here. Static examples are injected into the system prompt and stay fixed across all queries. They need no retrieval step at runtime, add no retrieval latency, and require no additional infrastructure; their only cost is the fixed block of input tokens they add to every request. If you have fewer than 5 distinct task patterns and a stable tool set, static examples will usually cover your needs.
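Because static examples are fixed, the system prompt can be assembled once at startup. A minimal sketch, assuming examples are already rendered to text: `build_system_prompt` is a hypothetical helper, and the `<example>` tags follow Anthropic's general advice to wrap examples in XML-style tags rather than any required format. It applies the Quick Reference ordering rule: most diverse example first, most representative last.

```python
def build_system_prompt(instructions: str, examples: list[str],
                        most_diverse: str, most_representative: str) -> str:
    """Assemble instructions plus static examples, ordered for recency bias."""
    if most_diverse == most_representative:
        raise ValueError("pick two different examples for the first and last slots")
    middle = [e for e in examples if e not in (most_diverse, most_representative)]
    ordered = [most_diverse, *middle, most_representative]
    body = "\n".join(
        f'<example index="{i}">\n{text}\n</example>'
        for i, text in enumerate(ordered, start=1)
    )
    return f"{instructions}\n\n<examples>\n{body}\n</examples>"


# Placeholder example texts; real ones would be rendered trajectories.
SYSTEM_PROMPT = build_system_prompt(
    "You are an order-support agent. Use tools before answering.",
    examples=["EX_REFUND trajectory text", "EX_ESCALATION trajectory text",
              "EX_ORDER_STATUS trajectory text"],
    most_diverse="EX_ESCALATION trajectory text",
    most_representative="EX_ORDER_STATUS trajectory text",
)
print(SYSTEM_PROMPT)
```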
