Few-Shot Examples for Agent Tasks
When to use few-shot examples for agents, how to build trajectory examples that teach reasoning patterns, how static and dynamic selection compare on real cost math, and how to measure whether examples actually help.
Quick Reference
- Agent few-shot examples are full trajectories (user query → reasoning → tool call → observation → response), not just input/output pairs
- Start with 2–3 static examples covering your most common and most error-prone scenarios before building dynamic selection
- Use VoyageAIEmbeddings (voyage-4) for dynamic retrieval; Anthropic has no embeddings API, and AnthropicEmbeddings does not exist
- Example ordering matters: place the most diverse example first and the most representative example last, where recency bias gives it the most weight
- Three trajectory examples at ~600 tokens each add ~1,800 input tokens: under 1% of a 200K window, but about 11% of a 16K one
- Curate from production traces: filter for successful completion, 3+ tool calls, no errors, and diverse tool sequences
- Measure impact before committing: A/B test with and without examples on an eval dataset; if task completion doesn't improve by more than 5%, the examples aren't targeting the right failure modes
- Few-shot examples can narrow agent behavior: if the agent copies patterns verbatim instead of reasoning, reduce examples or switch to zero-shot
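To make the trajectory shape and the token arithmetic above concrete, here is a minimal sketch. The `Step` and `Trajectory` classes, the `lookup_order` tool, and the rendered label format are all hypothetical names chosen for illustration, not part of any library; the 600-token figure is the rough per-example estimate from the list above, not a measured count.

```python
from dataclasses import dataclass


@dataclass
class Step:
    reasoning: str      # why the agent chose this tool (hypothetical field)
    tool: str           # tool name, e.g. a hypothetical "lookup_order"
    tool_input: dict    # arguments passed to the tool
    observation: str    # what the tool returned


@dataclass
class Trajectory:
    query: str
    steps: list[Step]
    response: str

    def render(self) -> str:
        """Flatten the trajectory into prompt text, one labeled line per stage."""
        lines = [f"User: {self.query}"]
        for step in self.steps:
            lines.append(f"Reasoning: {step.reasoning}")
            lines.append(f"Tool call: {step.tool}({step.tool_input})")
            lines.append(f"Observation: {step.observation}")
        lines.append(f"Response: {self.response}")
        return "\n".join(lines)


def context_share(num_examples: int, tokens_per_example: int, window: int) -> float:
    """Fraction of the context window consumed by the examples."""
    return num_examples * tokens_per_example / window


example = Trajectory(
    query="Where is order 1042?",
    steps=[
        Step(
            reasoning="I need the order status before answering.",
            tool="lookup_order",
            tool_input={"order_id": 1042},
            observation="status=shipped",
        )
    ],
    response="Order 1042 has shipped.",
)
print(example.render())

# Three ~600-token examples as a percentage of each window size.
print(round(context_share(3, 600, 200_000) * 100, 2))  # 0.9
print(round(context_share(3, 600, 16_000) * 100, 2))   # 11.25
```

A real budget check would count tokens with the model's own tokenizer rather than assume a flat per-example figure.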
Should You Use Few-Shot Examples?
Start zero-shot → add static → graduate to dynamic → add eval when stakes demand it
Few-shot examples are not the default tool for every agent. They help when the task involves complex multi-step reasoning, specific tool sequencing, or output formatting that instructions alone struggle to convey. They hurt when examples narrow the agent's behavior to patterns it shouldn't generalize — causing it to copy-paste reasoning or fail on anything outside the demonstrated distribution.
| Scenario | Few-shot value | Recommendation |
|---|---|---|
| Single-tool lookup, no reasoning chain | Low | Zero-shot — instructions suffice |
| Multi-step tool chains with domain logic | High | 2–3 static trajectory examples |
| Formatting-critical outputs (structured reports, legal clauses) | High | 1–2 static format-focused examples |
| Agent handles 5+ distinct task categories | High | Dynamic selection (voyage-4 embeddings) |
| High-stakes: financial, medical, legal decisions | Very high | Dynamic + A/B eval harness before shipping |
| Simple conversational agent, low error rate | Low | Zero-shot — examples add complexity without gain |
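One way to read the table is as an ordered set of rules. The sketch below encodes it that way; the `AgentProfile` fields and the rule priority (high stakes first, then category count, then multi-step, then formatting) are assumptions made for illustration, since the table itself does not rank the rows.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    # All fields are hypothetical descriptors of an agent, not a library type.
    multi_step: bool           # multi-step tool chains with domain logic
    formatting_critical: bool  # structured reports, legal clauses, etc.
    task_categories: int       # number of distinct task categories handled
    high_stakes: bool          # financial, medical, or legal decisions


def recommend(profile: AgentProfile) -> str:
    """Return the table's recommendation, checking the strictest rows first."""
    if profile.high_stakes:
        return "dynamic selection + A/B eval harness"
    if profile.task_categories >= 5:
        return "dynamic selection"
    if profile.multi_step:
        return "2-3 static trajectory examples"
    if profile.formatting_critical:
        return "1-2 static format-focused examples"
    return "zero-shot"


print(recommend(AgentProfile(multi_step=True, formatting_critical=False,
                             task_categories=2, high_stakes=False)))
```

Checking the high-stakes row first reflects the idea that the eval harness should not be skipped just because a cheaper row also matches.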
Examples can narrow agent behavior in three ways. First, a skewed example distribution: if all your examples involve a single tool or scenario type, the agent will anchor on that pattern even for unrelated queries. Second, instruction-example conflicts: an example that shows behavior contradicting the system prompt gives the agent two competing signals. Third, copy-paste behavior: the agent reproduces example text verbatim rather than adapting its reasoning to the actual query.
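Copy-paste behavior can be checked mechanically. The following is a rough sketch, not an established metric: it measures what share of an agent output's word 5-grams also appear in the few-shot examples. The function names, the whitespace tokenization, and the choice of `n = 5` are assumptions, and any alert threshold would need tuning on your own traces.

```python
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of a text, split on whitespace."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_fraction(output: str, examples: list[str], n: int = 5) -> float:
    """Share of the output's n-grams that also occur in any example."""
    out_grams = ngrams(output, n)
    if not out_grams:
        return 0.0  # output shorter than n words: nothing to compare
    example_grams = set().union(*(ngrams(e, n) for e in examples))
    return len(out_grams & example_grams) / len(out_grams)


examples = ["the order has shipped and will arrive on friday"]
print(copied_fraction("the order has shipped and will arrive on friday", examples))  # 1.0
print(copied_fraction("your refund was issued to the original payment method today", examples))  # 0.0
```

A consistently high fraction across unrelated queries suggests the agent is reproducing example text rather than reasoning, which is the signal for reducing examples or switching to zero-shot.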