
Few-Shot Examples for Agent Tasks

When to use few-shot examples for agents, how to build trajectory examples that teach reasoning patterns, static vs dynamic selection with real cost math, and measuring whether examples actually help.

Quick Reference

  • Agent few-shot examples are full trajectories: user query → reasoning → tool call → observation → response, not just input/output pairs
  • Start with 2–3 static examples covering your most common and most error-prone scenarios before building dynamic selection
  • Use VoyageAIEmbeddings (voyage-4) for dynamic retrieval — Anthropic has no embeddings API; AnthropicEmbeddings does not exist
  • Example ordering matters: place the most representative example last (recency bias) and the most diverse example first
  • Three trajectory examples at ~600 tokens each add ~1,800 input tokens — small in a 200K window, meaningful in a 16K one
  • Curate from production traces: filter for successful completion, 3+ tool calls, no errors, diverse tool sequences
  • Measure impact before committing: A/B test with and without examples on an eval dataset; if task completion doesn't improve by >5%, the examples aren't targeting the right failure modes
  • Few-shot examples can narrow agent behavior — if the agent copies patterns verbatim instead of reasoning, reduce examples or switch to zero-shot
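The context-window arithmetic in the list, and what the same overhead costs at volume, takes only a few lines to reproduce. This is a minimal sketch. It assumes you have already measured per-example token counts, for example with your provider's token counter. `usd_per_million_input` and `requests_per_day` are hypothetical inputs, not quoted prices.

```python
def few_shot_overhead(example_tokens: list[int], context_window: int,
                      usd_per_million_input: float, requests_per_day: int) -> dict:
    """Estimate what a static few-shot block adds to every request.

    usd_per_million_input is a placeholder; substitute your model's actual rate.
    """
    added = sum(example_tokens)                      # tokens added to every request
    share = added / context_window                   # fraction of the window consumed
    daily_usd = added * requests_per_day * usd_per_million_input / 1_000_000
    return {"added_tokens": added, "window_share": share, "daily_usd": daily_usd}


# Three ~600-token trajectories, as in the bullet above; $3/M is a placeholder rate.
example_tokens = [600, 600, 600]
for window in (200_000, 16_000):
    r = few_shot_overhead(example_tokens, window,
                          usd_per_million_input=3.0, requests_per_day=10_000)
    print(f"{window:>7} window: +{r['added_tokens']} tokens, "
          f"{r['window_share']:.1%} of context, ${r['daily_usd']:.2f}/day")
```

With these placeholder inputs, the 1,800 extra tokens use under 1% of a 200K window but over 11% of a 16K one. At 10,000 requests a day they come to $54 a day.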
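The curation filter from the list can be sketched as a plain function over trace records. The `Trace` schema (`succeeded`, `errors`, `tool_calls`) is a hypothetical stand-in for whatever your tracing system exports. "Diverse tool sequences" is approximated here by keeping at most one trace per distinct ordered sequence of tool names.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical production trace record; adapt field names to your tracer."""
    trace_id: str
    succeeded: bool                                       # task completed successfully
    errors: int                                           # tool/model errors during the run
    tool_calls: list[str] = field(default_factory=list)   # tool names in call order


def curate(traces: list[Trace], min_tool_calls: int = 3) -> list[Trace]:
    """Keep successful, error-free traces with enough tool calls,
    at most one per distinct tool sequence."""
    seen_sequences: set[tuple[str, ...]] = set()
    kept: list[Trace] = []
    for t in traces:
        if not t.succeeded or t.errors > 0 or len(t.tool_calls) < min_tool_calls:
            continue
        seq = tuple(t.tool_calls)
        if seq in seen_sequences:                         # this tool path is already represented
            continue
        seen_sequences.add(seq)
        kept.append(t)
    return kept


traces = [
    Trace("a", True, 0, ["search", "fetch", "summarize"]),
    Trace("b", True, 0, ["search", "fetch", "summarize"]),   # duplicate sequence
    Trace("c", True, 1, ["search", "fetch", "summarize"]),   # had an error
    Trace("d", True, 0, ["lookup", "calc"]),                 # too few tool calls
    Trace("e", False, 0, ["search", "calc", "fetch"]),       # failed
    Trace("f", True, 0, ["lookup", "calc", "fetch", "summarize"]),
]
print([t.trace_id for t in curate(traces)])  # ['a', 'f']
```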
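The A/B check can be as simple as running the same eval set twice, once with examples and once without, and comparing completion rates. This is a minimal sketch. It assumes each run yields one boolean per task (completed or not), and it reads the 5% threshold as 5 percentage points, which is only one possible interpretation. It does no significance testing. Use a large eval set, or add a proper statistical test, before trusting small differences.

```python
def completion_rate(results: list[bool]) -> float:
    """Share of eval tasks the agent completed."""
    return sum(results) / len(results) if results else 0.0


def examples_help(baseline: list[bool], with_examples: list[bool],
                  min_gain: float = 0.05) -> tuple[bool, float]:
    """Compare task completion with vs. without few-shot examples.

    min_gain is an absolute difference in completion rate (0.05 = 5 points).
    """
    gain = completion_rate(with_examples) - completion_rate(baseline)
    return gain > min_gain, gain


# Hypothetical eval results on the same 20 tasks.
zero_shot = [True] * 13 + [False] * 7    # 65% completion
few_shot = [True] * 16 + [False] * 4     # 80% completion
keep, gain = examples_help(zero_shot, few_shot)
print(f"gain={gain:+.0%}, keep examples: {keep}")
```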

Should You Use Few-Shot Examples?

Decision flowchart:
  • Complex multi-step task with tools? No → zero-shot is fine. Yes → next question.
  • More than 5 distinct task patterns? No → static examples: 2–3 trajectories in the system prompt. Yes → next question.
  • High-stakes or compliance-sensitive? No → dynamic selection (voyage-4 embeddings). Yes → dynamic selection plus an eval harness; A/B test before shipping.

Start zero-shot → add static → graduate to dynamic → add eval when stakes demand it
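The flowchart reduces to three yes/no questions. Writing it as a function keeps the decision explicit and reviewable. The function name, parameters and return strings are illustrative. The "more than 5 patterns" threshold comes from the chart.

```python
def few_shot_strategy(multi_step_with_tools: bool, distinct_patterns: int,
                      high_stakes: bool) -> str:
    """Map the flowchart's three questions to a recommended strategy."""
    if not multi_step_with_tools:
        return "zero-shot"
    if distinct_patterns <= 5:                       # chart: "more than 5" goes dynamic
        return "static: 2-3 trajectories in system prompt"
    if not high_stakes:
        return "dynamic selection (voyage-4 embeddings)"
    return "dynamic selection + eval harness (A/B test before shipping)"


print(few_shot_strategy(True, 8, high_stakes=False))
```

As drawn, the chart only asks about stakes once there are more than 5 patterns. The table, however, recommends an eval harness for any high-stakes agent. If that matters for your case, check `high_stakes` first.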

Few-shot examples are not the default tool for every agent. They help when the task involves complex multi-step reasoning, specific tool sequencing, or output formatting that instructions alone struggle to convey. They hurt when examples narrow the agent's behavior to patterns it shouldn't generalize from, causing it to copy-paste reasoning or fail on anything outside the demonstrated distribution.
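A trajectory example captures the full loop, not just the final answer. One common way to include static examples is to render each trajectory as text inside the system prompt. The sketch below makes several assumptions. The `Step`/`Trajectory` structures are hypothetical, as are the tool names (`get_order`, `get_shipment`) and the tag names (`<example>`, `<tool_call>`, `<observation>`). None of them is a format required by any API.

```python
import json
from dataclasses import dataclass


@dataclass
class Step:
    """One step of a hypothetical trajectory: reasoning, a tool call, its result."""
    thought: str
    tool: str
    args: dict
    observation: str


@dataclass
class Trajectory:
    query: str
    steps: list[Step]
    response: str


def render(traj: Trajectory) -> str:
    """Render query -> reasoning -> tool call -> observation -> response as text."""
    lines = ["<example>", f"User: {traj.query}"]
    for s in traj.steps:
        lines.append(f"Reasoning: {s.thought}")
        lines.append(f"<tool_call>{s.tool}({json.dumps(s.args)})</tool_call>")
        lines.append(f"<observation>{s.observation}</observation>")
    lines.append(f"Response: {traj.response}")
    lines.append("</example>")
    return "\n".join(lines)


example = Trajectory(
    query="Was order 1042 delivered late?",
    steps=[
        Step("I need the order's promised date first.", "get_order",
             {"order_id": 1042}, '{"promised": "2024-05-02"}'),
        Step("Now compare it with the actual delivery date.", "get_shipment",
             {"order_id": 1042}, '{"delivered": "2024-05-04"}'),
    ],
    response="Yes. It was promised for May 2 and delivered May 4, two days late.",
)

system_prompt = "You are a support agent.\n\nExamples:\n" + render(example)
print(system_prompt)
```

The reasoning lines are the part that instructions alone rarely convey. They show why the agent makes the second call, not just that it does.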

| Scenario | Few-shot value | Recommendation |
|---|---|---|
| Single-tool lookup, no reasoning chain | Low | Zero-shot; instructions suffice |
| Multi-step tool chains with domain logic | High | 2–3 static trajectory examples |
| Formatting-critical outputs (structured reports, legal clauses) | High | 1–2 static format-focused examples |
| Agent handles 5+ distinct task categories | High | Dynamic selection (voyage-4 embeddings) |
| High-stakes: financial, medical, legal decisions | Very high | Dynamic selection + A/B eval harness before shipping |
| Simple conversational agent, low error rate | Low | Zero-shot; examples add complexity without gain |

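Dynamic selection embeds every candidate trajectory once, embeds each incoming query, and picks the nearest examples. The sketch below works on plain vectors, so it runs without an API key. In production the vectors would come from an embedding model such as Voyage AI's, for example through LangChain's `VoyageAIEmbeddings` and its `embed_documents` / `embed_query` methods. The sketch also applies the Quick Reference ordering rule, interpreting "most representative" as "most similar to the query", so the closest match goes last. The example ids and the 3-dimensional vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(query_vec: list[float],
                    pool: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k most similar example ids, ordered least -> most similar,
    so the closest match sits last in the prompt."""
    scored = sorted(pool.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    top_k = [ex_id for ex_id, _ in scored[:k]]
    return list(reversed(top_k))


# Toy vectors standing in for real embeddings of stored trajectories.
pool = {
    "refund_flow":   [0.9, 0.1, 0.0],
    "billing_query": [0.7, 0.6, 0.1],
    "shipping_eta":  [0.1, 0.9, 0.2],
    "account_reset": [0.0, 0.1, 0.95],
}
query = [0.8, 0.3, 0.0]
print(select_examples(query, pool, k=2))  # ['billing_query', 'refund_flow']
```

A linear scan is enough for a pool of a few hundred trajectories. A vector store only becomes worthwhile once the pool is much larger.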
When few-shot examples hurt

Examples can narrow agent behavior through distributional shift: if all your examples involve a single tool or scenario type, the agent will anchor on that pattern even for unrelated queries. They also create instruction-example conflicts when examples show behavior that contradicts the system prompt. Finally, they can cause copy-paste behavior, where the agent reproduces example text verbatim instead of adapting its reasoning to the actual query.
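Two of these failure modes can be checked cheaply. The sketch below uses simple heuristics of our own, not an established method. It reports which tool dominates an example set and that tool's share of all calls. It also measures what fraction of a response's word 8-grams appear verbatim in the examples. The function names, the 8-word window and any alarm threshold you apply are illustrative assumptions.

```python
from collections import Counter


def tool_skew(example_tool_calls: list[list[str]]) -> tuple[str, float]:
    """Return the most-used tool across all examples and its share of all calls."""
    counts = Counter(tool for calls in example_tool_calls for tool in calls)
    if not counts:
        return "", 0.0
    tool, n = counts.most_common(1)[0]
    return tool, n / sum(counts.values())


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(response: str, examples: list[str], n: int = 8) -> float:
    """Fraction of the response's word n-grams that also occur in any example."""
    resp = ngrams(response, n)
    if not resp:
        return 0.0                                   # response shorter than n words
    seen = set().union(*(ngrams(e, n) for e in examples))
    return len(resp & seen) / len(resp)


example_calls = [["search", "fetch"], ["search", "summarize"], ["search"]]
print(tool_skew(example_calls))  # ('search', 0.6)
```

A high tool share or overlap score is a signal to diversify or trim the examples. Where to set the alarm is a judgment call to calibrate on your own traces.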