Agent Architecture/Workflow Patterns
Intermediate · 8 min

Parallelization: Concurrent LLM Execution

Multiple LLM calls run simultaneously, either on the same input or on separate chunks of it, with results aggregated via voting, merging, or selection. The payoff is speed, confidence, or both.

Quick Reference

  • Fan-out: send the same input to multiple LLMs concurrently via the Send API
  • Fan-in: aggregate results using voting, merging, or best-of-N selection
  • Two use cases: speed (split work) and confidence (multiple opinions on same task)
  • Voting pattern: N models score independently, majority or average wins
  • Sectioning pattern: split input into independent chunks, process in parallel
  • Use reducers (operator.add) to collect parallel results into shared state
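The fan-out/fan-in shape above can be sketched without any framework. This is a minimal, hypothetical illustration: `worker` stands in for an LLM call, `ThreadPoolExecutor` plays the role of the Send API's concurrent dispatch, and `operator.add` concatenates each worker's list into shared state, the same reducer LangGraph uses via `Annotated[list, operator.add]`.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

def worker(worker_id: int, task: str) -> list[str]:
    # Stand-in for an LLM call; returns its result wrapped in a list
    # so a list-concatenating reducer can merge it into shared state.
    return [f"worker-{worker_id}: processed {task!r}"]

def fan_out_fan_in(task: str, n_workers: int = 3) -> list[str]:
    results: list[str] = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Fan-out: the same task goes to every worker concurrently.
        futures = [pool.submit(worker, i, task) for i in range(n_workers)]
        # Fan-in: reduce each worker's output into one shared list.
        for f in futures:
            results = operator.add(results, f.result())
    return results

print(fan_out_fan_in("classify ticket"))
```

Iterating futures in submission order keeps the merged results deterministic even though the workers run concurrently.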

Two Flavors of Parallelization

Same input → Send API → Voter A (temp=0.3) / Voter B (temp=0.7) / Voter C (temp=1.0) → Aggregate (majority vote / weighted avg) → Final result

Same input → multiple parallel workers (different temps) → aggregate via voting
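The voting flavor in the diagram can be sketched as follows. This is an assumption-laden stub: `score` stands in for an LLM classifier sampled at a given temperature (here it is hard-coded so the example runs), and the aggregation is a simple majority vote with `collections.Counter`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def score(text: str, temperature: float) -> str:
    # Stub for an LLM call at this temperature. In this toy setup the
    # two lower-temperature voters agree and the hottest one dissents.
    return "billing" if temperature < 1.0 else "technical"

def vote(text: str, temps=(0.3, 0.7, 1.0)) -> str:
    # Fan-out: every voter sees the same input, at a different temperature.
    with ThreadPoolExecutor(max_workers=len(temps)) as pool:
        votes = list(pool.map(lambda t: score(text, t), temps))
    # Fan-in: the most common label across voters wins.
    return Counter(votes).most_common(1)[0][0]

print(vote("My invoice is wrong"))  # majority (2 of 3) says "billing"
```

A weighted average works the same way: replace the `Counter` step with a per-voter weight on each ballot before picking the top label.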

  • Voting — same input? Yes, all voters see the same input. Goal: confidence / accuracy. Example: 3 models classify a support ticket independently.
  • Sectioning — same input? No, each worker sees a different chunk. Goal: speed / throughput. Example: split a 100-page doc into 10 chunks and summarize each in parallel.

Voting gives you confidence through redundancy — if 3 out of 4 models agree on a classification, you can trust it more than a single model's answer. Sectioning gives you speed — processing 10 chunks in parallel is ~10x faster than sequentially.
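The sectioning flavor can be sketched the same way. This is a hypothetical stdlib-only example: `summarize` stands in for an LLM call, the document is split into fixed-size character chunks, and `ThreadPoolExecutor.map` processes them in parallel while preserving chunk order in the merged output.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(doc: str, n_chunks: int) -> list[str]:
    # Naive fixed-size split; a real system would split on section
    # or paragraph boundaries instead of raw character counts.
    size = max(1, len(doc) // n_chunks)
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def summarize(chunk: str) -> str:
    return f"summary({len(chunk)} chars)"  # stand-in for an LLM summary

def summarize_doc(doc: str, n_chunks: int = 10) -> str:
    chunks = split_into_chunks(doc, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # map returns results in input order, so the merged summary
        # stays in document sequence even though chunks run in parallel.
        summaries = list(pool.map(summarize, chunks))
    return "\n".join(summaries)

print(summarize_doc("x" * 1000, n_chunks=10))
```

Because the chunks are independent, wall-clock time is bounded by the slowest single chunk rather than the sum of all of them, which is where the ~10x speedup over sequential processing comes from.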