Intermediate · 8 min read
Parallelization: Concurrent LLM Execution
Multiple LLMs work simultaneously on the same input, with results aggregated via voting, merging, or selection — for speed or confidence.
Quick Reference
- Fan-out: send the same input to multiple LLMs concurrently via the Send API
- Fan-in: aggregate results using voting, merging, or best-of-N selection
- Two use cases: speed (split the work) and confidence (multiple opinions on the same task)
- Voting pattern: N models score independently; the majority or average wins
- Sectioning pattern: split the input into independent chunks and process them in parallel
- Use reducers (operator.add) to collect parallel results into shared state
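The fan-out/fan-in shape in the bullets above can be sketched without LangGraph at all. The snippet below is a minimal, library-free approximation using Python's `concurrent.futures`: `call_model` is a hypothetical stub standing in for a real LLM call, and the `operator.add` reducer step mimics how LangGraph appends parallel results into shared state. In LangGraph itself, the Send API would dispatch each worker as a graph node instead.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an LLM call. A real version would hit a model
# API; here it just echoes the temperature so the flow is visible.
def call_model(prompt: str, temperature: float) -> str:
    return f"answer(t={temperature})"

def fan_out_fan_in(prompt: str, temperatures: list[float]) -> list[str]:
    # Fan-out: submit the SAME prompt to N workers concurrently,
    # one per temperature setting.
    with ThreadPoolExecutor(max_workers=len(temperatures)) as pool:
        futures = [pool.submit(call_model, prompt, t) for t in temperatures]
        # Fan-in: an operator.add-style reducer appends each completed
        # result into shared state (a plain list here).
        results: list[str] = []
        for f in futures:
            results = operator.add(results, [f.result()])
    return results
```

Because the futures are iterated in submission order, the aggregated list is deterministic even though the calls run concurrently.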
Two Flavors of Parallelization
Same input → multiple parallel workers (different temperatures) → aggregate via voting
| Flavor | Same Input? | Goal | Example |
|---|---|---|---|
| Voting | Yes — all see same input | Confidence / accuracy | 3 models classify a support ticket independently |
| Sectioning | No — each sees a different chunk | Speed / throughput | Split a 100-page doc into 10 chunks, summarize each in parallel |
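The sectioning row of the table can be sketched as: split the document into independent chunks, summarize each chunk in parallel, and concatenate the results in order. This is a minimal sketch with a hypothetical `summarize` stub in place of a real per-chunk model call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical summarizer stub; a real version would call a model per chunk.
def summarize(chunk: str) -> str:
    return chunk[:10]

def section_and_summarize(doc: str, n_chunks: int = 10) -> list[str]:
    # Split the document into roughly equal, independent chunks.
    size = max(1, len(doc) // n_chunks)
    chunks = [doc[i:i + size] for i in range(0, len(doc), size)]
    # Each chunk sees a DIFFERENT input; map preserves chunk order,
    # so the summaries come back in document order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(summarize, chunks))
```

Sectioning only works when the chunks are truly independent; if one chunk's summary depends on another's content, you need a sequential pass (or a final merge step) instead.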
Voting gives you confidence through redundancy: if 3 out of 4 models agree on a classification, you can trust the result more than a single model's answer. Sectioning gives you speed: processing 10 chunks in parallel is roughly 10x faster than processing them sequentially, assuming you aren't bottlenecked by rate limits.
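The voting flavor can be sketched the same way: N independent classifications of the same input, aggregated by majority. `classify` below is a hypothetical stub standing in for a real model call; `collections.Counter` does the tallying, and the agreement ratio doubles as a rough confidence score.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical classifier stub; a real version would call model `model_id`
# on the ticket text and return its predicted label.
def classify(ticket: str, model_id: int) -> str:
    return "billing"

def majority_vote(ticket: str, n_models: int = 3) -> tuple[str, float]:
    # Fan-out: every model sees the SAME ticket, independently.
    with ThreadPoolExecutor(max_workers=n_models) as pool:
        votes = list(pool.map(lambda i: classify(ticket, i), range(n_models)))
    # Fan-in: majority wins; the vote share is a cheap confidence signal.
    label, count = Counter(votes).most_common(1)[0]
    return label, count / n_models
```

A natural extension is to treat low agreement (say, under 0.5) as "uncertain" and route those inputs to a human or a stronger model rather than trusting the plurality label.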