
Batching & Throughput

Maximize LLM serving throughput with continuous batching, dynamic request grouping, and request coalescing. Understand the prefill vs decode bottleneck and tune the throughput-latency tradeoff for your workload.

Quick Reference

  • Continuous batching: process new requests while others are still generating — 3-5x throughput vs static batching
  • Dynamic batching: collect requests over a short window (5-50ms) and process as a group
  • Prefill phase (prompt processing) is compute-bound; decode phase (token generation) is memory-bandwidth-bound
  • Request coalescing: merge semantically similar concurrent requests to serve one response to many users
  • Tradeoff: more batching = higher throughput but higher per-request latency — tune based on your SLA
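The dynamic-batching bullet above can be sketched as a queue drained over a bounded window. `DynamicBatcher` and `process_batch` are illustrative names for this sketch, not any library's API:

```python
import time
from queue import Queue, Empty

class DynamicBatcher:
    """Collect requests for up to window_ms, then hand them off as one batch.
    process_batch is a hypothetical stand-in for a batched model forward pass."""

    def __init__(self, process_batch, window_ms=20, max_batch=32):
        self.process_batch = process_batch
        self.window = window_ms / 1000.0
        self.max_batch = max_batch
        self.queue = Queue()

    def submit(self, request):
        self.queue.put(request)

    def run_once(self):
        # Block until the first request arrives, then keep collecting
        # until the window closes or the batch is full.
        batch = [self.queue.get()]
        deadline = time.monotonic() + self.window
        while len(batch) < self.max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.queue.get(timeout=remaining))
            except Empty:
                break
        return self.process_batch(batch)
```

The window bounds the worst-case added latency: a request waits at most `window_ms` before processing starts, rather than waiting indefinitely for a batch to fill.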

Why Batching Matters for LLM Serving

Without batching, an LLM server processes one request at a time. The GPU is heavily underutilized because the decode phase (generating tokens one by one) uses only a fraction of the GPU's compute capacity — it is bottlenecked on memory bandwidth, not arithmetic. Batching allows the server to process multiple requests simultaneously, filling the compute gap and achieving 3-10x higher total throughput.
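A back-of-envelope calculation makes the memory-bandwidth argument concrete. The numbers below are illustrative assumptions (an A100-class GPU at ~2 TB/s serving a 7B-parameter fp16 model, ignoring KV-cache traffic), not measurements:

```python
# Why decode is memory-bandwidth-bound: each decode step must stream the
# full model weights from HBM once, regardless of how many sequences
# share that step.
bandwidth_bytes_per_s = 2e12      # assumed ~2 TB/s HBM bandwidth
model_bytes = 7e9 * 2             # 7B params * 2 bytes (fp16)

# Forward steps per second with batch size 1 (one token per step):
steps_per_s = bandwidth_bytes_per_s / model_bytes
print(f"batch=1:  ~{steps_per_s:.0f} tokens/s total")

# With batch size 32, the same weight traffic serves 32 sequences, so
# total throughput rises almost linearly until compute saturates:
print(f"batch=32: ~{steps_per_s * 32:.0f} tokens/s total")
```

Under these assumptions, batch size 1 yields roughly 143 tokens/s while batch size 32 yields roughly 4,600 tokens/s from the same weight traffic, which is the compute gap batching fills.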

Batching Strategy | Throughput Gain | Latency Impact | Implementation Complexity
No batching (sequential) | 1x (baseline) | Lowest per-request | None
Static batching | 2-4x | Wait for batch to fill — high variance | Low
Dynamic batching | 3-5x | Bounded wait window (5-50ms) | Medium
Continuous batching | 5-10x | No wait — requests enter immediately | High (vLLM/TGI handle this)
Continuous + coalescing | 10-20x | Minimal added latency | Very high
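Request coalescing from the table can be sketched with shared futures: the first caller triggers generation, and concurrent callers with the same key piggyback on its result. `RequestCoalescer` and `generate` are hypothetical names, and this sketch keys on the exact prompt string (production systems may instead match semantically similar requests):

```python
import threading
from concurrent.futures import Future

class RequestCoalescer:
    """Merge concurrent identical requests into one model call.
    generate is a hypothetical stand-in for the model call."""

    def __init__(self, generate):
        self.generate = generate
        self.inflight = {}            # prompt -> Future for in-flight work
        self.lock = threading.Lock()

    def request(self, prompt):
        with self.lock:
            fut = self.inflight.get(prompt)
            if fut is not None:
                return fut            # piggyback on the in-flight request
            fut = Future()
            self.inflight[prompt] = fut
        try:
            fut.set_result(self.generate(prompt))
        finally:
            with self.lock:
                self.inflight.pop(prompt, None)
        return fut
```

One generation then fans out to every waiting caller, which is where the table's extra throughput multiplier for coalescing comes from on workloads with repeated queries.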
Continuous batching is the standard

vLLM and TGI both implement continuous batching out of the box. New requests are inserted into the running batch without waiting for all current requests to finish. A request that completes its generation is removed and a waiting request takes its slot. This eliminates the 'waiting for the batch' latency penalty of static batching.
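The slot-swapping behavior described above can be sketched as a toy scheduler loop. This is an illustrative sketch of the scheduling idea, not vLLM's or TGI's actual implementation; `decode_step` is a hypothetical function that advances every running sequence by one token and reports which ones finished:

```python
from collections import deque

def continuous_batching_loop(waiting, decode_step, max_batch=8):
    """Toy continuous-batching scheduler: finished sequences free their
    slots immediately, and waiting requests are admitted mid-flight
    instead of waiting for the whole batch to drain."""
    running, completed = [], []
    while running or waiting:
        # Admit new requests into any free slots before the next step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        finished = decode_step(running)   # one token for every sequence
        completed.extend(s for s in running if s in finished)
        running = [s for s in running if s not in finished]
    return completed
```

Note that admission happens at every iteration, so a short request arriving mid-flight starts decoding on the very next step rather than waiting behind the longest sequence in the batch.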