Intermediate · 12 min

Batch Processing

batch() parallelizes LLM calls client-side: requests fire concurrently (up to max_concurrency, if set) and the results return together, in input order. When you need about 50% cost savings and can tolerate ~1h latency, use your provider's async batch API instead. This article shows you which to pick, how to handle partial failures, and what a production pipeline looks like.

Quick Reference

→model.batch([...inputs]) fires all requests concurrently and blocks until all complete
→batch_as_completed() yields (index, result) as each request finishes — results arrive out of order
→abatch() / abatch_as_completed() are the async variants — use these in FastAPI / async apps
→return_exceptions=True collects failures as exceptions instead of raising on the first error
→max_concurrency in config caps parallel calls — tune to stay within provider rate limits
→Provider batch APIs (Anthropic, OpenAI) offer 50% off and complete most batches in ~1h
→InMemoryRateLimiter on the model throttles request rate (requests per second, via a token bucket) independently of the concurrency cap; it counts requests, not LLM tokens
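
To make the "as completed" and return_exceptions semantics from the list above concrete, here is a minimal standard-library sketch. It is not LangChain's implementation: fake_llm_call and as_completed_sketch are hypothetical stand-ins that only mimic the behavior described (yielding (index, result) pairs out of order, and returning failures as values instead of raising).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a model call; longer prompts take longer."""
    if not prompt:
        raise ValueError("empty prompt")
    time.sleep(0.05 * len(prompt))
    return prompt.upper()


def as_completed_sketch(inputs, max_concurrency=4, return_exceptions=False):
    """Yield (index, result) as each call finishes, not in input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Remember each future's input position so the caller can re-order.
        futures = {pool.submit(fake_llm_call, p): i for i, p in enumerate(inputs)}
        for fut in as_completed(futures):
            idx = futures[fut]
            try:
                outcome = fut.result()
            except Exception as exc:
                if not return_exceptions:
                    raise
                outcome = exc  # hand the failure back as a value
            yield idx, outcome


prompts = ["cccc", "a", "", "bb"]
results = [None] * len(prompts)
for idx, res in as_completed_sketch(prompts, return_exceptions=True):
    results[idx] = res  # the index restores input order
```

Because arrival order is not guaranteed, the index is the only reliable way to match a result to its input. The failed item sits in results as an exception object that you can inspect or retry.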

When to Use Batch Processing

You have N independent inputs — classifying 500 documents, translating 200 strings, summarizing 100 articles. The question isn't whether to parallelize; it's which parallelism to use. Three options exist, each with a different cost/latency tradeoff.

Option	Latency	Cost	When to use
batch() / abatch()	Seconds	1× standard rate	Need results now, any batch size up to ~1000
Provider batch API	~1h typical, 24h max	0.5× (50% off)	Latency-tolerant offline jobs, large volumes
asyncio.gather() (raw)	Seconds	1× standard rate	Already in async code, don't want LangChain overhead

To choose, start with your latency requirement, then check whether you are in a sync or async context.
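
For the raw asyncio.gather() row in the table, one plausible shape is a semaphore-bounded gather. This is a sketch under assumptions: fake_llm_call is a hypothetical placeholder for whatever async SDK call you already have, and gather_bounded is an illustrative helper name, not a library function.

```python
import asyncio


async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for an async provider SDK call."""
    await asyncio.sleep(0.01)
    if "fail" in prompt:
        raise RuntimeError(f"provider error for {prompt!r}")
    return f"summary of {prompt}"


async def gather_bounded(prompts, max_concurrency=5):
    """Run all prompts concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:  # wait here while max_concurrency calls are in flight
            return await fake_llm_call(prompt)

    # return_exceptions=True keeps one failure from discarding the other results.
    return await asyncio.gather(*(one(p) for p in prompts), return_exceptions=True)


prompts = ["doc-1", "doc-2 fail", "doc-3"]
results = asyncio.run(gather_bounded(prompts, max_concurrency=2))
```

gather() returns results in input order, so results[i] corresponds to prompts[i] even though the calls overlap in time.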

When NOT to use batch()

batch() is for independent inputs only. If input B depends on the result of input A, you need sequential calls or a chain. Also avoid batch() for streaming UX — a user waiting for a response should use stream(), not batch().

Client-Side vs Provider Batch APIs

LangChain's batch() sends N concurrent requests to the provider's standard API — you pay standard rates and get results in seconds. Provider batch APIs (Anthropic Message Batches, OpenAI /v1/batches) are different products: you submit a job file, the provider processes it at off-peak capacity, and you poll for results. The tradeoff is real and worth computing.
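
A rough way to compute the tradeoff is sketched below. The per-token prices and token counts are placeholders chosen for easy arithmetic, not any provider's real pricing; only the 50% discount comes from the text above. Substitute your own model's rates.

```python
def batch_cost_comparison(
    n_requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,
) -> dict:
    """Compare standard-rate cost with discounted provider-batch cost (USD)."""
    # Prices are quoted per million tokens; token counts are per request.
    per_request_micro = (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    )
    standard = n_requests * per_request_micro / 1_000_000
    provider_batch = standard * (1 - batch_discount)
    return {
        "standard": standard,
        "provider_batch": provider_batch,
        "savings": standard - provider_batch,
    }


# Placeholder workload: 10,000 documents, 1,500 input and 300 output tokens each,
# at hypothetical prices of $3 / $15 per million input / output tokens.
costs = batch_cost_comparison(
    n_requests=10_000,
    input_tokens=1_500,
    output_tokens=300,
    input_price_per_mtok=3.00,
    output_price_per_mtok=15.00,
)
print(f"standard: ${costs['standard']:.2f}, batch API: ${costs['provider_batch']:.2f}")
```

With these placeholder numbers the standard path costs $90 and the provider batch path $45. Whether saving $45 justifies waiting up to an hour or more depends on the job, and the savings scale linearly with volume.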

Basic batch()

batch() is a method on any LangChain Runnable — models, chains, prompts. It fires all requests concurrently using a thread pool and returns when all complete.
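
The thread-pool behavior described above can be approximated with the standard library alone. This is a conceptual sketch, not LangChain's source: batch_sketch and fake_llm_call are hypothetical names, and the max_concurrency and return_exceptions parameters only mirror the options named in the Quick Reference.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a blocking model call."""
    if not prompt.strip():
        raise ValueError("empty prompt")
    return f"label for {prompt}"


def batch_sketch(func, inputs, max_concurrency=None, return_exceptions=False):
    """Run func over inputs in a thread pool; block until every call is done."""

    def guarded(item):
        try:
            return func(item)
        except Exception as exc:
            if return_exceptions:
                return exc  # collect the failure instead of aborting the batch
            raise

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() preserves input order even though the calls overlap in time.
        return list(pool.map(guarded, inputs))


outputs = batch_sketch(
    fake_llm_call,
    ["doc a", "", "doc c"],
    max_concurrency=2,
    return_exceptions=True,
)
```

Without return_exceptions=True, the first failing input raises and the caller never sees the other results, which is why collecting exceptions is usually the safer choice for large batches.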
