Intermediate · 9 min
Embedding Models
The Embeddings interface, embed_documents(), embed_query(), choosing a model (text-embedding-3-small vs large), dimensionality.
Quick Reference
- Embeddings is the base interface: embed_documents() for bulk, embed_query() for single queries
- embed_documents() and embed_query() may use different prompts internally (e.g., 'search_document:' vs 'search_query:')
- text-embedding-3-small is the cost-effective default; text-embedding-3-large for higher accuracy
- Dimensionality can be reduced via the dimensions parameter: smaller vectors, lower storage cost
- Provider packages: langchain-openai (OpenAIEmbeddings), langchain-cohere (CohereEmbeddings), etc.
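To make the storage point concrete, here is a rough sketch of the arithmetic behind smaller vectors. It assumes vectors are stored as 4-byte floats and uses the default output sizes of 1536 (text-embedding-3-small) and 3072 (text-embedding-3-large). The helper names `storage_bytes` and `shorten` are hypothetical, and `shorten` only illustrates truncating and re-normalizing a vector you already have. It is not a claim about what the `dimensions` parameter does on the provider side.

```python
import math
from typing import List

BYTES_PER_FLOAT32 = 4  # assumption: the vector store keeps float32 values


def storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw vector storage, ignoring index and metadata overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32


def shorten(vector: List[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale to unit length (hypothetical helper)."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm else head


if __name__ == "__main__":
    for dims in (3072, 1536, 256):
        gigabytes = storage_bytes(1_000_000, dims) / 1e9
        print(f"{dims:>4} dims, 1M vectors: about {gigabytes:.2f} GB")
```

Under these assumptions, going from 1536 to 256 dimensions cuts raw vector storage by a factor of six. Whether retrieval quality holds up at the smaller size is something to measure on your own data.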
The Embeddings Interface
Two methods: embed_documents() for corpus, embed_query() for search
Use the right method
Use embed_documents() for your corpus and embed_query() for search queries. Some models process the two differently: they prepend different prefixes internally (e.g., 'search_document:' vs 'search_query:'), so the same text yields a different vector depending on which method embeds it. With such models, mixing the methods up degrades retrieval quality.
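The effect of the prefixes can be sketched with a toy class that follows the same two-method shape. `ToyPrefixEmbeddings` is a hypothetical stand-in, not a LangChain or provider class, and its hash-based vectors carry no semantic meaning. The point is only that the same text produces a different vector depending on which method embeds it.

```python
import hashlib
from typing import List


class ToyPrefixEmbeddings:
    """Hypothetical stand-in that mimics the two-method embeddings interface."""

    def __init__(self, size: int = 8) -> None:
        self.size = size  # vector length; at most 32 with SHA-256

    def _embed(self, text: str) -> List[float]:
        # Deterministic fake vector: first `size` digest bytes scaled to [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[: self.size]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Corpus side: every text gets the document prefix.
        return [self._embed("search_document: " + text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        # Search side: the query prefix yields a different vector for the same text.
        return self._embed("search_query: " + text)


if __name__ == "__main__":
    embeddings = ToyPrefixEmbeddings()
    doc_vectors = embeddings.embed_documents(["vector databases", "prompt caching"])
    query_vector = embeddings.embed_query("vector databases")
    print(len(doc_vectors), len(query_vector))
    print(query_vector == doc_vectors[0])  # same text, different method
```

With a real model that uses such prefixes, embedding the corpus through embed_query() (or queries through embed_documents()) would place vectors in the wrong part of the space in the same way, which is why the two methods are kept separate.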