Intermediate · 9 min
Eval in CI/CD: Pytest & Vitest Integration
LangSmith's pytest plugin and Vitest/Jest integration bring LLM evaluation into your CI/CD pipeline — with fuzzy matching, embedding distance, test caching, and rich terminal output.
Quick Reference
- @pytest.mark.langsmith decorator syncs test cases with LangSmith datasets automatically
- expect() utility provides fuzzy matching: edit distance, embedding similarity, semantic match
- Test caching skips unchanged examples in CI — only re-evaluates modified tests
- Rich terminal output shows pass/fail with scores and diffs inline
- Vitest/Jest integration for JavaScript/TypeScript projects with identical capabilities
- Results sync to LangSmith for tracking regressions across commits
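To make the fuzzy-matching idea concrete, here is a stdlib-only stand-in — this is not the LangSmith expect() API; edit_similarity, expect_close, and the 0.8 threshold are hypothetical names chosen purely to illustrate how an edit-distance assertion tolerates wording drift:

```python
from difflib import SequenceMatcher

def edit_similarity(prediction: str, reference: str) -> float:
    """Normalized string similarity in [0, 1], a rough proxy for
    1 - normalized edit distance (case-insensitive)."""
    return SequenceMatcher(None, prediction.lower(), reference.lower()).ratio()

def expect_close(prediction: str, reference: str, threshold: float = 0.8) -> None:
    """Fuzzy assertion: passes when the strings are similar enough,
    instead of requiring an exact match."""
    score = edit_similarity(prediction, reference)
    assert score >= threshold, f"similarity {score:.2f} < {threshold}"

# Minor case/punctuation drift passes, where `==` would fail.
expect_close("The answer is Paris.", "the answer is paris")
```

The real plugin goes further (embedding distance, LLM-judged semantic match), but the shape is the same: a score plus a threshold instead of a binary equality check.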
Why Eval in CI/CD?
LLM outputs are non-deterministic — the same input can produce different outputs across runs. Traditional unit tests with exact string matching break constantly. LangSmith's testing framework solves this with fuzzy assertions (embedding distance, semantic similarity) and statistical evaluation (pass rates across datasets, not individual examples).
| Traditional Tests | LLM Eval Tests |
|---|---|
| assert output == 'exact string' | expect(output).to_semantic_match('meaning') |
| Pass/fail binary | Score-based with thresholds |
| Deterministic | Statistical (pass rate across dataset) |
| Run every time | Cache unchanged examples |
| Local only | Sync results to LangSmith |
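The "statistical, not binary" row of the table can be sketched in plain Python. This is a conceptual stand-in, not LangSmith's evaluation API — the passes() scorer, the sample dataset, and the 0.75 pass-rate threshold are all made up for illustration:

```python
def passes(prediction: str, reference: str) -> bool:
    """Toy per-example check: the reference answer must appear
    somewhere in the model output (case-insensitive)."""
    return reference.lower() in prediction.lower()

def pass_rate(results: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, reference) pairs that pass."""
    return sum(passes(p, r) for p, r in results) / len(results)

# A non-deterministic model is judged on its aggregate pass rate
# across the dataset, not on any single example.
dataset = [
    ("The capital of France is Paris.", "Paris"),
    ("Berlin is Germany's capital.", "Berlin"),
    ("I am not sure.", "Madrid"),  # one individual failure is tolerated
    ("Rome is the capital of Italy.", "Rome"),
]
rate = pass_rate(dataset)
assert rate >= 0.75, f"pass rate {rate:.0%} below threshold"
```

Framing the CI gate as a pass rate over a dataset is what keeps non-deterministic outputs from flaking the build: a single regressed example lowers the score instead of failing the pipeline outright.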