Evaluation & Quality/Automated Evaluation
Intermediate · 9 min

Eval in CI/CD: Pytest & Vitest Integration

LangSmith's pytest plugin and Vitest/Jest integration bring LLM evaluation into your CI/CD pipeline — with fuzzy matching, embedding distance, test caching, and rich terminal output.

Quick Reference

  • @pytest.mark.langsmith decorator syncs test cases with LangSmith datasets automatically
  • expect() utility provides fuzzy matching: edit distance, embedding similarity, semantic match
  • Test caching skips unchanged examples in CI — only re-evaluates modified tests
  • Rich terminal output shows pass/fail with scores and diffs inline
  • Vitest/Jest integration for JavaScript/TypeScript projects with identical capabilities
  • Results sync to LangSmith for tracking regressions across commits

Why Eval in CI/CD?

LLM outputs are non-deterministic — the same input can produce different outputs across runs. Traditional unit tests with exact string matching break constantly. LangSmith's testing framework solves this with fuzzy assertions (embedding distance, semantic similarity) and statistical evaluation (pass rates across datasets, not individual examples).
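To make the contrast concrete, here is a minimal, library-free sketch of a fuzzy assertion: instead of exact equality, it scores string similarity with the standard-library `difflib` and passes when the score clears a threshold. The function name `fuzzy_match` and the threshold values are illustrative, not part of any API.

```python
from difflib import SequenceMatcher


def fuzzy_match(output: str, reference: str, threshold: float = 0.8) -> bool:
    """Pass if the normalized similarity ratio clears the threshold."""
    score = SequenceMatcher(None, output.lower(), reference.lower()).ratio()
    return score >= threshold


# Exact matching breaks on harmless rephrasing...
assert "Paris is the capital of France." != "The capital of France is Paris."

# ...but a score-based check tolerates it.
assert fuzzy_match("Paris is the capital of France.",
                   "The capital of France is Paris.", threshold=0.6)
```

Embedding-distance checks follow the same shape: compute a distance, compare it to a threshold, and report the score rather than a bare boolean.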

Traditional Tests                    LLM Eval Tests
assert output == 'exact string'      expect(output).to_semantic_match('meaning')
Pass/fail binary                     Score-based with thresholds
Deterministic                        Statistical (pass rate across dataset)
Run every time                       Cache unchanged examples
Local only                           Sync results to LangSmith
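The caching row can be illustrated with a self-contained sketch: fingerprint each example's inputs together with the test's own source, and skip re-evaluation when the fingerprint was already seen in a previous run. This is a conceptual cache-by-hash illustration under assumed names (`fingerprint`, `run_suite`), not LangSmith's actual cache implementation.

```python
import hashlib
import json


def fingerprint(example: dict, test_source: str) -> str:
    """Stable hash of an example's inputs plus the test code evaluating it."""
    payload = json.dumps({"example": example, "test": test_source}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_suite(examples, test_source, cache):
    """Evaluate only examples whose fingerprint is not already cached."""
    evaluated, skipped = [], []
    for ex in examples:
        key = fingerprint(ex, test_source)
        if key in cache:
            skipped.append(ex)        # unchanged since the last CI run
        else:
            cache[key] = "evaluated"  # the expensive LLM call would go here
            evaluated.append(ex)
    return evaluated, skipped


cache: dict = {}
examples = [{"q": "capital of France?"}, {"q": "capital of Japan?"}]
run_suite(examples, "v1", cache)                  # first run: both evaluated
ran, skipped = run_suite(examples, "v1", cache)   # second run: both skipped
```

Changing either an example's inputs or the test source yields a new fingerprint, so only modified tests are re-evaluated, which is what keeps LLM eval suites affordable in CI.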