Evaluations - Laminar documentation

Evaluations are the offline testing layer for your agents and LLM pipelines. Define inputs, run them through the version you want to test, score the outputs, and compare scores across runs to see whether anything regressed.
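The loop described above (inputs, a version under test, scoring, comparison across runs) can be sketched in plain Python. This is an illustration of the idea only and does not use the Laminar SDK: `run_evaluation`, `pipeline_v1`, `pipeline_v2`, `exact_match`, and the sample data are hypothetical names made up for this sketch.

```python
import operator
from statistics import mean

# Hypothetical datapoints: each pairs an input with the expected output.
DATA = [
    {"input": "2 + 2", "target": "4"},
    {"input": "3 * 5", "target": "15"},
    {"input": "10 - 7", "target": "3"},
]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def pipeline_v1(text: str) -> str:
    """Stand-in for an older version under test: only handles addition."""
    left, op, right = text.split()
    return str(int(left) + int(right)) if op == "+" else "unknown"


def pipeline_v2(text: str) -> str:
    """Stand-in for a newer version under test: handles +, -, and *."""
    left, op, right = text.split()
    return str(OPS[op](int(left), int(right)))


def exact_match(output: str, target: str) -> float:
    """Evaluator: 1.0 if the output equals the target, else 0.0."""
    return 1.0 if output == target else 0.0


def run_evaluation(executor, data, evaluators) -> dict[str, float]:
    """Run every datapoint through the executor and average each score."""
    scores = {name: [] for name in evaluators}
    for datapoint in data:
        output = executor(datapoint["input"])
        for name, evaluator in evaluators.items():
            scores[name].append(evaluator(output, datapoint["target"]))
    return {name: mean(values) for name, values in scores.items()}


evaluators = {"exact_match": exact_match}
baseline = run_evaluation(pipeline_v1, DATA, evaluators)
candidate = run_evaluation(pipeline_v2, DATA, evaluators)

# A score that dropped between runs signals a regression.
regressed = [name for name in baseline if candidate[name] < baseline[name]]
print("baseline:", baseline, "candidate:", candidate, "regressed:", regressed)
```

Keeping the data and evaluators fixed while swapping only the executor is what makes the two runs comparable.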

Introduction

What evaluations are and when to use them.

Quickstart

Write and run your first evaluation in five minutes.

Concepts

Datapoints, executors, evaluators, and how they map to traces.

Compare runs

Groups, the progression chart, and side-by-side comparison.

Datasets

Back your evaluation with a Laminar dataset instead of hardcoded data.

Manual API

Lower-level control for pipelines where evaluate() is too opinionated.

Self-hosted

Point the SDK at a self-hosted Laminar instance.

Next: turn failing traces into reusable evaluation data with Datasets, or catch regressions in production with Signals.
