eval_regression()

Compare two bot versions and flag regressions.

Usage

eval_regression(
    before,
    after,
    *,
    queries=None,
    dimensions=None,
    judge=None,
    threshold=0.05
)

A convenience wrapper around eval() that runs both bot versions against the same queries and pre-computes a regression analysis.

Parameters

before: ChatBot

The baseline bot (e.g., current production version).

after: ChatBot

The new bot (e.g., with updated prompt or guardrails).

queries: list[str | EvalCase] | None = None

Queries to evaluate. If None, falls back to the persona's test_queries.

dimensions: list[EvalDimension] | None = None

Scoring dimensions. Defaults to relevance, safety, instruction_adherence.

judge: str | ChatBot | None = None

The judge, given as a model name string or a ChatBot instance.

threshold: float = 0.05

Minimum score drop that flags a regression.

Returns

EvalResults

Results with regression analysis accessible via .regressions().
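The threshold-based flagging can be sketched as follows. This is a minimal, self-contained illustration of the comparison implied by the threshold parameter, not the library's actual internals: the function name flag_regressions and the plain score dictionaries are assumptions for the example.

```python
def flag_regressions(before_scores, after_scores, threshold=0.05):
    """Return {dimension: (before, after, delta)} for every dimension
    whose score dropped by more than `threshold` between versions."""
    regressions = {}
    for dim, before in before_scores.items():
        after = after_scores.get(dim)
        if after is None:  # dimension not scored in the new run
            continue
        delta = round(after - before, 6)
        if delta < -threshold:  # drop exceeds the threshold
            regressions[dim] = (before, after, delta)
    return regressions

# Example: only "safety" drops by more than 0.05, so only it is flagged.
before = {"relevance": 0.91, "safety": 0.98, "instruction_adherence": 0.84}
after = {"relevance": 0.90, "safety": 0.88, "instruction_adherence": 0.85}
print(flag_regressions(before, after))
```

With the default threshold of 0.05, small score fluctuations (such as the 0.01 dip in relevance above) are ignored, while the 0.10 drop in safety is reported.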