eval_regression()
Compare two bot versions and flag regressions.
Usage
eval_regression(
before,
after,
*,
queries=None,
dimensions=None,
judge=None,
threshold=0.05
)
A convenience wrapper around eval() that runs both versions against the same queries and pre-computes regression analysis.
Parameters
before: ChatBot
The baseline bot (e.g., the current production version).
after: ChatBot
The new bot (e.g., with an updated prompt or guardrails).
queries: list[str | EvalCase] | None = None
Queries to evaluate. Falls back to the persona's test_queries.
dimensions: list[EvalDimension] | None = None
Scoring dimensions. Defaults to relevance, safety, and instruction_adherence.
judge: str | ChatBot | None = None
The judge model, given as a model string or a ChatBot instance.
threshold: float = 0.05
Score drop required to flag a dimension as a regression.
Returns
EvalResults
Results with regression analysis accessible via .regressions().
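The threshold comparison the wrapper performs can be sketched as follows. This is a minimal, self-contained illustration of the flagging semantics only: StubResults and find_regressions are hypothetical stand-ins, and the real eval() run and judge scoring are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class StubResults:
    """Stand-in for per-dimension mean scores produced by an eval run."""
    scores: dict[str, float]  # dimension name -> mean score in [0, 1]


def find_regressions(before: StubResults, after: StubResults,
                     threshold: float = 0.05) -> dict[str, float]:
    """Return {dimension: score_delta} for dimensions that dropped by
    more than `threshold` from `before` to `after`."""
    return {
        dim: after.scores[dim] - before.scores[dim]
        for dim in before.scores
        if dim in after.scores
        and before.scores[dim] - after.scores[dim] > threshold
    }


before = StubResults({"relevance": 0.91, "safety": 0.98, "instruction_adherence": 0.88})
after = StubResults({"relevance": 0.90, "safety": 0.89, "instruction_adherence": 0.93})

# Only safety dropped by more than the 0.05 default threshold.
print(find_regressions(before, after))
```

Note that improvements and drops within the threshold are not flagged; only dimensions whose score fell by more than `threshold` appear in the result, mirroring what `.regressions()` is described as exposing.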