EvalResults.regressions()

Detect regressions between variants.

Usage

Source

EvalResults.regressions(
    baseline=None,
    threshold=0.05,
)

Compares each variant to the baseline and returns dimensions where the score dropped by more than threshold.

Parameters

baseline: str | None = None

Variant name to use as baseline. Defaults to the first variant.

threshold: float = 0.05
Minimum score drop to flag as a regression.

Returns

dict[str, dict[str, float]]
Mapping of variant name -> dimension -> score delta (negative = regression).