EvalResults.regressions()
Detect regressions between variants.
Usage
EvalResults.regressions(
baseline=None,
threshold=0.05,
)
Compares each variant to the baseline and returns the dimensions where the score dropped by more than threshold.
Parameters
baseline: str | None = None
Variant name to use as the baseline. Defaults to the first variant.
threshold: float = 0.05
Minimum score drop to flag as a regression.
Returns
dict[str, dict[str, float]]
Mapping of variant name -> dimension -> score delta (negative = regression).
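Example
The comparison described above can be sketched on plain dictionaries. This is an illustration only, not the library's implementation: find_regressions and the sample scores are hypothetical. The sketch assumes that the baseline is left out of the result, that variants with no flagged dimension are omitted, and that a drop must be strictly greater than threshold to count.

```python
def find_regressions(scores, baseline=None, threshold=0.05):
    """Hypothetical stand-in for the behavior described above.

    scores maps variant name -> dimension -> score.
    """
    if not scores:
        return {}

    # Assumption: "first variant" means first in insertion order.
    base_name = baseline if baseline is not None else next(iter(scores))
    base_scores = scores[base_name]

    regressions = {}
    for variant, dims in scores.items():
        if variant == base_name:
            continue  # the baseline is not compared against itself
        flagged = {}
        for dim, base_score in base_scores.items():
            if dim not in dims:
                continue  # skip dimensions the variant was not scored on
            delta = dims[dim] - base_score
            # Flag only drops strictly larger than the threshold.
            if delta < -threshold:
                flagged[dim] = delta
        if flagged:
            regressions[variant] = flagged
    return regressions


# Made-up scores for three variants; "v1" is first, so it is the default baseline.
scores = {
    "v1": {"accuracy": 0.90, "tone": 0.80},
    "v2": {"accuracy": 0.80, "tone": 0.82},
    "v3": {"accuracy": 0.88, "tone": 0.50},
}

result = find_regressions(scores)
for variant, dims in result.items():
    for dim, delta in dims.items():
        print(f"{variant} regressed on {dim}: {delta:+.2f}")
```

With the default threshold of 0.05, v2 is flagged on accuracy (a drop of about 0.10) and v3 on tone (a drop of about 0.30). The 0.02 accuracy drop for v3 stays below the threshold and is not reported.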