Validate.warn

Validate.warn(i=None, scalar=False)

Provides a dictionary of the warning status for each validation step.

The warning status (warn) for a validation step is True if the fraction of failing test units meets or exceeds the threshold for the warning level. Otherwise, the status is False.
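The rule above can be sketched in plain Python. This is only an illustration, not the library's implementation: `meets_threshold` is a hypothetical helper, and it assumes that an integer threshold is an absolute count of failing test units (as in the example further down) while a float threshold is a fraction of all test units.

```python
from typing import Union


def meets_threshold(n_failed: int, n_units: int, threshold: Union[int, float]) -> bool:
    """Hypothetical helper: True when failures meet or exceed the threshold."""
    if isinstance(threshold, float):
        # Assumption: a float threshold is a fraction of all test units.
        return n_units > 0 and (n_failed / n_units) >= threshold
    # Assumption: an integer threshold is an absolute count of failing units.
    return n_failed >= threshold


# 2 failing units out of 7, against a warn threshold of 2 failing units
status_count = meets_threshold(n_failed=2, n_units=7, threshold=2)

# no failing units against the same threshold
status_none = meets_threshold(n_failed=0, n_units=7, threshold=2)

# 2 failing units out of 7 (about 0.29) against a fractional threshold of 0.25
status_fraction = meets_threshold(n_failed=2, n_units=7, threshold=0.25)

print(status_count, status_none, status_fraction)
```

Note the "meets or exceeds" comparison: a step with exactly as many failures as the threshold already has a True status.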

The name warn is semantic and does not imply that a warning message is generated; it is simply a status indicator that could be used to trigger one. It sits alongside the stop and notify status indicators, each of which is tied to its own threshold level.

By default, the result is a dictionary keyed by validation step number. If the scalar=True argument is provided and i= is a single integer, the value is returned as a scalar instead of a dictionary.

Parameters

i : int | list[int] | None = None

The validation step number(s) from which the warning status is obtained. Can be provided as a list of integers or a single integer. If None, all steps are included.

scalar : bool = False

If True and i= is a scalar, return the value as a scalar instead of a dictionary.

Returns

: dict[int, bool] | bool

A dictionary of the warning status for each validation step or a scalar value.

Examples

In the example below, we’ll use a simple Polars DataFrame with three columns (a, b, and c). There will be three validation steps: the first will have some failing test units and the rest will pass completely. We’ve set thresholds for all of the steps by using thresholds=(2, 4, 5), which means:

  • the warn threshold is 2 failing test units
  • the stop threshold is 4 failing test units
  • the notify threshold is 5 failing test units

After interrogation, the warn() method is used to determine the warn status for each validation step.

import polars as pl
import pointblank as pb

tbl = pl.DataFrame(
    {
        "a": [7, 4, 9, 7, 12, 3, 10],
        "b": [9, 8, 10, 5, 10, 6, 2],
        "c": ["a", "b", "a", "a", "b", "b", "a"]
    }
)

validation = (
    pb.Validate(data=tbl, thresholds=(2, 4, 5))
    .col_vals_gt(columns="a", value=5)
    .col_vals_lt(columns="b", value=15)
    .col_vals_in_set(columns="c", set=["a", "b"])
    .interrogate()
)

validation.warn()
{1: True, 2: False, 3: False}

The returned dictionary provides the warn status for each validation step. The first step has a True value since its 2 failing test units meet the warn threshold of 2. The second and third steps have False values since they had 0 failing test units, which is below the warn threshold.
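Because the result is an ordinary dictionary, it can drive follow-up logic such as emitting a message of our own. The snippet below is a small sketch that works on a copy of the output shown above rather than calling the library; the variable names are illustrative.

```python
# Copy of the dictionary returned by validation.warn() in the example above
warn_status = {1: True, 2: False, 3: False}

# Collect the step numbers whose warn threshold was met
flagged_steps = [step for step, hit in warn_status.items() if hit]

if flagged_steps:
    print(f"warn threshold met in step(s): {flagged_steps}")
else:
    print("no step reached the warn threshold")
```

With the output above, only step 1 is collected.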

We can also visually inspect the warn status across all steps by viewing the validation table:

validation
Pointblank Validation
2024-12-20|15:09:32
Polars | WARN 2 | STOP 4 | NOTIFY 5

STEP                  COLUMNS  VALUES  UNITS  PASS      FAIL      W
1  col_vals_gt()      a        5       7      5 (0.71)  2 (0.29)  ●
2  col_vals_lt()      b        15      7      7 (1.00)  0 (0.00)  ○
3  col_vals_in_set()  c        a, b    7      7 (1.00)  0 (0.00)  ○

2024-12-20 15:09:32 UTC | < 1 s | 2024-12-20 15:09:32 UTC

We can see that there’s a filled yellow circle in the first step (far right side, in the W column) indicating that the warn threshold was met. The other steps have empty yellow circles. This means that thresholds were ‘set but not met’ in those steps.

To check the warn status for a single validation step, we can provide the step number. We could also have the value returned as a scalar by setting scalar=True (provided that i= is a single integer).

validation.warn(i=1)
{1: True}

The returned dictionary holds a True value for step 1, indicating that the first validation step had the warn threshold met.
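The scalar=True option mentioned above is not shown in the output here. As a rough sketch of the return-shape rules this page describes (not the library's implementation), the hypothetical helper `pick_status` below selects steps from a status dictionary and unwraps the value only when i is a single integer and scalar is True. Cases the text does not describe, such as scalar=True with a list, simply fall back to a dictionary here, which is an assumption.

```python
from typing import Dict, List, Optional, Union


def pick_status(
    status: Dict[int, bool],
    i: Optional[Union[int, List[int]]] = None,
    scalar: bool = False,
) -> Union[Dict[int, bool], bool]:
    """Hypothetical stand-in for the return-shape rules described for warn()."""
    if i is None:
        # No step selection: return every step
        return dict(status)
    steps = [i] if isinstance(i, int) else list(i)
    selected = {step: status[step] for step in steps}
    if scalar and isinstance(i, int):
        # Single step plus scalar=True: unwrap the bare boolean
        return selected[i]
    return selected


# Copy of the status dictionary from the example above
all_steps = {1: True, 2: False, 3: False}

as_dict = pick_status(all_steps, i=1)
as_scalar = pick_status(all_steps, i=1, scalar=True)

print(as_dict)    # {1: True}
print(as_scalar)  # True
```

A scalar result is convenient in conditionals, where a bare boolean reads more directly than a one-entry dictionary.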