Intro
Thresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions. For example, when testing a column for NULLs with col_vals_not_null()
you might want to warn on any NULLs and stop where there are 10% NULLs in the column.
import pointblank as pb
= (
validation_1 =pb.load_dataset(dataset="small_table"))
pb.Validate(data="a", thresholds=(1, 0.1))
.col_vals_not_null(columns
.interrogate()
)
validation_1
The code uses thresholds=(1, 0.1)
to set a WARN
threshold of 1 and a STOP
threshold of 10% failing test units. Notice these pieces in the validation table:
- The
FAIL
column shows that x tests units have failed - The
W
column (short forWARN
) shows a filled yellow circle indicating it’s reached threshold - The
S
column (short forSTOP
) shows an open red circle indicating it’s below threshold
The one final threshold, N
(NOTIFY
), wasn’t set so appears on the validation table as a dash.
Using the Validation(threshold=)
argument
We can also define thresholds globally. This means that every validation step will re-use the same set of threshold values.
import pointblank as pb
= (
validation_2 =pb.load_dataset(dataset="small_table"), thresholds=(1, 0.1))
pb.Validate(data="a")
.col_vals_not_null(columns="b", value=2)
.col_vals_gt(columns
.interrogate()
)
validation_2
In this, both the col_vals_not_null()
and col_vals_gt()
steps will use the thresholds set in the Validate()
call. The threshold=
argument to every validation method will take priority over the globally set value.
Defining Thresholds
Threshold shorthands
The fastest way to define a threshold is a use a tuple with entries for WARN, STOP, and NOTIFY levels.
# [WARN, STOP, NOTIFY]
= (1, 2, 3)
threshold
=..., threshold=threshold) Validate(data
Note that a shorter tuple or even single values are also allowed:
(1, 2)
: warn on 1 failure, stop on 21
or(1, )
: warn on 1 failure
Threshold cutoff values
Threshold values can be specified in two ways:
- percentage: a decimal value like 0.1 to mean 10% test units failed
- number: a fixed number of test units failed
Threshold cutoffs are inclusive so any value of failing test units greater than or equal to the cutoff will result in triggering the threshold. So if a threshold is defined with a cutoff value of 5
, then 5 failing test units will result in threshold.
The Threshold
class
Triggering Actions
This is not currently implemented.