Intro

This is a work in progress. It’s just an outline for now.

Thresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions. For example, when testing a column for NULLs with col_vals_not_null() you might want to warn on any NULLs and stop where there are 10% NULLs in the column.

import pointblank as pb

validation_1 = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_not_null(columns="a", thresholds=(1, 0.1))
    .interrogate()
)

validation_1

The code uses thresholds=(1, 0.1) to set a WARN threshold of 1 and a STOP threshold of 10% failing test units. Notice these pieces in the validation table:

The one final threshold, N (NOTIFY), wasn’t set so appears on the validation table as a dash.

Using the Validation(threshold=) argument

We can also define thresholds globally. This means that every validation step will re-use the same set of threshold values.

import pointblank as pb

validation_2 = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"), thresholds=(1, 0.1))
    .col_vals_not_null(columns="a")
    .col_vals_gt(columns="b", value=2)
    .interrogate()
)

validation_2

In this, both the col_vals_not_null() and col_vals_gt() steps will use the thresholds set in the Validate() call. The threshold= argument to every validation method will take priority over the globally set value.

Defining Thresholds

Threshold shorthands

The fastest way to define a threshold is a use a tuple with entries for WARN, STOP, and NOTIFY levels.

# [WARN, STOP, NOTIFY]
threshold = (1, 2, 3)

Validate(data=..., threshold=threshold)

Note that a shorter tuple or even single values are also allowed:

  • (1, 2): warn on 1 failure, stop on 2
  • 1 or (1, ): warn on 1 failure

Threshold cutoff values

Threshold values can be specified in two ways:

  • percentage: a decimal value like 0.1 to mean 10% test units failed
  • number: a fixed number of test units failed

Threshold cutoffs are inclusive so any value of failing test units greater than or equal to the cutoff will result in triggering the threshold. So if a threshold is defined with a cutoff value of 5, then 5 failing test units will result in threshold.

The Threshold class

Triggering Actions

This is not currently implemented.

Use Case: Stopping on any Failures

Use Case: Global tolerance bands

Use Case: Schema Correctness After Table Joins