import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [1, 2, 1, 7, 8, 6],
        "b": [0, 0, 0, 1, 1, 1],
        "c": [0.5, 0.3, 0.8, 1.4, 1.9, 1.2],
    }
)

pb.preview(tbl)
Validate.col_vals_expr
Validate.col_vals_expr(expr, pre=None, thresholds=None, active=True)
Validate column values using a custom expression.
The col_vals_expr() validation method checks whether column values in a table satisfy a custom expr= expression. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any pre= mutation has been applied).
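As a rough mental model of the paragraph above, the sketch below mimics the counting of test units in plain Python. It is not Pointblank's implementation: the `count_test_units` helper and the list-of-dicts table are hypothetical stand-ins, used only to show that each row yields one boolean and that rows are counted after the optional `pre` step.

```python
# Conceptual model only: plain-Python stand-ins, not Pointblank internals.
rows = [
    {"a": 1, "b": 0, "c": 0.5},
    {"a": 2, "b": 0, "c": 0.3},
    {"a": 1, "b": 0, "c": 0.8},
    {"a": 7, "b": 1, "c": 1.4},
    {"a": 8, "b": 1, "c": 1.9},
    {"a": 6, "b": 1, "c": 1.2},
]


def count_test_units(rows, expr, pre=None):
    """Hypothetical helper: return (units, passed, failed) for a row-wise check."""
    # Apply the optional pre-processing step first; units are counted afterwards.
    prepared = pre(rows) if pre is not None else rows
    # One boolean result per row, so one test unit per row.
    results = [bool(expr(row)) for row in prepared]
    passed = sum(results)
    return len(results), passed, len(results) - passed


# Every value in "a" is a whole number, so all six units pass.
units, passed, failed = count_test_units(rows, lambda r: r["a"] % 1 == 0)
print(units, passed, failed)

# A pre step that keeps only rows where b == 1 shrinks the unit count to three.
units_b1, passed_b1, failed_b1 = count_test_units(
    rows,
    expr=lambda r: r["c"] > 1.0,
    pre=lambda rs: [r for r in rs if r["b"] == 1],
)
print(units_b1, passed_b1, failed_b1)
```

In the real method, the expression is a Polars, Narwhals, or lambda expression evaluated by the library rather than a per-row Python callable.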
Parameters
expr : any
-
A column expression that will evaluate each row in the table, returning a boolean value per table row. If the target table is a Polars DataFrame, the expression should either be a Polars column expression or a Narwhals one. For a Pandas DataFrame, the expression should either be a lambda expression or a Narwhals column expression.
pre : Callable | None = None
-
A pre-processing function or lambda to apply to the data table for the validation step.
thresholds : int | float | bool | tuple | dict | Thresholds = None
-
Failure threshold levels so that the validation step can react accordingly when exceeding the set levels for different states (warn, stop, and notify). This can be created simply as an integer or float denoting the absolute number or fraction of failing test units for the ‘warn’ level. Otherwise, you can use a tuple of 1-3 values, a dictionary of 1-3 entries, or a Thresholds object.
active : bool = True
-
A boolean value indicating whether the validation step should be active. Using False will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).
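The integer-versus-float reading of a single threshold value, as described for the thresholds parameter, can be illustrated with a small plain-Python sketch. The `exceeds_level` helper is hypothetical, and the inclusive comparison (`>=`) is an assumption made for illustration; the exact boundary behavior is not stated here and should be checked against the library.

```python
def exceeds_level(failed, units, level):
    """Hypothetical helper: does the failing count reach a threshold level?"""
    if level is None:
        # No threshold set for this level, so it can never be triggered.
        return False
    if isinstance(level, float):
        # Float: read as a fraction of all test units (assumed inclusive).
        return units > 0 and failed / units >= level
    # Integer: read as an absolute number of failing test units (assumed inclusive).
    return failed >= level


# Two failures out of six units is about 0.33 of the table.
print(exceeds_level(2, 6, 0.25))  # fraction-based level
print(exceeds_level(2, 6, 3))     # absolute-count level
```

With two failing units out of six, a fractional level of 0.25 is reached while an absolute level of 3 is not, which is the distinction the parameter description draws.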
Returns
: Validate
-
The Validate object with the added validation step.
Examples
For the examples here, we’ll use a simple Polars DataFrame with three columns (a, b, and c). The table is shown below:
Let’s validate that the values in column a are all integers. We’ll determine if this validation had any failing test units (there are six test units, one for each row).
validation = (
    pb.Validate(data=tbl)
    .col_vals_expr(expr=pl.col("a") % 1 == 0)
    .interrogate()
)

validation
STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1 | | | | ✓ | 6 | 6 (1.00) | 0 (0.00) | — | — | — | —
Printing the validation object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using col_vals_expr(). All test units passed, with no failing test units.