col

col(exprs)

Helper function for referencing a column in the input table.

Many of the validation methods in pointblank (e.g., col_vals_gt() and the other col_vals_*() comparison methods) have a value= argument. These validations compare column values either with a literal value or with the corresponding values in another column. The col() helper function specifies that a column is being referenced, not a literal value.

The col() function doesn’t check that the column exists in the input table; it only signals that the value being compared is a column value. The existence check happens during validation (i.e., when interrogate() is called), at which point pointblank verifies that the column exists in the input table.

Parameters

exprs : str | ColumnSelector

Either the name of a single column in the target table, provided as a string, or an expression involving column selector functions (e.g., starts_with("a"), ends_with("e") | starts_with("a"), etc.). Which input forms are valid depends on the argument that col() is used in; the usage sections below describe each context.

Returns

: Column

A Column object representing the column.

Usage with the columns= Argument

The col() function can be used in the columns= argument of the following validation methods:

  • col_vals_gt()
  • col_vals_lt()
  • col_vals_ge()
  • col_vals_le()
  • col_vals_eq()
  • col_vals_ne()
  • col_vals_between()
  • col_vals_outside()
  • col_vals_in_set()
  • col_vals_not_in_set()
  • col_vals_null()
  • col_vals_not_null()
  • col_vals_regex()
  • col_exists()

If you are specifying a single column and know its exact name, col() is not necessary, since you can pass the column name as a string (though col("column_name") is still valid, if preferred). However, to select columns based on more complex logic involving multiple column selector functions (e.g., columns that start with "a" but don’t end with "e"), you need to use col() to wrap the expression that combines the selector functions with logical operators such as &, |, -, and ~.

Here is an example of such usage with the col_vals_gt() validation method:

col_vals_gt(columns=col(starts_with("a") & ~ends_with("e")), value=10)

If using only a single column selector function, you can pass the function directly to the columns= argument of the validation method or wrap it in col(); both are valid, though the first is more concise. Here is an example of that simpler usage:

col_vals_gt(columns=starts_with("a"), value=10)

Usage with the value=, left=, and right= Arguments

The col() function can be used in the value= argument of the following validation methods:

  • col_vals_gt()
  • col_vals_lt()
  • col_vals_ge()
  • col_vals_le()
  • col_vals_eq()
  • col_vals_ne()

and in the left= and right= arguments (either or both) of these two validation methods:

  • col_vals_between()
  • col_vals_outside()

You cannot use column selector functions such as starts_with() in any of the value=, left=, or right= arguments, since there would be no guarantee that a selector resolves to a single column in the target table. In these arguments, col() signals that the value being compared is a column value and not a literal value.

Examples

Suppose we have a table with columns a and b and we’d like to validate that the values in column a are greater than the values in column b. We can use the col() helper function to reference the comparison column when creating the validation step.

import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [5, 6, 5, 7, 6, 5],
        "b": [4, 2, 3, 3, 4, 3],
    }
)

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=pb.col("b"))
    .interrogate()
)

validation
STEP  TYPE           COLUMNS  VALUES  UNITS  PASS      FAIL
1     col_vals_gt()  a        b       6      6 (1.00)  0 (0.00)

From the results in the validation table, we can see that the values in a were greater than the values in b for every row (or test unit). Using value=pb.col("b") specified that the greater-than comparison is made across columns, not against a fixed literal value.