Planning and Prep

To understand your data a bit better, use the scan_data() function, which provides a comprehensive report on the data you plan to validate. If you are preparing a data quality analysis, create an agent with create_agent() to carry out that plan. The agent is told which table is the target, and we then devise a stepwise validation plan with validation functions. If collecting and publishing table information is your goal, begin with the create_informant() function.
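As a rough sketch of the planning stage, the following creates an agent for the small_table dataset bundled with pointblank, sets failure thresholds with action_levels(), and adds a few validation steps. The threshold values, label, and choice of steps are illustrative assumptions, not recommendations.

```r
library(pointblank)

# Failure thresholds (illustrative values): warn when 10% of test units
# fail, stop when 25% fail
al <- action_levels(warn_at = 0.10, stop_at = 0.25)

# Create an agent, tell it which table is the target, and build up a
# stepwise validation plan; nothing is checked until interrogate() is called
agent <-
  create_agent(
    tbl = small_table,
    tbl_name = "small_table",
    label = "A planning sketch",
    actions = al
  ) %>%
  col_exists(columns = vars(date, date_time)) %>%
  col_vals_gt(columns = vars(d), value = 100) %>%
  rows_distinct()
```

At this point the agent only holds a plan; the interrogation itself happens later.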


Thoroughly scan a table to better understand it


Create a pointblank agent object


Create a pointblank informant object


Modify pointblank validation testing options within R Markdown documents

action_levels() warn_on_fail() stop_on_fail()

Set action levels: failure thresholds and functions to invoke


Get a table from a database


Get a table from a local or remote file


Define a store of tables with table-prep formulas: a table store


Obtain a table-prep formula from a table store


Obtain a materialized table via a table store


Draft a starter pointblank validation .R/.Rmd file with a data table

Validation, Expectation, and Test Functions

Validation step functions are used either with an agent object or, more simply, directly on the table of interest. When used with an agent, each step function call builds up a validation plan (which is executed with the interrogate() function). If used directly on data (that is, with no agent involved), the data is checked and passed through. However, if we (1) set warn_* and/or stop_* thresholds and (2) the level of failed validation units exceeds those set levels, then we’ll get either a warning or an error. Each validation step function is associated with an expectation function (of the form expect_*()). These expectation functions are equivalent in usage and behavior to those in the testthat package. Finally, each validation function has an associated test function (of the form test_*()), which always returns a logical value.
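A minimal sketch of the agent-free usage described above, assuming the bundled small_table dataset. The columns and comparison values are arbitrary choices for illustration.

```r
library(pointblank)

# Used directly on data: the table is checked and then passed through
checked <- small_table %>%
  col_vals_gt(columns = vars(d), value = 100)

# The test_*() variant returns a single logical value
ok_d <- test_col_vals_gt(small_table, columns = vars(d), value = 100)
ok_a <- test_col_vals_gt(small_table, columns = vars(a), value = 5)

# With a warn threshold set, exceeding it raises a warning; here the
# warning is caught so the outcome can be inspected
warned <- tryCatch(
  {
    small_table %>%
      col_vals_gt(columns = vars(a), value = 5, actions = warn_on_fail())
    FALSE
  },
  warning = function(w) TRUE
)
```

The expect_*() variants take the same arguments and are meant to be called inside testthat tests.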

col_vals_lt() expect_col_vals_lt() test_col_vals_lt()

Are column data less than a fixed value or data in another column?

col_vals_lte() expect_col_vals_lte() test_col_vals_lte()

Are column data less than or equal to a fixed value or data in another column?

col_vals_equal() expect_col_vals_equal() test_col_vals_equal()

Are column data equal to a fixed value or data in another column?

col_vals_not_equal() expect_col_vals_not_equal() test_col_vals_not_equal()

Are column data not equal to a fixed value or data in another column?

col_vals_gte() expect_col_vals_gte() test_col_vals_gte()

Are column data greater than or equal to a fixed value or data in another column?

col_vals_gt() expect_col_vals_gt() test_col_vals_gt()

Are column data greater than a fixed value or data in another column?

col_vals_between() expect_col_vals_between() test_col_vals_between()

Do column data lie between two specified values or data in other columns?

col_vals_not_between() expect_col_vals_not_between() test_col_vals_not_between()

Do column data lie outside of two specified values or data in other columns?

col_vals_in_set() expect_col_vals_in_set() test_col_vals_in_set()

Are column data part of a specified set of values?

col_vals_not_in_set() expect_col_vals_not_in_set() test_col_vals_not_in_set()

Are data not part of a specified set of values?

col_vals_make_set() expect_col_vals_make_set() test_col_vals_make_set()

Is a set of values entirely accounted for in a column of values?

col_vals_make_subset() expect_col_vals_make_subset() test_col_vals_make_subset()

Is a set of values a subset of a column of values?

col_vals_increasing() expect_col_vals_increasing() test_col_vals_increasing()

Are column data increasing by row?

col_vals_decreasing() expect_col_vals_decreasing() test_col_vals_decreasing()

Are column data decreasing by row?

col_vals_null() expect_col_vals_null() test_col_vals_null()

Are column data NULL/NA?

col_vals_not_null() expect_col_vals_not_null() test_col_vals_not_null()

Are column data not NULL/NA?

col_vals_regex() expect_col_vals_regex() test_col_vals_regex()

Do strings in column data match a regex pattern?

col_vals_within_spec() expect_col_vals_within_spec() test_col_vals_within_spec()

Do values in column data fit within a specification?

col_vals_expr() expect_col_vals_expr() test_col_vals_expr()

Do column data agree with a predicate expression?

rows_distinct() expect_rows_distinct() test_rows_distinct()

Are row data distinct?

rows_complete() expect_rows_complete() test_rows_complete()

Are row data complete?

col_is_character() expect_col_is_character() test_col_is_character()

Do the columns contain character/string data?

col_is_numeric() expect_col_is_numeric() test_col_is_numeric()

Do the columns contain numeric values?

col_is_integer() expect_col_is_integer() test_col_is_integer()

Do the columns contain integer values?

col_is_logical() expect_col_is_logical() test_col_is_logical()

Do the columns contain logical values?

col_is_date() expect_col_is_date() test_col_is_date()

Do the columns contain R Date objects?

col_is_posix() expect_col_is_posix() test_col_is_posix()

Do the columns contain POSIXct dates?

col_is_factor() expect_col_is_factor() test_col_is_factor()

Do the columns contain R factor objects?

col_exists() expect_col_exists() test_col_exists()

Do one or more columns actually exist?

col_schema_match() expect_col_schema_match() test_col_schema_match()

Do columns in the table (and their types) match a predefined schema?

conjointly() expect_conjointly() test_conjointly()

Perform multiple rowwise validations for joint validity

serially() expect_serially() test_serially()

Run several tests and a final validation in a serial manner

specially() expect_specially() test_specially()

Perform a specialized validation with a user-defined function

Information Functions

We can progressively add information to an informant object by using the collection of info_*() functions. We can add table-level properties with the info_tabular() function, details about the nature of each column with the info_columns() function, and sections of our own choosing (and the info that makes sense for those sections) with the info_section() function. Snippets of information can be gleaned from the target table by using the info_snippet() function. These bits of information can be incorporated in text defined by the other info_*() functions via { }. Some great snip_*() functions are included to make info-snipping as easy (and useful!) as can be.
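The following sketch shows how these functions might be chained on an informant for the bundled small_table dataset. The snippet name max_a, the section name, and all of the info text are made up for illustration.

```r
library(pointblank)

informant <-
  create_informant(
    tbl = small_table,
    tbl_name = "small_table",
    label = "An info sketch"
  ) %>%
  # Table-level information
  info_tabular(description = "A small table that is useful for testing.") %>%
  # A snippet computed from the target table
  info_snippet(snippet_name = "max_a", fn = snip_highest(column = "a")) %>%
  # Column-level information that refers to the snippet via { }
  info_columns(columns = "a", info = "Integer values; the highest is {max_a}.") %>%
  # A free-form section of our own choosing
  info_section(section_name = "notes", usage = "For illustration only.")
```

The {max_a} placeholder is only filled in once the snippets are evaluated against the table with incorporate().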


Add information that focuses on aspects of the data table as a whole


Add information that focuses on aspects of a data table's columns


Add column information from another data table


Add information that focuses on some key aspect of the data table


Generate a useful text 'snippet' from the target table


A fn for info_snippet(): get a list of column categories


A fn for info_snippet(): get an inline statistical summary


A fn for info_snippet(): get the lowest value from a column


A fn for info_snippet(): get the highest value from a column


Sometimes we want to email a report of a validation because of the importance of the information contained therein. The email_blast() function can be used within the end_fns argument of create_agent(), giving us options to send a customizable message only if specified conditions are met.


Send email at a validation step or at the end of an interrogation


Create an email object from a pointblank agent or informant


Provide simple email message body components: body


Provide simple email message body components: footer


Logging validation failure conditions makes for a good practice during data quality analysis. The log4r_step() function allows for simple generation of log entries and specification of logging destinations.


Enable logging of failure conditions at the validation step level

Agent: Interrogate and Report

If we have an agent object that has a plan (i.e., validation steps), the interrogate() function instructs the agent to interrogate the target table. The agent will go to work and also perform specified side-effect functions at the step level and upon completion (if those functions are defined). After interrogation, we can get a report by printing the agent; however, more options are available through the get_agent_report() function.
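A short sketch of interrogation and reporting, again assuming the bundled small_table dataset and two arbitrary validation steps. Here get_agent_report() is asked for a data frame rather than the default display table.

```r
library(pointblank)

agent <-
  create_agent(
    tbl = small_table,
    tbl_name = "small_table",
    label = "An interrogation sketch"
  ) %>%
  col_vals_gt(columns = vars(d), value = 100) %>%
  col_vals_lte(columns = vars(c), value = 5) %>%
  interrogate()

# Printing `agent` shows the default report; get_agent_report() offers
# more options, such as returning the summary as a data frame
report <- get_agent_report(agent, display_table = FALSE)
```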


Given an agent that has a validation plan, perform an interrogation


Get a summary report from an agent

Informant: Incorporate and Report

If we have an informant object that has been loaded with information by using the info_*() functions, the incorporate() function works to regenerate snippets and integrate those into the info text. After refreshing the table information, we can get an information report by printing the informant or by using the get_informant_report() function.
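As an illustration, the sketch below defines one snippet, refers to it in column info text, and then calls incorporate() to evaluate it against the table. The snippet name low_d and the info text are hypothetical.

```r
library(pointblank)

informant <-
  create_informant(
    tbl = small_table,
    tbl_name = "small_table",
    label = "An incorporate sketch"
  ) %>%
  info_snippet(snippet_name = "low_d", fn = snip_lowest(column = "d")) %>%
  info_columns(columns = "d", info = "The lowest value is {low_d}.") %>%
  # Evaluate the snippets and merge them into the info text
  incorporate()

# Build the table information report
report <- get_informant_report(informant)
```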


Given an informant object, update and incorporate table snippets


Get a table information report from an informant object


The agent always has a special list called an x-list. Access it by calling get_agent_x_list() and you’ll have a smorgasbord of information about how the validation went down. Table extracts are collected by default for failed rows (up to a limit) and we can access those with get_data_extracts(). Table rows can be sundered into ‘pass’ and ‘fail’ pieces; access those table fragments with the get_sundered_data() function. Want to know if all validation steps have passed with flying colors? Sometimes that could happen; use the all_passed() function to find out.
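The following sketch exercises these post-interrogation functions on a single validation step that is chosen so that some rows of the bundled small_table dataset fail. The threshold of 1000 is arbitrary.

```r
library(pointblank)

agent <-
  create_agent(
    tbl = small_table,
    tbl_name = "small_table",
    label = "A post-interrogation sketch"
  ) %>%
  col_vals_gt(columns = vars(d), value = 1000) %>%
  interrogate()

# The x-list holds per-step results, e.g. counts of passing/failing units
x <- get_agent_x_list(agent)
n_failed <- x$n_failed

# Rows that failed in step 1 (collected by default, up to a limit)
extracts <- get_data_extracts(agent, i = 1)

# Split the input table into rows that passed and rows that failed
passed_rows <- get_sundered_data(agent, type = "pass")
failed_rows <- get_sundered_data(agent, type = "fail")

# Did every validation step pass completely?
all_ok <- all_passed(agent)
```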


Get the agent's x-list


Collect data extracts from a validation step


Sunder the data, splitting it into 'pass' and 'fail' pieces


Did all of the validations fully pass?


Transform a pointblank agent to a testthat test file

Object Ops

We have options for writing an agent or informant to disk with the x_write_disk() function. The on-disk object can be retrieved with the x_read_disk() function. You can export a report as an HTML file with export_report(). A set of functions is also available for managing associations to a data table (either directly or with a table-prep formula) and for editing an agent’s validation steps.
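A minimal sketch of a round trip to disk, assuming a temporary directory as the destination and an arbitrary file name.

```r
library(pointblank)

agent <-
  create_agent(
    tbl = small_table,
    tbl_name = "small_table",
    label = "A disk round-trip sketch"
  ) %>%
  col_vals_gt(columns = vars(d), value = 100) %>%
  interrogate()

# Write the agent to a temporary directory (illustrative location)
out_dir <- tempdir()
x_write_disk(agent, filename = "agent-small_table.rds", path = out_dir)

# Read it back in a later step or session
agent_from_disk <- x_read_disk(filename = "agent-small_table.rds", path = out_dir)
```

By default the target table is not stored alongside the agent, so a table would need to be associated again before re-interrogating the object read from disk.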


Write an agent, informant, multiagent, or table scan to disk


Read an agent, informant, multiagent, or table scan from disk


Export an agent, informant, multiagent, or table scan to HTML


Set a data table to an agent or informant


Remove a data table associated with an agent or informant


Set a table-prep formula to an agent or informant


Remove a table-prep formula associated with an agent or informant


Activate one or more of an agent's validation steps


Deactivate one or more of an agent's validation steps


Remove one or more of an agent's validation steps

The Multiagent

The multiagent is a group of agents, each tasked with its own interrogation to perform. As a group, they provide an interesting and informative bit of reporting that tracks the evolution of data quality checks over time. With a multiagent object, we can get a data quality report that handles changes in the target data and matches data validation steps across all agents. The predominant application is retrospective analysis of data quality for a target table.
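As a sketch, two agents with overlapping validation plans can be combined into a multiagent and reported on together. Both agents here target the same bundled small_table dataset, which stands in for two snapshots of a table taken at different times; the labels and steps are illustrative.

```r
library(pointblank)

# First agent: an earlier version of the validation plan
agent_1 <-
  create_agent(tbl = small_table, tbl_name = "small_table", label = "Run 1") %>%
  col_vals_gt(columns = vars(d), value = 100) %>%
  interrogate()

# Second agent: the same step plus an additional one
agent_2 <-
  create_agent(tbl = small_table, tbl_name = "small_table", label = "Run 2") %>%
  col_vals_gt(columns = vars(d), value = 100) %>%
  col_vals_not_null(columns = vars(date_time)) %>%
  interrogate()

# Group the agents and request a report that lines up matching steps
multiagent <- create_multiagent(agent_1, agent_2)
report <- get_multiagent_report(multiagent, display_mode = "wide")
```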


Create a pointblank multiagent object


Read pointblank agents stored on disk as a multiagent


Get a summary report using multiple agents

pointblank YAML

YAML files can be used in pointblank for two distinct purposes: (1) to define agents and their validation plans, and (2) to define information for tables. The yaml_write() function allows us to write agent and/or informant YAML from the namesake objects. We can read them back from disk by using the yaml_read_agent() and yaml_read_informant() functions. As a nice shortcut, we can interrogate a table directly from a pointblank YAML file with yaml_agent_interrogate().
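The sketch below writes an agent to YAML, reads it back, and interrogates directly from the file. It assumes a pointblank version in which the tbl argument accepts a table-prep formula (written here as ~ small_table), since the YAML needs a way to retrieve the table when it is read back; the file name and temporary directory are illustrative.

```r
library(pointblank)

# The table is given as a table-prep formula so that it can be
# re-materialized when the YAML file is read later (assumed to be
# supported by the installed pointblank version)
agent <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "A YAML sketch"
  ) %>%
  col_exists(columns = vars(date, date_time)) %>%
  col_vals_gt(columns = vars(d), value = 100)

yaml_dir <- tempdir()
yaml_write(agent, filename = "agent-small_table.yml", path = yaml_dir)

# Recreate the agent (plan only) from the YAML file
agent_from_yaml <- yaml_read_agent(filename = "agent-small_table.yml", path = yaml_dir)

# Or read and interrogate in one step
agent_done <- yaml_agent_interrogate(filename = "agent-small_table.yml", path = yaml_dir)
```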


Write pointblank objects to YAML files


Read a pointblank YAML file to create an agent object


Read a pointblank YAML file to create an informant object


Get an agent from pointblank YAML and interrogate()


Display pointblank YAML using an agent or a YAML file


Display validation expressions using pointblank YAML


Get an informant from pointblank YAML and incorporate()


Execute all agent and informant YAML tasks

Table Transformers

Table Transformer functions can radically transform a data table and either provide a wholly different table (like a summary table or table properties table) or do some useful filtering in a single step. This can be useful for preparing the target table for validation, for creating a temporary table for a few validation steps, or even for use outside of pointblank workflows. As a nice bonus, these transformer functions work equally well with data frames, database tables, and Spark tables.
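A brief sketch of two transformers applied to the bundled game_revenue dataset, followed by a validation on the transformed table. The upper bound of 5000 rows is an arbitrary value chosen for illustration.

```r
library(pointblank)

# A two-row table of dimensions with the columns `.param.` and `value`
dims <- tt_tbl_dims(game_revenue)
n_rows <- dims$value[dims$.param. == "rows"]

# A summary statistics table for the numeric columns
stats <- tt_summary_stats(game_revenue)

# Pull a single value out of the summary table
max_revenue <- get_tt_param(stats, param = "max", column = "item_revenue")

# Transformed tables can be validated like any other table
rows_ok <- test_col_vals_lt(
  dims[dims$.param. == "rows", ],
  columns = vars(value),
  value = 5000
)
```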


Table Transformer: obtain a summary stats table for numeric columns


Table Transformer: obtain a summary table for string columns


Table Transformer: get the dimensions of a table


Table Transformer: get a table's column names


Table Transformer: shift the times of a table


Table Transformer: slice a table with a slice point on a time column


Get a parameter value from a summary table

Utility and Helper Functions


Generate a table column schema manually or with a reference table


Determine if one or more columns exist in a table


Put the current date into a file name


Put the current date-time into a file name


The next generation of stopifnot()-type functions: stop_if_not()


Specify a file for download from GitHub



A small table that is useful for testing


A SQLite version of the small_table dataset


A table containing data pertaining to various specifications


A table with game revenue data