In an agent-based workflow, after interrogation with
interrogate(), we can
get the row data that didn't pass row-based validation steps with the
get_data_extracts() function. The amount of data available in a particular
extract depends on both the fraction of test units that didn't pass a
validation step and the level of sampling or explicit collection from that
set of units.
The availability of data extracts for each row-based validation step
depends on whether extract_failed is set to TRUE within the
interrogate() call (it is by default). The number of failing rows extracted
depends on the collection parameters in interrogate(); the default
behavior is to collect up to the first 5000 failing rows.
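As a rough illustration of how a collection parameter limits an extract, the sketch below caps collection with get_first_n. The table and the threshold of 50 are hypothetical values chosen for this example, and it assumes pointblank and dplyr are installed.

```r
library(pointblank)

# Hypothetical table: the integers 1 through 100
tbl <- dplyr::tibble(a = 1:100)

# One validation step: values in `a` should be below 50,
# so the rows holding 50 through 100 do not pass
agent <-
  create_agent(tbl = tbl) %>%
  col_vals_lt(vars(a), 50) %>%
  interrogate(
    extract_failed = TRUE,
    get_first_n = 3  # keep only the first 3 failing rows
  )

# Retrieve the extract for step 1 and count its rows
extract <- get_data_extracts(agent, i = 1)
nrow(extract)
```

With this cap in place, the extract should hold no more than three rows even though many more rows fail the step.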
Row-based validation steps are based on the validation functions of the form
col_vals_*() and also include
those types of validation steps can provide data extracts.
get_data_extracts(agent, i = NULL)
An agent object of class
The validation step number, which is assigned to each validation
step in the order of definition. If
A list of tables if i is not provided, or a standalone table if
i is given.
# Load pointblank, which provides the
# agent functions and the `%>%` pipe
library(pointblank)

# Create a simple table with a
# column of numerical values
tbl <- dplyr::tibble(a = c(5, 7, 8, 5))

# Create 2 simple validation steps
# that test the values within
# column `a`
agent <-
  create_agent(tbl = tbl) %>%
  col_vals_between(vars(a), 4, 6) %>%
  col_vals_lte(vars(a), 7) %>%
  interrogate(
    extract_failed = TRUE,
    get_first_n = 10
  )

# Get row sample data for those rows
# in `tbl` that did not pass the first
# validation step (`col_vals_between`)
agent %>% get_data_extracts(i = 1)
#> # A tibble: 2 x 1
#>       a
#>   <dbl>
#> 1     7
#> 2     8
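The example above passes i = 1 to get a single table. As a complementary sketch, leaving i out should return every available extract as a list, which can then be inspected step by step. It reuses the same table and validation steps and assumes pointblank and dplyr are installed; the all_extracts name is just an illustrative choice.

```r
library(pointblank)

# Same table and validation steps as in the example above
tbl <- dplyr::tibble(a = c(5, 7, 8, 5))

agent <-
  create_agent(tbl = tbl) %>%
  col_vals_between(vars(a), 4, 6) %>%
  col_vals_lte(vars(a), 7) %>%
  interrogate(extract_failed = TRUE)

# Omit `i` to request all extracts at once
all_extracts <- get_data_extracts(agent)

# Inspect the list: its element names and the row count of each extract
names(all_extracts)
lapply(all_extracts, nrow)
```

Working with the full list is convenient when several steps fail and each extract needs to be written out or summarized in a loop.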