In an agent-based workflow (i.e., one started with create_agent()), once interrogate() has been run we can use get_data_extracts() to retrieve the rows that didn't pass row-based validation steps. There is one extract per row-based validation step, and the amount of data in a given extract depends on both the fraction of test units that failed the step and the sampling or collection settings applied to those failing units. Extracts can be collected programmatically with get_data_extracts(), or downloaded as CSV files from the HTML report produced by the agent's print method or by get_agent_report().

Data extracts are available for a row-based validation step only if extract_failed is set to TRUE in the interrogate() call (this is the default). The number of failing rows extracted depends on the collection parameters in interrogate(); the default behavior is to collect up to the first 5000 failing rows.
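As a rough illustration of how the collection settings shape an extract, the sketch below interrogates the same validation step twice: once keeping only the first two failing rows, and once with extract collection turned off. It assumes the `extract_failed` and `get_first_n` arguments of `interrogate()` and the package's `small_table` dataset; the object names (`agent_first_2`, `agent_no_extracts`) are made up for this example.

```r
library(pointblank)

# Keep at most the first two failing rows per step
# (`get_first_n` is one of the collection parameters)
agent_first_2 <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(vars(d), value = 1000) %>%
  interrogate(get_first_n = 2)

extract_first_2 <- get_data_extracts(agent_first_2, i = 1)
nrow(extract_first_2)

# Turn off extract collection entirely; no extracts
# should then be stored in the agent
agent_no_extracts <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(vars(d), value = 1000) %>%
  interrogate(extract_failed = FALSE)

length(get_data_extracts(agent_no_extracts))
```

Limiting the extract size this way can be useful for large tables, where only a handful of failing rows are needed for diagnosis.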

Row-based validation steps are those created by validation functions of the form col_vals_*(), along with conjointly() and rows_distinct(). Only functions from that combined set can yield data extracts.
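To make the distinction concrete, here is a minimal sketch that mixes a step that is not row-based (`col_is_numeric()`) with a row-based one (`col_vals_gt()`). Both steps fail on `small_table`, yet only the row-based step is expected to contribute an extract. The name `agent_mixed` is hypothetical, and the assumption that extracts are named by step number follows the list output shown in the Examples.

```r
library(pointblank)

agent_mixed <-
  create_agent(tbl = small_table) %>%
  # Step 1: not row-based (fails, since `b` is character)
  col_is_numeric(vars(b)) %>%
  # Step 2: row-based (some values of `d` are <= 1000)
  col_vals_gt(vars(d), value = 1000) %>%
  interrogate()

extracts_mixed <- get_data_extracts(agent_mixed)

# Only the row-based step should appear here
names(extracts_mixed)
```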

get_data_extracts(agent, i = NULL)

Arguments

agent

An agent object of class ptblank_agent. It should have had interrogate() called on it, so that the validation steps have been carried out and any rows collected from non-passing validations are available in the object.

i

The validation step number, which is assigned to each validation step by pointblank in the order of definition. If NULL (the default), all data extract tables will be provided in a list object.

Value

A list of tables if i is not provided, or a standalone table if i is given.

Function ID

8-2

Examples

# Load pointblank (it also provides
# `%>%` and the `small_table` dataset)
library(pointblank)

# Create a series of two validation
# steps that test row values in part
# of the `small_table` object, then
# `interrogate()` immediately
agent <-
  create_agent(
    read_fn = ~ small_table %>%
      dplyr::select(a:f),
    label = "`get_data_extracts()`"
  ) %>%
  col_vals_gt(vars(d), value = 1000) %>%
  col_vals_between(
    vars(c),
    left = vars(a), right = vars(d),
    na_pass = TRUE
  ) %>%
  interrogate()

# Using `get_data_extracts()` with its
# defaults returns a list of tables,
# where each table is named with the
# number of the validation step that
# has an extract available
agent %>% get_data_extracts()
#> $`1`
#> # A tibble: 6 × 6
#>       a b             c     d e     f    
#>   <int> <chr>     <dbl> <dbl> <lgl> <chr>
#> 1     8 3-ldm-038     7  284. TRUE  low  
#> 2     7 1-knw-093     3  843. TRUE  high 
#> 3     3 5-bce-642     9  838. FALSE high 
#> 4     3 5-bce-642     9  838. FALSE high 
#> 5     4 2-dmx-010     7  834. TRUE  low  
#> 6     2 7-dmx-010     8  108. FALSE low  
#> 
#> $`2`
#> # A tibble: 4 × 6
#>       a b             c     d e     f    
#>   <int> <chr>     <dbl> <dbl> <lgl> <chr>
#> 1     6 8-kdg-938     3 2343. TRUE  high 
#> 2     8 3-ldm-038     7  284. TRUE  low  
#> 3     7 1-knw-093     3  843. TRUE  high 
#> 4     4 5-boe-639     2 1036. FALSE low  
#> 

# We can get an extract for a specific
# step by specifying it in the `i`
# argument; let's get the failing rows
# from the first validation step
# (`col_vals_gt`)
agent %>% get_data_extracts(i = 1)
#> # A tibble: 6 × 6
#>       a b             c     d e     f    
#>   <int> <chr>     <dbl> <dbl> <lgl> <chr>
#> 1     8 3-ldm-038     7  284. TRUE  low  
#> 2     7 1-knw-093     3  843. TRUE  high 
#> 3     3 5-bce-642     9  838. FALSE high 
#> 4     3 5-bce-642     9  838. FALSE high 
#> 5     4 2-dmx-010     7  834. TRUE  low  
#> 6     2 7-dmx-010     8  108. FALSE low
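Beyond downloading CSV files from the HTML report, the list returned by get_data_extracts() can also be written to disk directly. The following is a minimal sketch using base R's `write.csv()`; the output directory, the `step_<i>.csv` file naming, and the object names are assumptions made for this example and are not part of pointblank.

```r
library(pointblank)

agent_csv <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(vars(d), value = 1000) %>%
  interrogate()

extracts <- get_data_extracts(agent_csv)

# Hypothetical output location inside the session's
# temporary directory
out_dir <- file.path(tempdir(), "extracts")
dir.create(out_dir, showWarnings = FALSE)

# Write one CSV file per available extract, named
# after the validation step number
csv_paths <- vapply(
  names(extracts),
  function(step) {
    path <- file.path(out_dir, paste0("step_", step, ".csv"))
    utils::write.csv(extracts[[step]], path, row.names = FALSE)
    path
  },
  character(1)
)

csv_paths
```

Because the list is named by step number, the file names map back to the validation steps in the agent report.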