14 Creating extensions
The gt ecosystem includes packages that demonstrate what becomes possible when developers build upon its foundation. Packages like gtsummary transform statistical models and data summaries into publication-ready tables. Others like gtExtras provide themes, helper functions, and enhanced visualizations. The pointblank package uses gt to generate comprehensive data validation reports, presenting quality checks and test results in well-formatted tables. Each identified gaps between what analysts needed and what gt alone provided, then filled those gaps with useful extensions.
In this chapter, we’ll explore how you might create your own extensions. Perhaps you work in a domain with specialized reporting requirements. Perhaps your organization has established table styles that should be applied consistently. Perhaps you’ve found yourself copying the same sequences of gt function calls across projects and want to encapsulate that workflow. Whatever the motivation, extending gt through your own package opens possibilities that using gt directly cannot match.
The rewards extend beyond personal convenience. A well-designed extension package creates institutional value. New team members can produce properly formatted tables without learning every gt option. Reports maintain visual consistency across authors and time periods. Domain-specific conventions become encoded in functions rather than documented in style guides that may not be followed. The investment in creating such a package pays dividends far exceeding the initial effort.
We’ll examine three complementary approaches to extension. First, creating functions that generate complete display tables from data summaries, taking raw data or statistical objects and producing finished tables ready for publication. Second, building wrapper functions that modify or enhance tables created elsewhere, adding consistent styling or domain-specific elements. Third, developing more ambitious extensions that push gt into new territory. Throughout, we’ll provide concrete, working examples that illustrate not just what to do but why certain design choices lead to better outcomes.
14.1 Creating display tables to augment data summaries
The most impactful extensions often emerge from recognizing patterns in your own work. You perform a particular analysis, format the results as a table, apply certain styling, and repeat this process dozens or hundreds of times. Each repetition involves the same conceptual steps but implemented anew, with opportunities for inconsistency and error at every turn.
Consider the task of summarizing a dataset’s structure. Data scientists frequently need to document what variables a dataset contains, their types, the presence of missing values, and basic distributional properties. This information helps colleagues understand the data, aids in quality control, and provides essential context for downstream analyses. Yet creating such summaries manually is tedious, and the results vary based on who creates them and when.
14.1.1 A dataset overview function
Let’s build a function that produces a comprehensive dataset overview table. The function should accept any data frame and return a gt table documenting its structure:
create_data_overview <- function(data, title = NULL) {
  # Build a summary data frame with one row per column
  summary_df <-
    dplyr::tibble(
      variable = names(data),
      type = sapply(data, function(x) class(x)[1]),
      n_missing = sapply(data, function(x) sum(is.na(x))),
      pct_missing = sapply(data, function(x) mean(is.na(x)) * 100),
      n_unique = sapply(data, function(x) length(unique(x)))
    )
  # Add example values (first non-NA value)
  summary_df$example <-
    sapply(data, function(x) {
      non_na <- x[!is.na(x)]
      if (length(non_na) == 0) {
        return(NA_character_)
      }
      val <- non_na[1]
      if (is.numeric(val)) {
        format(round(val, 3), nsmall = 3)
      } else if (inherits(val, "Date")) {
        as.character(val)
      } else {
        as.character(val)
      }
    })
  # Create the gt table
  tbl <-
    summary_df |>
    gt() |>
    tab_header(
      title = if (is.null(title)) "Dataset Overview" else title,
      subtitle = paste0(nrow(data), " rows × ", ncol(data), " columns")
    ) |>
    cols_label(
      variable = "Variable",
      type = "Type",
      n_missing = "Missing (n)",
      pct_missing = "Missing (%)",
      n_unique = "Unique Values",
      example = "Example"
    ) |>
    fmt_number(columns = pct_missing, decimals = 1) |>
    fmt_integer(columns = c(n_missing, n_unique)) |>
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_body(columns = variable)
    ) |>
    data_color(
      columns = pct_missing,
      palette = c("white", "orange", "red"),
      domain = c(0, 100)
    ) |>
    tab_source_note(
      source_note = paste("Generated on", Sys.Date())
    ) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2)
  return(tbl)
}
This function encapsulates substantial complexity. It calculates summary statistics for each column, formats them appropriately, applies visual styling that highlights potential data quality issues (the color gradient on missing percentages draws attention to problematic columns), and documents when the overview was created. A user need only call create_data_overview(my_data) to receive a finished table.
Let’s see it in action with the towny dataset:
create_data_overview(towny, title = "Towny Dataset Structure")
| Towny Dataset Structure | |||||
| 414 rows × 25 columns | |||||
| Variable | Type | Missing (n) | Missing (%) | Unique Values | Example |
|---|---|---|---|---|---|
| name | character | 0 | 0.0 | 413 | Addington Highlands |
| website | character | 4 | 1.0 | 411 | https://addingtonhighlands.ca |
| status | character | 0 | 0.0 | 2 | lower-tier |
| csd_type | character | 0 | 0.0 | 5 | township |
| census_div | character | 0 | 0.0 | 49 | Lennox and Addington |
| latitude | numeric | 0 | 0.0 | 316 | 45.000 |
| longitude | numeric | 0 | 0.0 | 351 | -77.250 |
| land_area_km2 | numeric | 0 | 0.0 | 412 | 1293.990 |
| population_1996 | integer | 3 | 0.7 | 407 | 2429.000 |
| population_2001 | integer | 3 | 0.7 | 408 | 2402.000 |
| population_2006 | integer | 0 | 0.0 | 408 | 2512.000 |
| population_2011 | integer | 0 | 0.0 | 412 | 2517.000 |
| population_2016 | integer | 0 | 0.0 | 408 | 2318.000 |
| population_2021 | integer | 0 | 0.0 | 406 | 2534.000 |
| density_1996 | numeric | 3 | 0.7 | 405 | 1.880 |
| density_2001 | numeric | 3 | 0.7 | 397 | 1.860 |
| density_2006 | numeric | 0 | 0.0 | 402 | 1.940 |
| density_2011 | numeric | 0 | 0.0 | 401 | 1.950 |
| density_2016 | numeric | 0 | 0.0 | 400 | 1.790 |
| density_2021 | numeric | 0 | 0.0 | 398 | 1.960 |
| pop_change_1996_2001_pct | numeric | 4 | 1.0 | 381 | -0.011 |
| pop_change_2001_2006_pct | numeric | 4 | 1.0 | 382 | 0.046 |
| pop_change_2006_2011_pct | numeric | 1 | 0.2 | 383 | 0.002 |
| pop_change_2011_2016_pct | numeric | 1 | 0.2 | 363 | -0.079 |
| pop_change_2016_2021_pct | numeric | 1 | 0.2 | 380 | 0.093 |
| Generated on 2026-01-05 | |||||
The table immediately reveals the dataset’s structure. We see numeric columns, character columns, and their properties. Missing value percentages are color-coded, making it easy to spot columns that might need attention. The example values provide concrete illustrations of what each column contains.
14.1.2 Thinking about function design
Several design decisions in this function merit discussion. The function returns a gt table object rather than printing it directly. This allows users to further modify the result if needed, adding footnotes, changing colors, or applying additional formatting. If the function printed the table and returned invisibly, such modifications would be impossible.
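Because the table object is returned, further customization composes naturally. For instance, a user could bolt a footnote onto the result (a hypothetical follow-up; the footnote text is illustrative):

create_data_overview(towny) |>
  tab_footnote(
    footnote = "Municipality names as registered in the 2021 census.",
    locations = cells_body(columns = example, rows = variable == "name")
  )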
The title parameter has a sensible default but allows customization. This pattern appears throughout well-designed gt extensions: provide defaults that work for most cases while allowing users to override them when circumstances warrant.
The color gradient on missing percentages demonstrates a broader principle: visual encoding should convey meaning. Rather than requiring users to scan a column of numbers, the color immediately signals which variables have concerning levels of missingness. This is not mere decoration but purposeful use of visual channels to communicate information.
The timestamp in the source note serves a documentation purpose. When the table appears in a report weeks or months later, readers know when the summary was generated. If the underlying data changes, outdated overviews can be identified and refreshed.
14.1.3 A correlation matrix function
Let’s develop another example: a function that creates publication-ready correlation tables. Correlation matrices are ubiquitous in statistical reporting, yet the default outputs from R’s cor() function are bare numeric matrices unsuitable for publication. Our function will transform them into properly formatted tables with visual highlighting:
create_correlation_table <- function(
  data,
  method = "pearson",
  title = NULL,
  decimals = 2
) {
  # Select only numeric columns
  numeric_data <- data |> select(where(is.numeric))
  if (ncol(numeric_data) < 2) {
    stop("Data must contain at least two numeric columns")
  }
  # Calculate correlations
  cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs", method = method)
  # Convert to data frame for gt
  cor_df <- as.data.frame(cor_matrix)
  cor_df <- tibble::rownames_to_column(cor_df, var = "variable")
  # Method label for subtitle
  method_label <- switch(method,
    pearson = "Pearson",
    spearman = "Spearman",
    kendall = "Kendall"
  )
  # Build the table
  tbl <-
    cor_df |>
    gt(rowname_col = "variable") |>
    tab_header(
      title = if (is.null(title)) "Correlation Matrix" else title,
      subtitle = paste(method_label, "correlation coefficients")
    ) |>
    fmt_number(columns = everything(), decimals = decimals) |>
    data_color(
      columns = everything(),
      palette = c("#B2182B", "#FDDBC7", "white", "#D1E5F0", "#2166AC"),
      domain = c(-1, 1)
    ) |>
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_stub()
    ) |>
    sub_values(
      values = 1,
      replacement = ""
    ) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2) |>
    cols_width(everything() ~ px(70))
  return(tbl)
}
The function handles several details that users would otherwise need to address manually. It selects only numeric columns, computes correlations with appropriate handling of missing values, applies a diverging color palette centered on zero (so positive correlations appear blue, negative correlations appear red, and values near zero remain white), and replaces the diagonal values of 1 with empty strings since the correlation of a variable with itself is trivially perfect and not informative.
Testing with the gtcars dataset reveals the function’s output:
gtcars |>
  select(mpg_c, mpg_h, hp, hp_rpm, trq, trq_rpm) |>
  create_correlation_table(title = "Vehicle Performance Correlations")
| Vehicle Performance Correlations | ||||||
| Pearson correlation coefficients | ||||||
| mpg_c | mpg_h | hp | hp_rpm | trq | trq_rpm | |
|---|---|---|---|---|---|---|
| mpg_c | 0.84 | −0.66 | −0.42 | −0.45 | −0.47 | |
| mpg_h | 0.84 | −0.79 | −0.60 | −0.52 | −0.63 | |
| hp | −0.66 | −0.79 | 0.47 | 0.85 | 0.47 | |
| hp_rpm | −0.42 | −0.60 | 0.47 | 0.03 | 0.79 | |
| trq | −0.45 | −0.52 | 0.85 | 0.03 | 0.09 | |
| trq_rpm | −0.47 | −0.63 | 0.47 | 0.79 | 0.09 | |
The color encoding immediately reveals patterns. Strong positive correlations appear in deep blue, strong negative correlations in deep red. A reader scanning this table can instantly identify which variables move together and which move in opposition, without parsing individual numbers. The blank diagonal removes visual clutter, and the consistent formatting presents a polished appearance suitable for publication.
14.1.4 Building a descriptive statistics function
Descriptive statistics tables appear in virtually every research paper and many business reports. Yet producing them typically requires either tedious manual work or wrestling with packages that provide more than you need. A focused function can streamline this common task:
create_descriptive_stats <- function(
  data,
  variables = NULL,
  statistics = c("n", "mean", "sd", "min", "max"),
  by = NULL,
  decimals = 2,
  title = NULL
) {
  # Select variables to summarize
  if (is.null(variables)) {
    numeric_vars <- names(data)[sapply(data, is.numeric)]
  } else {
    numeric_vars <- variables
  }
  # Define statistic functions
  stat_fns <- list(
    n = function(x) sum(!is.na(x)),
    mean = function(x) mean(x, na.rm = TRUE),
    sd = function(x) sd(x, na.rm = TRUE),
    min = function(x) min(x, na.rm = TRUE),
    max = function(x) max(x, na.rm = TRUE),
    median = function(x) median(x, na.rm = TRUE),
    q25 = function(x) quantile(x, 0.25, na.rm = TRUE),
    q75 = function(x) quantile(x, 0.75, na.rm = TRUE)
  )
  # Calculate statistics for each variable
  if (is.null(by)) {
    # Overall statistics
    results <- lapply(numeric_vars, function(var) {
      vals <- data[[var]]
      stats <- sapply(statistics, function(s) stat_fns[[s]](vals))
      c(variable = var, stats)
    })
    summary_df <- as.data.frame(do.call(rbind, results))
    # Convert numeric columns
    for (stat in statistics) {
      summary_df[[stat]] <- as.numeric(summary_df[[stat]])
    }
  } else {
    # Statistics by group
    groups <- unique(data[[by]])
    results <- list()
    for (var in numeric_vars) {
      for (grp in groups) {
        vals <- data[[var]][data[[by]] == grp]
        stats <- sapply(statistics, function(s) stat_fns[[s]](vals))
        results[[length(results) + 1]] <- c(
          variable = var,
          group = as.character(grp),
          stats
        )
      }
    }
    summary_df <- as.data.frame(do.call(rbind, results))
    for (stat in statistics) {
      summary_df[[stat]] <- as.numeric(summary_df[[stat]])
    }
  }
  # Statistic labels
  stat_labels <-
    c(
      n = "N",
      mean = "Mean",
      sd = "SD",
      min = "Min",
      max = "Max",
      median = "Median",
      q25 = "Q1",
      q75 = "Q3"
    )
  # Build the gt table
  if (is.null(by)) {
    tbl <-
      summary_df |>
      gt(rowname_col = "variable") |>
      tab_stubhead(label = "Variable") |>
      fmt_number(columns = where(is.numeric), decimals = decimals) |>
      fmt_integer(columns = any_of("n")) |>
      cols_label(.list = setNames(
        as.list(stat_labels[statistics]),
        statistics
      ))
  } else {
    tbl <-
      summary_df |>
      gt(rowname_col = "variable", groupname_col = "group") |>
      tab_stubhead(label = "Variable") |>
      fmt_number(columns = where(is.numeric), decimals = decimals) |>
      fmt_integer(columns = any_of("n")) |>
      cols_label(.list = setNames(
        as.list(stat_labels[statistics]),
        statistics
      ))
  }
  tbl <-
    tbl |>
    tab_header(title = if (is.null(title)) "Descriptive Statistics" else title) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2)
  return(tbl)
}
This function provides flexibility in what statistics to compute, allowing users to select from a menu of common options. The by parameter enables grouped analyses, producing side-by-side comparisons across categories. Let's see both use cases:
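For the ungrouped case, a call along these lines produces the table that follows (the exact variable and statistic selections are inferred from the output shown):

gtcars |>
  create_descriptive_stats(
    variables = c("mpg_c", "mpg_h", "hp", "trq"),
    statistics = c("n", "mean", "sd", "min", "median", "max"),
    title = "Vehicle Performance Metrics"
  )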
| Vehicle Performance Metrics | ||||||
| Variable | N | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|
| mpg_c | 46 | 15.33 | 3.43 | 11.00 | 15.00 | 28.00 |
| mpg_h | 46 | 22.20 | 3.87 | 16.00 | 22.00 | 30.00 |
| hp | 47 | 514.96 | 139.82 | 259.00 | 552.00 | 949.00 |
| trq | 47 | 441.02 | 101.46 | 243.00 | 436.00 | 664.00 |
And now with grouping by vehicle drivetrain:
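Again, the invocation is reconstructed from the output shown below:

gtcars |>
  create_descriptive_stats(
    variables = c("mpg_c", "hp"),
    statistics = c("n", "mean", "sd"),
    by = "drivetrain",
    title = "Performance by Drivetrain Type"
  )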
| Performance by Drivetrain Type | |||
| Variable | N | Mean | SD |
|---|---|---|---|
| rwd | |||
| mpg_c | 34 | 15.15 | 2.79 |
| hp | 34 | 515.50 | 146.30 |
| awd | |||
| mpg_c | 12 | 15.83 | 4.93 |
| hp | 13 | 513.54 | 126.79 |
The grouped version organizes results by drivetrain type, making comparisons across vehicle configurations straightforward. Someone reading this table could immediately see how city fuel economy and horsepower differ between rear-wheel-drive and all-wheel-drive vehicles.
14.2 Providing wrapper functions to modify the table outputs
Not every extension needs to create tables from scratch. Sometimes the greater need is to modify existing tables in consistent ways. A wrapper function takes a gt table as input, applies transformations, and returns the modified table. This approach works well when you want to enforce organizational styling, add standard elements like logos or disclaimers, or provide convenient shortcuts for common formatting patterns.
14.2.1 A theming function for organizational branding
Organizations often have visual identity guidelines specifying colors, fonts, and other design elements. Creating a theming function ensures that all tables produced across the organization share a consistent appearance:
apply_corporate_theme <- function(
  gt_tbl,
  primary_color = "#1E3A5F",
  accent_color = "#E85D04",
  header_font = "Georgia",
  body_font = "Arial"
) {
  gt_tbl |>
    tab_options(
      # Header styling
      heading.background.color = primary_color,
      heading.title.font.size = px(18),
      heading.subtitle.font.size = px(14),
      # Column labels
      column_labels.background.color = primary_color,
      column_labels.font.weight = "bold",
      # Table body
      table.font.size = px(13),
      # Row striping
      row.striping.background_color = "#F5F5F5",
      row.striping.include_stub = TRUE,
      row.striping.include_table_body = TRUE,
      # Borders
      table_body.hlines.color = "#E0E0E0",
      table_body.vlines.color = "transparent",
      # Footer
      footnotes.font.size = px(11),
      source_notes.font.size = px(11)
    ) |>
    tab_style(
      style = cell_text(
        color = "white",
        font = header_font
      ),
      locations = list(
        cells_title(),
        cells_column_labels()
      )
    ) |>
    tab_style(
      style = cell_text(font = body_font),
      locations = cells_body()
    ) |>
    tab_style(
      style = cell_borders(
        sides = "bottom",
        color = accent_color,
        weight = px(3)
      ),
      locations = cells_column_labels()
    )
}
This theme function transforms any gt table to match corporate standards. The deep blue primary color establishes professionalism, the orange accent provides visual interest, and the specified fonts ensure consistency. Let's apply it to a simple table:
gtcars |>
  select(mfr, model, year, hp, mpg_c) |>
  slice_head(n = 8) |>
  gt() |>
  tab_header(
    title = "Vehicle Performance Summary",
    subtitle = "Selected models from our database"
  ) |>
  fmt_integer(columns = c(year, hp)) |>
  fmt_number(columns = mpg_c, decimals = 1) |>
  cols_label(
    mfr = "Manufacturer",
    model = "Model",
    year = "Year",
    hp = "Horsepower",
    mpg_c = "City MPG"
  ) |>
  apply_corporate_theme()
| Vehicle Performance Summary | ||||
| Selected models from our database | ||||
| Manufacturer | Model | Year | Horsepower | City MPG |
|---|---|---|---|---|
| Ford | GT | 2,017 | 647 | 11.0 |
| Ferrari | 458 Speciale | 2,015 | 597 | 13.0 |
| Ferrari | 458 Spider | 2,015 | 562 | 13.0 |
| Ferrari | 458 Italia | 2,014 | 562 | 13.0 |
| Ferrari | 488 GTB | 2,016 | 661 | 15.0 |
| Ferrari | California | 2,015 | 553 | 16.0 |
| Ferrari | GTC4Lusso | 2,017 | 680 | 12.0 |
| Ferrari | FF | 2,015 | 652 | 11.0 |
Any table passed through apply_corporate_theme() acquires the organizational look. The function demonstrates how wrapper functions can encapsulate substantial complexity while providing a simple interface. Users need not understand the dozens of tab_options() parameters; they simply apply the theme.
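And because the colors and fonts are parameters rather than hard-coded values, one-off deviations remain easy. For example, a variant with a subdued accent (the hex value here is purely illustrative):

gtcars |>
  select(mfr, model, hp) |>
  slice_head(n = 4) |>
  gt() |>
  apply_corporate_theme(accent_color = "#6C757D")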
14.2.2 Adding standard elements
Some contexts require standard elements on all tables: disclaimers, data sources, or organizational logos. A wrapper function can add these consistently:
add_report_footer <- function(
  gt_tbl,
  data_source = NULL,
  disclaimer = NULL,
  include_date = TRUE
) {
  # Add data source if provided
  if (!is.null(data_source)) {
    gt_tbl <-
      gt_tbl |>
      tab_source_note(
        source_note = paste("Data Source:", data_source)
      )
  }
  # Add disclaimer if provided
  if (!is.null(disclaimer)) {
    gt_tbl <-
      gt_tbl |>
      tab_source_note(
        source_note = md(paste0("*", disclaimer, "*"))
      )
  }
  # Add generation date
  if (include_date) {
    gt_tbl <-
      gt_tbl |>
      tab_source_note(
        source_note = paste("Report generated:", format(Sys.Date(), "%B %d, %Y"))
      )
  }
  return(gt_tbl)
}
This function adds a customizable footer section to any table. Used consistently, it ensures that all tables in a report carry appropriate attribution and disclaimers:
towny |>
  select(name, land_area_km2, population_2021, density_2021) |>
  slice_max(population_2021, n = 5) |>
  gt() |>
  tab_header(title = "Ontario's Largest Municipalities") |>
  fmt_integer(columns = c(population_2021, density_2021)) |>
  fmt_number(columns = land_area_km2, decimals = 1) |>
  cols_label(
    name = "Municipality",
    land_area_km2 = "Area (km²)",
    population_2021 = "Population",
    density_2021 = "Density"
  ) |>
  add_report_footer(
    data_source = "Statistics Canada, 2021 Census",
    disclaimer = "Figures subject to revision"
  )
| Ontario's Largest Municipalities | |||
| Municipality | Area (km²) | Population | Density |
|---|---|---|---|
| Toronto | 631.1 | 2,794,356 | 4,428 |
| Ottawa | 2,788.2 | 1,017,449 | 365 |
| Mississauga | 292.7 | 717,961 | 2,453 |
| Brampton | 265.9 | 656,480 | 2,469 |
| Hamilton | 1,118.3 | 569,353 | 509 |
| Data Source: Statistics Canada, 2021 Census | |||
| Figures subject to revision | |||
| Report generated: January 05, 2026 | |||
14.2.3 A significance highlighting function
In statistical reporting, highlighting significant results is common practice. Rather than manually applying conditional formatting each time, a wrapper function can standardize this process:
highlight_significant <- function(
  gt_tbl,
  columns,
  threshold = 0.05,
  highlight_color = "#E8F5E9",
  bold = TRUE
) {
  # Resolve the column name once so it can be reused in the row conditions
  col_name <- rlang::as_name(rlang::ensym(columns))
  # Apply background color to significant cells
  gt_tbl <-
    gt_tbl |>
    tab_style(
      style = cell_fill(color = highlight_color),
      locations = cells_body(
        columns = all_of(col_name),
        rows = .data[[col_name]] < threshold
      )
    )
  # Optionally bold the significant values
  if (bold) {
    gt_tbl <-
      gt_tbl |>
      tab_style(
        style = cell_text(weight = "bold"),
        locations = cells_body(
          columns = all_of(col_name),
          rows = .data[[col_name]] < threshold
        )
      )
  }
  return(gt_tbl)
}
This function takes a p-value column and highlights cells below the significance threshold. The visual emphasis draws attention to statistically significant findings without requiring readers to scan through numbers.
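A hypothetical use, with made-up regression output (the terms, estimates, and p-values are illustrative only):

model_results <-
  dplyr::tibble(
    term = c("(Intercept)", "hp", "wt", "am"),
    estimate = c(34.96, -0.018, -2.87, 1.55),
    p_value = c(0.0001, 0.281, 0.004, 0.312)
  )
model_results |>
  gt() |>
  fmt_number(columns = c(estimate, p_value), decimals = 3) |>
  highlight_significant(columns = p_value)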
14.2.4 Building flexible style appliers
Sometimes you want to provide several pre-built styles that users can select. A style applier function with multiple options gives users flexibility while maintaining consistency:
apply_table_style <- function(
  gt_tbl,
  style = c("minimal", "striped", "bordered", "scientific")
) {
  style <- match.arg(style)
  if (style == "minimal") {
    gt_tbl <-
      gt_tbl |>
      tab_options(
        table_body.hlines.color = "transparent",
        table_body.vlines.color = "transparent",
        column_labels.border.bottom.color = "black",
        column_labels.border.bottom.width = px(2),
        table_body.border.bottom.color = "black",
        table_body.border.bottom.width = px(2)
      )
  } else if (style == "striped") {
    gt_tbl <-
      gt_tbl |>
      opt_row_striping() |>
      tab_options(
        row.striping.background_color = "#F8F9FA",
        table_body.hlines.color = "transparent"
      )
  } else if (style == "bordered") {
    gt_tbl <-
      gt_tbl |>
      tab_options(
        table_body.hlines.color = "#DEE2E6",
        table_body.vlines.color = "#DEE2E6",
        column_labels.border.bottom.color = "#343A40",
        column_labels.border.bottom.width = px(2)
      ) |>
      tab_style(
        style = cell_borders(
          sides = c("left", "right"),
          color = "#DEE2E6"
        ),
        locations = cells_body()
      )
  } else if (style == "scientific") {
    gt_tbl <-
      gt_tbl |>
      tab_options(
        table.font.size = px(11),
        heading.title.font.size = px(13),
        heading.subtitle.font.size = px(11),
        table_body.hlines.color = "transparent",
        column_labels.border.bottom.color = "black",
        column_labels.border.top.color = "black",
        table_body.border.bottom.color = "black"
      ) |>
      tab_style(
        style = cell_text(size = px(10)),
        locations = cells_source_notes()
      )
  }
  return(gt_tbl)
}
Users can select from predefined styles while the function handles all the underlying options:
base_table <-
  exibble |>
  select(char, num, currency) |>
  slice(1:5) |>
  gt() |>
  tab_header(title = "Style Comparison", subtitle = "Scientific style") |>
  fmt_number(columns = num, decimals = 2) |>
  fmt_currency(columns = currency)
base_table |>
  apply_table_style(style = "scientific")
| Style Comparison | ||
| Scientific style | ||
| char | num | currency |
|---|---|---|
| apricot | 0.11 | $49.95 |
| banana | 2.22 | $17.95 |
| coconut | 33.33 | $1.39 |
| durian | 444.40 | $65,100.00 |
| NA | 5,550.00 | $1,325.81 |
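The same base table can be re-rendered under any of the other styles; only the style argument changes:

base_table |>
  apply_table_style(style = "striped")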
14.3 Implementation ideas
The examples thus far demonstrate foundational patterns. This section explores more ambitious possibilities: extensions that push into specialized domains or provide capabilities not easily achieved with basic gt usage.
14.3.1 Comparison table generator
Many reports require side-by-side comparisons with calculated differences. A specialized function can automate this pattern:
create_comparison_table <- function(
  data,
  group_col,
  value_cols,
  group_labels = NULL,
  show_difference = TRUE,
  show_pct_change = TRUE,
  decimals = 1
) {
  groups <- unique(data[[group_col]])
  if (length(groups) != 2) {
    stop("Comparison requires exactly two groups")
  }
  # Split data by group
  group1_data <- data[data[[group_col]] == groups[1], ]
  group2_data <- data[data[[group_col]] == groups[2], ]
  # Create comparison data frame
  comparison_df <- dplyr::tibble(metric = value_cols)
  # Get values for each group (assuming single row per group or aggregating)
  comparison_df[[as.character(groups[1])]] <- sapply(value_cols, function(v) {
    mean(group1_data[[v]], na.rm = TRUE)
  })
  comparison_df[[as.character(groups[2])]] <- sapply(value_cols, function(v) {
    mean(group2_data[[v]], na.rm = TRUE)
  })
  # Calculate differences
  if (show_difference) {
    comparison_df$difference <-
      comparison_df[[as.character(groups[2])]] -
      comparison_df[[as.character(groups[1])]]
  }
  if (show_pct_change) {
    comparison_df$pct_change <-
      (comparison_df[[as.character(groups[2])]] -
        comparison_df[[as.character(groups[1])]]) /
      comparison_df[[as.character(groups[1])]] * 100
  }
  # Build table
  tbl <-
    comparison_df |>
    gt(rowname_col = "metric") |>
    fmt_number(
      columns = c(as.character(groups[1]), as.character(groups[2])),
      decimals = decimals
    )
  if (show_difference) {
    tbl <-
      tbl |>
      fmt_number(
        columns = difference,
        decimals = decimals,
        force_sign = TRUE
      ) |>
      tab_style(
        style = cell_text(color = "green"),
        locations = cells_body(columns = difference, rows = difference > 0)
      ) |>
      tab_style(
        style = cell_text(color = "red"),
        locations = cells_body(columns = difference, rows = difference < 0)
      )
  }
  if (show_pct_change) {
    tbl <-
      tbl |>
      fmt_number(
        columns = pct_change,
        decimals = 1,
        force_sign = TRUE,
        pattern = "{x}%"
      ) |>
      tab_style(
        style = cell_text(color = "green"),
        locations = cells_body(columns = pct_change, rows = pct_change > 0)
      ) |>
      tab_style(
        style = cell_text(color = "red"),
        locations = cells_body(columns = pct_change, rows = pct_change < 0)
      )
  }
  # Apply labels if provided
  if (!is.null(group_labels) && length(group_labels) == 2) {
    tbl <-
      tbl |>
      cols_label(
        !!as.character(groups[1]) := group_labels[1],
        !!as.character(groups[2]) := group_labels[2]
      )
  }
  if (show_difference) {
    tbl <- tbl |> cols_label(difference = "Diff")
  }
  if (show_pct_change) {
    tbl <- tbl |> cols_label(pct_change = "% Change")
  }
  tbl <-
    tbl |>
    tab_header(title = "Comparison Analysis") |>
    tab_stubhead(label = "Metric") |>
    opt_stylize(style = 1)
  return(tbl)
}
The function handles the tedious work of pivoting data, calculating differences, and applying conditional formatting. The color coding for positive and negative changes provides immediate visual feedback:
# Create pre-aggregated quarterly data
quarterly_metrics <-
  dplyr::tibble(
    quarter = c("Q1 2024", "Q2 2024"),
    revenue = c(125000, 142000),
    expenses = c(98000, 105000),
    customers = c(1250, 1420)
  )
quarterly_metrics |>
  create_comparison_table(
    group_col = "quarter",
    value_cols = c("revenue", "expenses", "customers"),
    group_labels = c("Q1 2024", "Q2 2024")
  )
| Comparison Analysis | ||||
| Metric | Q1 2024 | Q2 2024 | Diff | % Change |
|---|---|---|---|---|
| revenue | 125,000.0 | 142,000.0 | +17,000.0 | +13.6% |
| expenses | 98,000.0 | 105,000.0 | +7,000.0 | +7.1% |
| customers | 1,250.0 | 1,420.0 | +170.0 | +13.6% |
14.3.2 A grading or scoring table function
Educational and assessment contexts often require tables that map numeric scores to letter grades or performance categories. A specialized function can standardize this presentation:
create_grade_table <- function(
  data,
  name_col,
  score_col,
  max_score = 100,
  grade_breaks = c(90, 80, 70, 60),
  grade_labels = c("A", "B", "C", "D", "F"),
  show_percentage = TRUE,
  title = "Grade Report"
) {
  # Calculate percentages and grades; cut() requires increasing breaks,
  # so reverse the user-facing (descending) breaks and their labels
  result_df <-
    data |>
    mutate(
      percentage = .data[[score_col]] / max_score * 100,
      grade = cut(
        percentage,
        breaks = c(-Inf, rev(grade_breaks), Inf),
        labels = rev(grade_labels),
        right = FALSE
      )
    ) |>
    select(all_of(c(name_col, score_col)), percentage, grade) |>
    arrange(desc(percentage))
  # Grade colors
  grade_colors <- c(
    "A" = "#4CAF50",
    "B" = "#8BC34A",
    "C" = "#FFC107",
    "D" = "#FF9800",
    "F" = "#F44336"
  )
  # Build table
  tbl <-
    result_df |>
    gt(rowname_col = name_col) |>
    tab_header(
      title = title,
      subtitle = paste("Maximum possible score:", max_score)
    ) |>
    fmt_integer(columns = all_of(score_col)) |>
    fmt_number(columns = percentage, decimals = 1, pattern = "{x}%") |>
    tab_stubhead(label = "Student") |>
    cols_label(
      !!score_col := "Score",
      percentage = "Percentage",
      grade = "Grade"
    )
  # Apply grade colors
  for (g in names(grade_colors)) {
    tbl <-
      tbl |>
      tab_style(
        style = list(
          cell_fill(color = grade_colors[[g]]),
          cell_text(weight = "bold", color = "white")
        ),
        locations = cells_body(columns = grade, rows = grade == g)
      )
  }
  # Add summary
  avg_score <- mean(result_df[[score_col]], na.rm = TRUE)
  avg_pct <- mean(result_df$percentage, na.rm = TRUE)
  tbl <-
    tbl |>
    tab_source_note(
      source_note = paste0(
        "Class Average: ", round(avg_score, 1),
        " (", round(avg_pct, 1), "%)"
      )
    ) |>
    opt_stylize(style = 1)
  if (!show_percentage) {
    tbl <- tbl |> cols_hide(columns = percentage)
  }
  return(tbl)
}
The function calculates grades based on customizable breakpoints, applies color coding to make grade levels immediately visible, and provides class summary statistics:
# Sample student scores
student_scores <-
  dplyr::tibble(
    student = c(
      "Alice", "Billy", "Courtney", "Dirk",
      "Eva", "Frank", "Grace", "Henry"
    ),
    exam_score = c(95, 87, 78, 92, 65, 73, 88, 56)
  )
student_scores |>
  create_grade_table(
    name_col = "student",
    score_col = "exam_score",
    title = "Final Examination Results"
  )
| Final Examination Results | |||
| Maximum possible score: 100 | |||
| Student | Score | Percentage | Grade |
|---|---|---|---|
| Alice | 95 | 95.0% | A |
| Dirk | 92 | 92.0% | A |
| Grace | 88 | 88.0% | B |
| Billy | 87 | 87.0% | B |
| Courtney | 78 | 78.0% | C |
| Frank | 73 | 73.0% | C |
| Eva | 65 | 65.0% | D |
| Henry | 56 | 56.0% | F |
| Class Average: 79.2 (79.2%) | |||
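Because the breakpoints and labels are parameters, alternative schemes drop in without code changes. A pass/fail variant might look like this (note that the built-in color map only covers A–F, so these categories render uncolored):

student_scores |>
  create_grade_table(
    name_col = "student",
    score_col = "exam_score",
    grade_breaks = c(60),
    grade_labels = c("Pass", "Fail"),
    title = "Pass/Fail Summary"
  )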
14.3.3 Data quality report function
Data quality assessment is crucial before any analysis. A dedicated function can automate the production of quality reports:
create_data_quality_report <- function(data, title = "Data Quality Report") {
  # Calculate quality metrics for each column
  quality_df <-
    dplyr::tibble(
      column = names(data),
      data_type = sapply(data, function(x) class(x)[1]),
      total_rows = nrow(data),
      non_missing = sapply(data, function(x) sum(!is.na(x))),
      missing = sapply(data, function(x) sum(is.na(x))),
      missing_pct = sapply(data, function(x) mean(is.na(x)) * 100),
      unique_values = sapply(data, function(x) length(unique(x[!is.na(x)]))),
      completeness = sapply(data, function(x) (1 - mean(is.na(x))) * 100)
    )
  # Classify completeness into quality bands
  quality_df <-
    quality_df |>
    dplyr::mutate(
      quality_score = dplyr::case_when(
        completeness >= 99 ~ "Excellent",
        completeness >= 95 ~ "Good",
        completeness >= 90 ~ "Fair",
        completeness >= 80 ~ "Poor",
        TRUE ~ "Critical"
      )
    )
  # Score colors
  score_colors <- c(
    "Excellent" = "#4CAF50",
    "Good" = "#8BC34A",
    "Fair" = "#FFC107",
    "Poor" = "#FF9800",
    "Critical" = "#F44336"
  )
  # Build table
  tbl <-
    quality_df |>
    gt(rowname_col = "column") |>
    tab_header(
      title = title,
      subtitle = paste(nrow(data), "rows analyzed")
    ) |>
    tab_stubhead(label = "Column") |>
    cols_hide(columns = c(total_rows, non_missing)) |>
    fmt_integer(columns = c(missing, unique_values)) |>
    fmt_number(columns = c(missing_pct, completeness), decimals = 1) |>
    cols_label(
      data_type = "Type",
      missing = "Missing",
      missing_pct = "Missing %",
      unique_values = "Unique",
      completeness = "Complete %",
      quality_score = "Quality"
    ) |>
    data_color(
      columns = completeness,
      palette = c("#F44336", "#FF9800", "#FFC107", "#8BC34A", "#4CAF50"),
      domain = c(0, 100)
    )
  # Apply quality score colors
  for (score in names(score_colors)) {
    tbl <-
      tbl |>
      tab_style(
        style = list(
          cell_fill(color = score_colors[[score]]),
          cell_text(weight = "bold")
        ),
        locations = cells_body(
          columns = quality_score,
          rows = quality_score == score
        )
      )
  }
  # Overall summary
  overall_completeness <- mean(quality_df$completeness)
  tbl <-
    tbl |>
    tab_source_note(
      source_note = paste0(
        "Overall Data Completeness: ", round(overall_completeness, 1), "%"
      )
    ) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2)
  return(tbl)
}
The report provides a comprehensive view of data quality, with visual indicators making problematic columns immediately apparent:
# Create sample data with varying quality
quality_test <-
  dplyr::tibble(
    id = 1:100,
    name = sample(
      c("Alice", "Bob", "Carol", NA), 100,
      replace = TRUE,
      prob = c(0.3, 0.3, 0.3, 0.1)
    ),
    age = sample(c(25:65, NA), 100, replace = TRUE),
    salary = c(rep(NA, 25), sample(50000:150000, 75, replace = TRUE)),
    department = sample(c("Sales", "Engineering", "Marketing"), 100, replace = TRUE)
  )
quality_test |>
  create_data_quality_report(title = "Employee Data Quality Assessment")
| Employee Data Quality Assessment | ||||||
| 100 rows analyzed | ||||||
| Column | Type | Missing | Missing % | Unique | Complete % | Quality |
|---|---|---|---|---|---|---|
| id | integer | 0 | 0.0 | 100 | 100.0 | Excellent |
| name | character | 19 | 19.0 | 3 | 81.0 | Poor |
| age | integer | 2 | 2.0 | 33 | 98.0 | Good |
| salary | integer | 25 | 25.0 | 75 | 75.0 | Critical |
| department | character | 0 | 0.0 | 3 | 100.0 | Excellent |
| Overall Data Completeness: 90.8% | ||||||
14.4 Packaging your extensions
The examples in this chapter are presented as standalone functions, but their true power emerges when packaged for reuse. An R package provides structure for documentation, testing, and distribution. It ensures that your extensions are available wherever you work and can be shared with colleagues.
Creating a package from extension functions follows standard R package development practices. The mechanics of package creation (directory structure, documentation with roxygen2, testing, dependency management) are thoroughly covered in R Packages by Hadley Wickham and Jennifer Bryan. That book provides comprehensive guidance on everything from initial setup to publication on CRAN.
For gt extensions specifically, a few considerations warrant attention. Document your functions with clear parameter descriptions and meaningful examples. Since your functions produce visual outputs, consider including example tables in vignettes where readers can see exactly what the functions produce. Think about scope: a package focused on a specific domain (financial reporting, academic publishing, healthcare analytics) will be more useful than a grab bag of unrelated utilities. Even a small collection of utilities that serve your own needs justifies the packaging effort; the discipline of creating a package improves the code itself.
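As a sketch of what such documentation looks like in practice, here is the overview function from earlier with standard roxygen2 tags (the descriptive text is illustrative):

#' Create a dataset overview table
#'
#' Summarizes each column of a data frame (type, missingness, unique values)
#' and returns a styled gt table.
#'
#' @param data A data frame or tibble to summarize.
#' @param title Optional table title; defaults to "Dataset Overview".
#'
#' @return A `gt_tbl` object that callers can modify further.
#'
#' @examples
#' create_data_overview(mtcars)
#'
#' @export
create_data_overview <- function(data, title = NULL) {
  # function body as defined earlier in this chapter
}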
14.5 Beyond basic extensions
The patterns explored in this chapter represent starting points rather than boundaries. More sophisticated extensions might integrate with external data sources, pulling data from databases or APIs and presenting it in formatted tables. They might generate multiple related tables from a single function call, producing suites of outputs for comprehensive reporting. They might provide interactive features for HTML outputs, adding user controls for filtering or sorting.
Some extensions might focus on specific output formats. A package designed for PDF reports might include functions optimized for print layout. A package for dashboards might emphasize compact, information-dense designs. A package for presentations might provide larger text sizes and simplified structures appropriate for projection.
Other extensions might introduce entirely new table components. While gt provides a rich vocabulary of table elements, specific domains may have conventions not directly supported. A package could define new structural elements and rendering logic for those conventions.
The key is identifying needs that arise repeatedly in your work and addressing them systematically. Each time you find yourself copying and modifying code from a previous project, that’s a signal that a function might be warranted. Each time you explain to a colleague how to format a certain type of table, that’s a signal that your explanation could be encoded in software.
Creating extensions for gt is an exercise in understanding both the package and your own requirements. It demands clarity about what you want to achieve and how gt can help achieve it. The resulting functions, when well designed, become multipliers of your effectiveness. They transform tasks that once required careful attention into operations that happen correctly by default.
Many successful gt extension packages began exactly this way. Someone recognized a gap, built functions to address it, and shared the result. Your extensions might start as personal utilities and grow into resources that benefit a wider community. Even if they remain private to your organization, they contribute to better, more consistent, more maintainable reporting. That contribution is the purpose of extending gt: not extension for its own sake but extension in service of clearer communication through better tables.
14.6 Summary
This chapter has explored how to extend gt by creating your own functions and packages. Whether addressing domain-specific needs or encoding organizational standards, extensions multiply the value of your table-making expertise.
The key approaches we’ve covered:
- Display table functions take data and produce complete, formatted tables. They encapsulate multi-step workflows (data transformation, table creation, formatting, styling) into single function calls with sensible defaults and useful customization options.
- Modifier functions accept existing gt tables and enhance them. They add consistent styling, domain-specific elements, or standard configurations without requiring users to remember every option.
- Data quality and diagnostic tools use gt's presentation capabilities to communicate about data itself: completeness reports, validation summaries, and structural overviews.
- Packaging considerations include clear documentation, meaningful examples, focused scope, and the discipline that formal packaging brings to code quality.
The patterns demonstrated here (dataset overviews, correlation matrices, data quality reports, modifier functions) represent starting points. Your extensions might address entirely different needs: financial reporting conventions, scientific publication requirements, corporate branding standards, or analytical workflows unique to your domain.
The underlying principle remains constant: identify repetitive table-making tasks, understand what varies and what stays constant, then encode the constants in functions while exposing the variations as parameters. Each extension you create reduces future effort while improving consistency.
This book has journeyed from gt's foundational concepts through formatting, styling, and advanced features to the creation of extensions. The destination isn't mastery of a package but rather the ability to communicate data effectively through well-crafted tables. gt provides the tools, while your understanding of your data and your audience provides the direction. Together, they enable tables that inform, clarify, and persuade: tables worthy of the information they present.