14 Creating extensions
The gt ecosystem includes packages that demonstrate what becomes possible when developers build upon its foundation. Packages like gtsummary transform statistical models and data summaries into publication-ready tables. Others like gtExtras provide themes, helper functions, and enhanced visualizations. The pointblank package uses gt to generate comprehensive data validation reports, presenting quality checks and test results in well-formatted tables. Each identified gaps between what analysts needed and what gt alone provided, then filled those gaps with useful extensions.
In this chapter, we’ll explore how you might create your own extensions. Perhaps you work in a domain with specialized reporting requirements. Perhaps your organization has established table styles that should be applied consistently. Perhaps you’ve found yourself copying the same sequences of gt function calls across projects and want to encapsulate that workflow. Whatever the motivation, extending gt through your own package opens possibilities that using gt directly cannot match.
The rewards extend beyond personal convenience. A well-designed extension package creates institutional value. New team members can produce properly formatted tables without learning every gt option. Reports maintain visual consistency across authors and time periods. Domain-specific conventions become encoded in functions rather than documented in style guides that may not be followed. The investment in creating such a package pays dividends far exceeding the initial effort.
We’ll examine three complementary approaches to extension. First, creating functions that generate complete display tables from data summaries, taking raw data or statistical objects and producing finished tables ready for publication. Second, building wrapper functions that modify or enhance tables created elsewhere, adding consistent styling or domain-specific elements. Third, developing more ambitious extensions that push gt into new territory. Throughout, we’ll provide concrete, working examples that illustrate not just what to do but why certain design choices lead to better outcomes.
14.1 Creating display tables to augment data summaries
The most impactful extensions often emerge from recognizing patterns in your own work. You perform a particular analysis, format the results as a table, apply certain styling, and repeat this process dozens or hundreds of times. Each repetition involves the same conceptual steps but implemented anew, with opportunities for inconsistency and error at every turn.
Consider the task of summarizing a dataset’s structure. Data scientists frequently need to document what variables a dataset contains, their types, the presence of missing values, and basic distributional properties. This information helps colleagues understand the data, aids in quality control, and provides essential context for downstream analyses. Yet creating such summaries manually is tedious, and the results vary based on who creates them and when.
14.1.1 A dataset overview function
Let’s build a function that produces a comprehensive dataset overview table. The function should accept any data frame and return a gt table documenting its structure:
create_data_overview <- function(data, title = NULL) {
  # Build a summary data frame with one row per column
  summary_df <-
    dplyr::tibble(
      variable = names(data),
      type = sapply(data, function(x) class(x)[1]),
      n_missing = sapply(data, function(x) sum(is.na(x))),
      pct_missing = sapply(data, function(x) mean(is.na(x)) * 100),
      n_unique = sapply(data, function(x) length(unique(x)))
    )
  # Add example values (first non-NA value)
  summary_df$example <-
    sapply(data, function(x) {
      non_na <- x[!is.na(x)]
      if (length(non_na) == 0) {
        return(NA_character_)
      }
      val <- non_na[1]
      if (is.numeric(val)) {
        format(round(val, 3), nsmall = 3)
      } else if (inherits(val, "Date")) {
        as.character(val)
      } else {
        as.character(val)
      }
    })
  # Create the gt table
  tbl <-
    summary_df |>
    gt() |>
    tab_header(
      title = if (is.null(title)) "Dataset Overview" else title,
      subtitle = paste0(nrow(data), " rows × ", ncol(data), " columns")
    ) |>
    cols_label(
      variable = "Variable",
      type = "Type",
      n_missing = "Missing (n)",
      pct_missing = "Missing (%)",
      n_unique = "Unique Values",
      example = "Example"
    ) |>
    fmt_number(columns = pct_missing, decimals = 1) |>
    fmt_integer(columns = c(n_missing, n_unique)) |>
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_body(columns = variable)
    ) |>
    data_color(
      columns = pct_missing,
      palette = c("white", "orange", "red"),
      domain = c(0, 100)
    ) |>
    tab_source_note(
      source_note = paste("Generated on", Sys.Date())
    ) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2)
  return(tbl)
}
This function encapsulates substantial complexity. It calculates summary statistics for each column, formats them appropriately, applies visual styling that highlights potential data quality issues (the color gradient on missing percentages draws attention to problematic columns), and documents when the overview was created. A user need only call create_data_overview(my_data) to receive a finished table.
Let’s see it in action with the towny dataset:
create_data_overview(towny, title = "Towny Dataset Structure")
| Towny Dataset Structure | |||||
| 414 rows × 25 columns | |||||
| Variable | Type | Missing (n) | Missing (%) | Unique Values | Example |
|---|---|---|---|---|---|
| name | character | 0 | 0.0 | 413 | Addington Highlands |
| website | character | 4 | 1.0 | 411 | https://addingtonhighlands.ca |
| status | character | 0 | 0.0 | 2 | lower-tier |
| csd_type | character | 0 | 0.0 | 5 | township |
| census_div | character | 0 | 0.0 | 49 | Lennox and Addington |
| latitude | numeric | 0 | 0.0 | 316 | 45.000 |
| longitude | numeric | 0 | 0.0 | 351 | -77.250 |
| land_area_km2 | numeric | 0 | 0.0 | 412 | 1293.990 |
| population_1996 | integer | 3 | 0.7 | 407 | 2429.000 |
| population_2001 | integer | 3 | 0.7 | 408 | 2402.000 |
| population_2006 | integer | 0 | 0.0 | 408 | 2512.000 |
| population_2011 | integer | 0 | 0.0 | 412 | 2517.000 |
| population_2016 | integer | 0 | 0.0 | 408 | 2318.000 |
| population_2021 | integer | 0 | 0.0 | 406 | 2534.000 |
| density_1996 | numeric | 3 | 0.7 | 405 | 1.880 |
| density_2001 | numeric | 3 | 0.7 | 397 | 1.860 |
| density_2006 | numeric | 0 | 0.0 | 402 | 1.940 |
| density_2011 | numeric | 0 | 0.0 | 401 | 1.950 |
| density_2016 | numeric | 0 | 0.0 | 400 | 1.790 |
| density_2021 | numeric | 0 | 0.0 | 398 | 1.960 |
| pop_change_1996_2001_pct | numeric | 4 | 1.0 | 381 | -0.011 |
| pop_change_2001_2006_pct | numeric | 4 | 1.0 | 382 | 0.046 |
| pop_change_2006_2011_pct | numeric | 1 | 0.2 | 383 | 0.002 |
| pop_change_2011_2016_pct | numeric | 1 | 0.2 | 363 | -0.079 |
| pop_change_2016_2021_pct | numeric | 1 | 0.2 | 380 | 0.093 |
| Generated on 2026-01-05 | |||||
The table immediately reveals the dataset’s structure. We see numeric columns, character columns, and their properties. Missing value percentages are color-coded, making it easy to spot columns that might need attention. The example values provide concrete illustrations of what each column contains.
14.1.2 Thinking about function design
Several design decisions in this function merit discussion. The function returns a gt table object rather than printing it directly. This allows users to further modify the result if needed, adding footnotes, changing colors, or applying additional formatting. If the function printed the table and returned invisibly, such modifications would be impossible.
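Because the table object is returned, further customization composes naturally. For instance, a user could bolt a footnote onto the result (a hypothetical follow-up; the footnote text is illustrative):

create_data_overview(towny) |>
  tab_footnote(
    footnote = "Municipality names as registered in the 2021 census.",
    locations = cells_body(columns = example, rows = variable == "name")
  )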
The title parameter has a sensible default but allows customization. This pattern appears throughout well-designed gt extensions: provide defaults that work for most cases while allowing users to override them when circumstances warrant.
The color gradient on missing percentages demonstrates a broader principle: visual encoding should convey meaning. Rather than requiring users to scan a column of numbers, the color immediately signals which variables have concerning levels of missingness. This is not mere decoration but purposeful use of visual channels to communicate information.
The timestamp in the source note serves a documentation purpose. When the table appears in a report weeks or months later, readers know when the summary was generated. If the underlying data changes, outdated overviews can be identified and refreshed.
14.1.3 A correlation matrix function
Let’s develop another example: a function that creates publication-ready correlation tables. Correlation matrices are ubiquitous in statistical reporting, yet the default outputs from R’s cor() function are bare numeric matrices unsuitable for publication. Our function will transform them into properly formatted tables with visual highlighting:
create_correlation_table <- function(
  data,
  method = "pearson",
  title = NULL,
  decimals = 2
) {
  # Select only numeric columns
  numeric_data <- data |> select(where(is.numeric))
  if (ncol(numeric_data) < 2) {
    stop("Data must contain at least two numeric columns")
  }
  # Calculate correlations
  cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs", method = method)
  # Convert to data frame for gt
  cor_df <- as.data.frame(cor_matrix)
  cor_df <- tibble::rownames_to_column(cor_df, var = "variable")
  # Method label for subtitle
  method_label <- switch(method,
    pearson = "Pearson",
    spearman = "Spearman",
    kendall = "Kendall"
  )
  # Build the table
  tbl <-
    cor_df |>
    gt(rowname_col = "variable") |>
    tab_header(
      title = if (is.null(title)) "Correlation Matrix" else title,
      subtitle = paste(method_label, "correlation coefficients")
    ) |>
    fmt_number(columns = everything(), decimals = decimals) |>
    data_color(
      columns = everything(),
      palette = c("#B2182B", "#FDDBC7", "white", "#D1E5F0", "#2166AC"),
      domain = c(-1, 1)
    ) |>
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_stub()
    ) |>
    sub_values(
      values = 1,
      replacement = ""
    ) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2) |>
    cols_width(everything() ~ px(70))
  return(tbl)
}
The function handles several details that users would otherwise need to address manually. It selects only numeric columns, computes correlations with appropriate handling of missing values, applies a diverging color palette centered on zero (so positive correlations appear blue, negative correlations appear red, and values near zero remain white), and replaces the diagonal values of 1 with empty strings since the correlation of a variable with itself is trivially perfect and not informative.
Testing with the gtcars dataset reveals the function’s output:
gtcars |>
  select(mpg_c, mpg_h, hp, hp_rpm, trq, trq_rpm) |>
  create_correlation_table(title = "Vehicle Performance Correlations")
| Vehicle Performance Correlations | ||||||
| Pearson correlation coefficients | ||||||
| mpg_c | mpg_h | hp | hp_rpm | trq | trq_rpm | |
|---|---|---|---|---|---|---|
| mpg_c | 0.84 | −0.66 | −0.42 | −0.45 | −0.47 | |
| mpg_h | 0.84 | −0.79 | −0.60 | −0.52 | −0.63 | |
| hp | −0.66 | −0.79 | 0.47 | 0.85 | 0.47 | |
| hp_rpm | −0.42 | −0.60 | 0.47 | 0.03 | 0.79 | |
| trq | −0.45 | −0.52 | 0.85 | 0.03 | 0.09 | |
| trq_rpm | −0.47 | −0.63 | 0.47 | 0.79 | 0.09 | |
The color encoding immediately reveals patterns. Strong positive correlations appear in deep blue, strong negative correlations in deep red. A reader scanning this table can instantly identify which variables move together and which move in opposition, without parsing individual numbers. The blank diagonal removes visual clutter, and the consistent formatting presents a polished appearance suitable for publication.
14.1.4 Building a descriptive statistics function
Descriptive statistics tables appear in virtually every research paper and many business reports. Yet producing them typically requires either tedious manual work or wrestling with packages that provide more than you need. A focused function can streamline this common task:
create_descriptive_stats <- function(
  data,
  variables = NULL,
  statistics = c("n", "mean", "sd", "min", "max"),
  by = NULL,
  decimals = 2,
  title = NULL
) {
  # Select variables to summarize
  if (is.null(variables)) {
    numeric_vars <- names(data)[sapply(data, is.numeric)]
  } else {
    numeric_vars <- variables
  }
  # Define statistic functions
  stat_fns <- list(
    n = function(x) sum(!is.na(x)),
    mean = function(x) mean(x, na.rm = TRUE),
    sd = function(x) sd(x, na.rm = TRUE),
    min = function(x) min(x, na.rm = TRUE),
    max = function(x) max(x, na.rm = TRUE),
    median = function(x) median(x, na.rm = TRUE),
    q25 = function(x) quantile(x, 0.25, na.rm = TRUE),
    q75 = function(x) quantile(x, 0.75, na.rm = TRUE)
  )
  # Calculate statistics for each variable
  if (is.null(by)) {
    # Overall statistics
    results <- lapply(numeric_vars, function(var) {
      vals <- data[[var]]
      stats <- sapply(statistics, function(s) stat_fns[[s]](vals))
      c(variable = var, stats)
    })
    summary_df <- as.data.frame(do.call(rbind, results))
    # Convert numeric columns
    for (stat in statistics) {
      summary_df[[stat]] <- as.numeric(summary_df[[stat]])
    }
  } else {
    # Statistics by group
    groups <- unique(data[[by]])
    results <- list()
    for (var in numeric_vars) {
      for (grp in groups) {
        vals <- data[[var]][data[[by]] == grp]
        stats <- sapply(statistics, function(s) stat_fns[[s]](vals))
        results[[length(results) + 1]] <- c(
          variable = var,
          group = as.character(grp),
          stats
        )
      }
    }
    summary_df <- as.data.frame(do.call(rbind, results))
    for (stat in statistics) {
      summary_df[[stat]] <- as.numeric(summary_df[[stat]])
    }
  }
  # Statistic labels
  stat_labels <-
    c(
      n = "N",
      mean = "Mean",
      sd = "SD",
      min = "Min",
      max = "Max",
      median = "Median",
      q25 = "Q1",
      q75 = "Q3"
    )
  # Build the gt table
  if (is.null(by)) {
    tbl <-
      summary_df |>
      gt(rowname_col = "variable") |>
      tab_stubhead(label = "Variable") |>
      fmt_number(columns = where(is.numeric), decimals = decimals) |>
      fmt_integer(columns = any_of("n")) |>
      cols_label(.list = setNames(
        as.list(stat_labels[statistics]),
        statistics
      ))
  } else {
    tbl <-
      summary_df |>
      gt(rowname_col = "variable", groupname_col = "group") |>
      tab_stubhead(label = "Variable") |>
      fmt_number(columns = where(is.numeric), decimals = decimals) |>
      fmt_integer(columns = any_of("n")) |>
      cols_label(.list = setNames(
        as.list(stat_labels[statistics]),
        statistics
      ))
  }
  tbl <-
    tbl |>
    tab_header(title = if (is.null(title)) "Descriptive Statistics" else title) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2)
  return(tbl)
}
This function provides flexibility in what statistics to compute, allowing users to select from a menu of common options. The by parameter enables grouped analyses, producing side-by-side comparisons across categories. Let's see both use cases:
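For the ungrouped case, a call along these lines produces the table that follows (the exact variable and statistic selections are inferred from the output shown):

gtcars |>
  create_descriptive_stats(
    variables = c("mpg_c", "mpg_h", "hp", "trq"),
    statistics = c("n", "mean", "sd", "min", "median", "max"),
    title = "Vehicle Performance Metrics"
  )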
| Vehicle Performance Metrics | ||||||
| Variable | N | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|
| mpg_c | 46 | 15.33 | 3.43 | 11.00 | 15.00 | 28.00 |
| mpg_h | 46 | 22.20 | 3.87 | 16.00 | 22.00 | 30.00 |
| hp | 47 | 514.96 | 139.82 | 259.00 | 552.00 | 949.00 |
| trq | 47 | 441.02 | 101.46 | 243.00 | 436.00 | 664.00 |
And now with grouping by vehicle drivetrain:
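Again, the invocation is reconstructed from the output shown below:

gtcars |>
  create_descriptive_stats(
    variables = c("mpg_c", "hp"),
    statistics = c("n", "mean", "sd"),
    by = "drivetrain",
    title = "Performance by Drivetrain Type"
  )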
| Performance by Drivetrain Type | |||
| Variable | N | Mean | SD |
|---|---|---|---|
| rwd | |||
| mpg_c | 34 | 15.15 | 2.79 |
| hp | 34 | 515.50 | 146.30 |
| awd | |||
| mpg_c | 12 | 15.83 | 4.93 |
| hp | 13 | 513.54 | 126.79 |
The grouped version organizes results by drivetrain type, making comparisons across vehicle configurations straightforward. Someone reading this table could immediately see how city fuel economy and horsepower differ between rear-wheel-drive and all-wheel-drive vehicles.
14.2 Providing wrapper functions to modify the table outputs
Not every extension needs to create tables from scratch. Sometimes the greater need is to modify existing tables in consistent ways. A wrapper function takes a gt table as input, applies transformations, and returns the modified table. This approach works well when you want to enforce organizational styling, add standard elements like logos or disclaimers, or provide convenient shortcuts for common formatting patterns.
14.2.1 A theming function for organizational branding
Organizations often have visual identity guidelines specifying colors, fonts, and other design elements. Creating a theming function ensures that all tables produced across the organization share a consistent appearance:
apply_corporate_theme <- function(
  gt_tbl,
  primary_color = "#1E3A5F",
  accent_color = "#E85D04",
  header_font = "Georgia",
  body_font = "Arial"
) {
  gt_tbl |>
    tab_options(
      # Header styling
      heading.background.color = primary_color,
      heading.title.font.size = px(18),
      heading.subtitle.font.size = px(14),
      # Column labels
      column_labels.background.color = primary_color,
      column_labels.font.weight = "bold",
      # Table body
      table.font.size = px(13),
      # Row striping
      row.striping.background_color = "#F5F5F5",
      row.striping.include_stub = TRUE,
      row.striping.include_table_body = TRUE,
      # Borders
      table_body.hlines.color = "#E0E0E0",
      table_body.vlines.color = "transparent",
      # Footer
      footnotes.font.size = px(11),
      source_notes.font.size = px(11)
    ) |>
    tab_style(
      style = cell_text(
        color = "white",
        font = header_font
      ),
      locations = list(
        cells_title(),
        cells_column_labels()
      )
    ) |>
    tab_style(
      style = cell_text(font = body_font),
      locations = cells_body()
    ) |>
    tab_style(
      style = cell_borders(
        sides = "bottom",
        color = accent_color,
        weight = px(3)
      ),
      locations = cells_column_labels()
    )
}
This theme function transforms any gt table to match corporate standards. The deep blue primary color establishes professionalism, the orange accent provides visual interest, and the specified fonts ensure consistency. Let's apply it to a simple table:
gtcars |>
  select(mfr, model, year, hp, mpg_c) |>
  slice_head(n = 8) |>
  gt() |>
  tab_header(
    title = "Vehicle Performance Summary",
    subtitle = "Selected models from our database"
  ) |>
  fmt_integer(columns = c(year, hp)) |>
  fmt_number(columns = mpg_c, decimals = 1) |>
  cols_label(
    mfr = "Manufacturer",
    model = "Model",
    year = "Year",
    hp = "Horsepower",
    mpg_c = "City MPG"
  ) |>
  apply_corporate_theme()
| Vehicle Performance Summary | ||||
| Selected models from our database | ||||
| Manufacturer | Model | Year | Horsepower | City MPG |
|---|---|---|---|---|
| Ford | GT | 2,017 | 647 | 11.0 |
| Ferrari | 458 Speciale | 2,015 | 597 | 13.0 |
| Ferrari | 458 Spider | 2,015 | 562 | 13.0 |
| Ferrari | 458 Italia | 2,014 | 562 | 13.0 |
| Ferrari | 488 GTB | 2,016 | 661 | 15.0 |
| Ferrari | California | 2,015 | 553 | 16.0 |
| Ferrari | GTC4Lusso | 2,017 | 680 | 12.0 |
| Ferrari | FF | 2,015 | 652 | 11.0 |
Any table passed through apply_corporate_theme() acquires the organizational look. The function demonstrates how wrapper functions can encapsulate substantial complexity while providing a simple interface. Users need not understand the dozens of tab_options() parameters; they simply apply the theme.
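And because the colors and fonts are parameters rather than hard-coded values, one-off deviations remain easy. For example, a variant with a subdued accent (the hex value here is purely illustrative):

gtcars |>
  select(mfr, model, hp) |>
  slice_head(n = 4) |>
  gt() |>
  apply_corporate_theme(accent_color = "#6C757D")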
14.2.2 Adding standard elements
Some contexts require standard elements on all tables: disclaimers, data sources, or organizational logos. A wrapper function can add these consistently:
add_report_footer <- function(
  gt_tbl,
  data_source = NULL,
  disclaimer = NULL,
  include_date = TRUE
) {
  # Add data source if provided
  if (!is.null(data_source)) {
    gt_tbl <-
      gt_tbl |>
      tab_source_note(
        source_note = paste("Data Source:", data_source)
      )
  }
  # Add disclaimer if provided
  if (!is.null(disclaimer)) {
    gt_tbl <-
      gt_tbl |>
      tab_source_note(
        source_note = md(paste0("*", disclaimer, "*"))
      )
  }
  # Add generation date
  if (include_date) {
    gt_tbl <-
      gt_tbl |>
      tab_source_note(
        source_note = paste("Report generated:", format(Sys.Date(), "%B %d, %Y"))
      )
  }
  return(gt_tbl)
}
This function adds a customizable footer section to any table. Used consistently, it ensures that all tables in a report carry appropriate attribution and disclaimers:
towny |>
  select(name, land_area_km2, population_2021, density_2021) |>
  slice_max(population_2021, n = 5) |>
  gt() |>
  tab_header(title = "Ontario's Largest Municipalities") |>
  fmt_integer(columns = c(population_2021, density_2021)) |>
  fmt_number(columns = land_area_km2, decimals = 1) |>
  cols_label(
    name = "Municipality",
    land_area_km2 = "Area (km²)",
    population_2021 = "Population",
    density_2021 = "Density"
  ) |>
  add_report_footer(
    data_source = "Statistics Canada, 2021 Census",
    disclaimer = "Figures subject to revision"
  )
| Ontario's Largest Municipalities | |||
| Municipality | Area (km²) | Population | Density |
|---|---|---|---|
| Toronto | 631.1 | 2,794,356 | 4,428 |
| Ottawa | 2,788.2 | 1,017,449 | 365 |
| Mississauga | 292.7 | 717,961 | 2,453 |
| Brampton | 265.9 | 656,480 | 2,469 |
| Hamilton | 1,118.3 | 569,353 | 509 |
| Data Source: Statistics Canada, 2021 Census | |||
| Figures subject to revision | |||
| Report generated: January 05, 2026 | |||
14.2.3 A significance highlighting function
In statistical reporting, highlighting significant results is common practice. Rather than manually applying conditional formatting each time, a wrapper function can standardize this process:
highlight_significant <- function(
  gt_tbl,
  columns,
  threshold = 0.05,
  highlight_color = "#E8F5E9",
  bold = TRUE
) {
  # Resolve the column name once so it can be reused in the row conditions
  col_name <- rlang::as_name(rlang::ensym(columns))
  # Apply background color to significant cells
  gt_tbl <-
    gt_tbl |>
    tab_style(
      style = cell_fill(color = highlight_color),
      locations = cells_body(
        columns = all_of(col_name),
        rows = .data[[col_name]] < threshold
      )
    )
  # Optionally bold the significant values
  if (bold) {
    gt_tbl <-
      gt_tbl |>
      tab_style(
        style = cell_text(weight = "bold"),
        locations = cells_body(
          columns = all_of(col_name),
          rows = .data[[col_name]] < threshold
        )
      )
  }
  return(gt_tbl)
}
This function takes a p-value column and highlights cells below the significance threshold. The visual emphasis draws attention to statistically significant findings without requiring readers to scan through numbers.
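A hypothetical use, with made-up regression output (the terms, estimates, and p-values are illustrative only):

model_results <-
  dplyr::tibble(
    term = c("(Intercept)", "hp", "wt", "am"),
    estimate = c(34.96, -0.018, -2.87, 1.55),
    p_value = c(0.0001, 0.281, 0.004, 0.312)
  )
model_results |>
  gt() |>
  fmt_number(columns = c(estimate, p_value), decimals = 3) |>
  highlight_significant(columns = p_value)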
14.2.4 Building flexible style appliers
Sometimes you want to provide several pre-built styles that users can select. A style applier function with multiple options gives users flexibility while maintaining consistency:
apply_table_style <- function(
  gt_tbl,
  style = c("minimal", "striped", "bordered", "scientific")
) {
  style <- match.arg(style)
  if (style == "minimal") {
    gt_tbl <-
      gt_tbl |>
      tab_options(
        table_body.hlines.color = "transparent",
        table_body.vlines.color = "transparent",
        column_labels.border.bottom.color = "black",
        column_labels.border.bottom.width = px(2),
        table_body.border.bottom.color = "black",
        table_body.border.bottom.width = px(2)
      )
  } else if (style == "striped") {
    gt_tbl <-
      gt_tbl |>
      opt_row_striping() |>
      tab_options(
        row.striping.background_color = "#F8F9FA",
        table_body.hlines.color = "transparent"
      )
  } else if (style == "bordered") {
    gt_tbl <-
      gt_tbl |>
      tab_options(
        table_body.hlines.color = "#DEE2E6",
        table_body.vlines.color = "#DEE2E6",
        column_labels.border.bottom.color = "#343A40",
        column_labels.border.bottom.width = px(2)
      ) |>
      tab_style(
        style = cell_borders(
          sides = c("left", "right"),
          color = "#DEE2E6"
        ),
        locations = cells_body()
      )
  } else if (style == "scientific") {
    gt_tbl <-
      gt_tbl |>
      tab_options(
        table.font.size = px(11),
        heading.title.font.size = px(13),
        heading.subtitle.font.size = px(11),
        table_body.hlines.color = "transparent",
        column_labels.border.bottom.color = "black",
        column_labels.border.top.color = "black",
        table_body.border.bottom.color = "black"
      ) |>
      tab_style(
        style = cell_text(size = px(10)),
        locations = cells_source_notes()
      )
  }
  return(gt_tbl)
}
Users can select from predefined styles while the function handles all the underlying options:
base_table <-
  exibble |>
  select(char, num, currency) |>
  slice(1:5) |>
  gt() |>
  tab_header(title = "Style Comparison", subtitle = "Scientific style") |>
  fmt_number(columns = num, decimals = 2) |>
  fmt_currency(columns = currency)
base_table |>
  apply_table_style(style = "scientific")
| Style Comparison | ||
| Scientific style | ||
| char | num | currency |
|---|---|---|
| apricot | 0.11 | $49.95 |
| banana | 2.22 | $17.95 |
| coconut | 33.33 | $1.39 |
| durian | 444.40 | $65,100.00 |
| NA | 5,550.00 | $1,325.81 |
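The same base table can be re-rendered under any of the other styles; only the style argument changes:

base_table |>
  apply_table_style(style = "striped")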
14.3 Implementation ideas
The examples thus far demonstrate foundational patterns. This section explores more ambitious possibilities: extensions that push into specialized domains or provide capabilities not easily achieved with basic gt usage.
14.3.1 Comparison table generator
Many reports require side-by-side comparisons with calculated differences. A specialized function can automate this pattern:
create_comparison_table <- function(
  data,
  group_col,
  value_cols,
  group_labels = NULL,
  show_difference = TRUE,
  show_pct_change = TRUE,
  decimals = 1
) {
  groups <- unique(data[[group_col]])
  if (length(groups) != 2) {
    stop("Comparison requires exactly two groups")
  }
  # Split data by group
  group1_data <- data[data[[group_col]] == groups[1], ]
  group2_data <- data[data[[group_col]] == groups[2], ]
  # Create comparison data frame
  comparison_df <- dplyr::tibble(metric = value_cols)
  # Get values for each group (assuming single row per group or aggregating)
  comparison_df[[as.character(groups[1])]] <- sapply(value_cols, function(v) {
    mean(group1_data[[v]], na.rm = TRUE)
  })
  comparison_df[[as.character(groups[2])]] <- sapply(value_cols, function(v) {
    mean(group2_data[[v]], na.rm = TRUE)
  })
  # Calculate differences
  if (show_difference) {
    comparison_df$difference <-
      comparison_df[[as.character(groups[2])]] -
      comparison_df[[as.character(groups[1])]]
  }
  if (show_pct_change) {
    comparison_df$pct_change <-
      (comparison_df[[as.character(groups[2])]] -
        comparison_df[[as.character(groups[1])]]) /
      comparison_df[[as.character(groups[1])]] * 100
  }
  # Build table
  tbl <-
    comparison_df |>
    gt(rowname_col = "metric") |>
    fmt_number(
      columns = c(as.character(groups[1]), as.character(groups[2])),
      decimals = decimals
    )
  if (show_difference) {
    tbl <-
      tbl |>
      fmt_number(
        columns = difference,
        decimals = decimals,
        force_sign = TRUE
      ) |>
      tab_style(
        style = cell_text(color = "green"),
        locations = cells_body(columns = difference, rows = difference > 0)
      ) |>
      tab_style(
        style = cell_text(color = "red"),
        locations = cells_body(columns = difference, rows = difference < 0)
      )
  }
  if (show_pct_change) {
    tbl <-
      tbl |>
      fmt_number(
        columns = pct_change,
        decimals = 1,
        force_sign = TRUE,
        pattern = "{x}%"
      ) |>
      tab_style(
        style = cell_text(color = "green"),
        locations = cells_body(columns = pct_change, rows = pct_change > 0)
      ) |>
      tab_style(
        style = cell_text(color = "red"),
        locations = cells_body(columns = pct_change, rows = pct_change < 0)
      )
  }
  # Apply labels if provided
  if (!is.null(group_labels) && length(group_labels) == 2) {
    tbl <-
      tbl |>
      cols_label(
        !!as.character(groups[1]) := group_labels[1],
        !!as.character(groups[2]) := group_labels[2]
      )
  }
  if (show_difference) {
    tbl <- tbl |> cols_label(difference = "Diff")
  }
  if (show_pct_change) {
    tbl <- tbl |> cols_label(pct_change = "% Change")
  }
  tbl <-
    tbl |>
    tab_header(title = "Comparison Analysis") |>
    tab_stubhead(label = "Metric") |>
    opt_stylize(style = 1)
  return(tbl)
}
The function handles the tedious work of pivoting data, calculating differences, and applying conditional formatting. The color coding for positive and negative changes provides immediate visual feedback:
# Create pre-aggregated quarterly data
quarterly_metrics <-
  dplyr::tibble(
    quarter = c("Q1 2024", "Q2 2024"),
    revenue = c(125000, 142000),
    expenses = c(98000, 105000),
    customers = c(1250, 1420)
  )
quarterly_metrics |>
  create_comparison_table(
    group_col = "quarter",
    value_cols = c("revenue", "expenses", "customers"),
    group_labels = c("Q1 2024", "Q2 2024")
  )
| Comparison Analysis | ||||
| Metric | Q1 2024 | Q2 2024 | Diff | % Change |
|---|---|---|---|---|
| revenue | 125,000.0 | 142,000.0 | +17,000.0 | +13.6% |
| expenses | 98,000.0 | 105,000.0 | +7,000.0 | +7.1% |
| customers | 1,250.0 | 1,420.0 | +170.0 | +13.6% |
14.3.2 A grading or scoring table function
Educational and assessment contexts often require tables that map numeric scores to letter grades or performance categories. A specialized function can standardize this presentation:
create_grade_table <- function(
  data,
  name_col,
  score_col,
  max_score = 100,
  grade_breaks = c(90, 80, 70, 60),
  grade_labels = c("A", "B", "C", "D", "F"),
  show_percentage = TRUE,
  title = "Grade Report"
) {
  # Calculate percentages and grades; cut() requires increasing breaks,
  # so reverse the user-facing (descending) breaks and their labels
  result_df <-
    data |>
    mutate(
      percentage = .data[[score_col]] / max_score * 100,
      grade = cut(
        percentage,
        breaks = c(-Inf, rev(grade_breaks), Inf),
        labels = rev(grade_labels),
        right = FALSE
      )
    ) |>
    select(all_of(c(name_col, score_col)), percentage, grade) |>
    arrange(desc(percentage))
  # Grade colors
  grade_colors <- c(
    "A" = "#4CAF50",
    "B" = "#8BC34A",
    "C" = "#FFC107",
    "D" = "#FF9800",
    "F" = "#F44336"
  )
  # Build table
  tbl <-
    result_df |>
    gt(rowname_col = name_col) |>
    tab_header(
      title = title,
      subtitle = paste("Maximum possible score:", max_score)
    ) |>
    fmt_integer(columns = all_of(score_col)) |>
    fmt_number(columns = percentage, decimals = 1, pattern = "{x}%") |>
    tab_stubhead(label = "Student") |>
    cols_label(
      !!score_col := "Score",
      percentage = "Percentage",
      grade = "Grade"
    )
  # Apply grade colors
  for (g in names(grade_colors)) {
    tbl <-
      tbl |>
      tab_style(
        style = list(
          cell_fill(color = grade_colors[[g]]),
          cell_text(weight = "bold", color = "white")
        ),
        locations = cells_body(columns = grade, rows = grade == g)
      )
  }
  # Add summary
  avg_score <- mean(result_df[[score_col]], na.rm = TRUE)
  avg_pct <- mean(result_df$percentage, na.rm = TRUE)
  tbl <-
    tbl |>
    tab_source_note(
      source_note = paste0(
        "Class Average: ", round(avg_score, 1),
        " (", round(avg_pct, 1), "%)"
      )
    ) |>
    opt_stylize(style = 1)
  if (!show_percentage) {
    tbl <- tbl |> cols_hide(columns = percentage)
  }
  return(tbl)
}
The function calculates grades based on customizable breakpoints, applies color coding to make grade levels immediately visible, and provides class summary statistics:
# Sample student scores
student_scores <-
  dplyr::tibble(
    student = c(
      "Alice", "Billy", "Courtney", "Dirk",
      "Eva", "Frank", "Grace", "Henry"
    ),
    exam_score = c(95, 87, 78, 92, 65, 73, 88, 56)
  )
student_scores |>
  create_grade_table(
    name_col = "student",
    score_col = "exam_score",
    title = "Final Examination Results"
  )
| Final Examination Results | |||
| Maximum possible score: 100 | |||
| Student | Score | Percentage | Grade |
|---|---|---|---|
| Alice | 95 | 95.0% | A |
| Dirk | 92 | 92.0% | A |
| Grace | 88 | 88.0% | B |
| Billy | 87 | 87.0% | B |
| Courtney | 78 | 78.0% | C |
| Frank | 73 | 73.0% | C |
| Eva | 65 | 65.0% | D |
| Henry | 56 | 56.0% | F |
| Class Average: 79.2 (79.2%) | |||
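Because the breakpoints and labels are parameters, alternative schemes drop in without code changes. A pass/fail variant might look like this (note that the built-in color map only covers A–F, so these categories render uncolored):

student_scores |>
  create_grade_table(
    name_col = "student",
    score_col = "exam_score",
    grade_breaks = c(60),
    grade_labels = c("Pass", "Fail"),
    title = "Pass/Fail Summary"
  )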
14.3.3 Data quality report function
Data quality assessment is crucial before any analysis. A dedicated function can automate the production of quality reports:
create_data_quality_report <- function(data, title = "Data Quality Report") {
  # Calculate quality metrics for each column
  quality_df <-
    dplyr::tibble(
      column = names(data),
      data_type = sapply(data, function(x) class(x)[1]),
      total_rows = nrow(data),
      non_missing = sapply(data, function(x) sum(!is.na(x))),
      missing = sapply(data, function(x) sum(is.na(x))),
      missing_pct = sapply(data, function(x) mean(is.na(x)) * 100),
      unique_values = sapply(data, function(x) length(unique(x[!is.na(x)]))),
      completeness = sapply(data, function(x) (1 - mean(is.na(x))) * 100)
    )
  # Classify completeness into quality bands
  quality_df <-
    quality_df |>
    dplyr::mutate(
      quality_score = dplyr::case_when(
        completeness >= 99 ~ "Excellent",
        completeness >= 95 ~ "Good",
        completeness >= 90 ~ "Fair",
        completeness >= 80 ~ "Poor",
        TRUE ~ "Critical"
      )
    )
  # Score colors
  score_colors <- c(
    "Excellent" = "#4CAF50",
    "Good" = "#8BC34A",
    "Fair" = "#FFC107",
    "Poor" = "#FF9800",
    "Critical" = "#F44336"
  )
  # Build table
  tbl <-
    quality_df |>
    gt(rowname_col = "column") |>
    tab_header(
      title = title,
      subtitle = paste(nrow(data), "rows analyzed")
    ) |>
    tab_stubhead(label = "Column") |>
    cols_hide(columns = c(total_rows, non_missing)) |>
    fmt_integer(columns = c(missing, unique_values)) |>
    fmt_number(columns = c(missing_pct, completeness), decimals = 1) |>
    cols_label(
      data_type = "Type",
      missing = "Missing",
      missing_pct = "Missing %",
      unique_values = "Unique",
      completeness = "Complete %",
      quality_score = "Quality"
    ) |>
    data_color(
      columns = completeness,
      palette = c("#F44336", "#FF9800", "#FFC107", "#8BC34A", "#4CAF50"),
      domain = c(0, 100)
    )
  # Apply quality score colors
  for (score in names(score_colors)) {
    tbl <-
      tbl |>
      tab_style(
        style = list(
          cell_fill(color = score_colors[[score]]),
          cell_text(weight = "bold")
        ),
        locations = cells_body(
          columns = quality_score,
          rows = quality_score == score
        )
      )
  }
  # Overall summary
  overall_completeness <- mean(quality_df$completeness)
  tbl <-
    tbl |>
    tab_source_note(
      source_note = paste0(
        "Overall Data Completeness: ", round(overall_completeness, 1), "%"
      )
    ) |>
    opt_stylize(style = 1) |>
    opt_horizontal_padding(scale = 2)
  return(tbl)
}
The report provides a comprehensive view of data quality, with visual indicators making problematic columns immediately apparent:
# Create sample data with varying quality
quality_test <-
  dplyr::tibble(
    id = 1:100,
    name = sample(
      c("Alice", "Bob", "Carol", NA), 100,
      replace = TRUE,
      prob = c(0.3, 0.3, 0.3, 0.1)
    ),
    age = sample(c(25:65, NA), 100, replace = TRUE),
    salary = c(rep(NA, 25), sample(50000:150000, 75, replace = TRUE)),
    department = sample(c("Sales", "Engineering", "Marketing"), 100, replace = TRUE)
  )
quality_test |>
  create_data_quality_report(title = "Employee Data Quality Assessment")
| Employee Data Quality Assessment | ||||||
| 100 rows analyzed | ||||||
| Column | Type | Missing | Missing % | Unique | Complete % | Quality |
|---|---|---|---|---|---|---|
| id | integer | 0 | 0.0 | 100 | 100.0 | Excellent |
| name | character | 19 | 19.0 | 3 | 81.0 | Poor |
| age | integer | 2 | 2.0 | 33 | 98.0 | Good |
| salary | integer | 25 | 25.0 | 75 | 75.0 | Critical |
| department | character | 0 | 0.0 | 3 | 100.0 | Excellent |
| Overall Data Completeness: 90.8% | ||||||
14.4 Packaging your extensions
The examples in this chapter are presented as standalone functions, but their true power emerges when packaged for reuse. An R package provides structure for documentation, testing, and distribution. It ensures that your extensions are available wherever you work and can be shared with colleagues.
Creating a package from extension functions follows standard R package development practices. The mechanics of package creation (directory structure, documentation with roxygen2, testing, dependency management) are thoroughly covered in R Packages by Hadley Wickham and Jennifer Bryan. That book provides comprehensive guidance on everything from initial setup to publication on CRAN.
For gt extensions specifically, a few considerations warrant attention. Document your functions with clear parameter descriptions and meaningful examples. Since your functions produce visual outputs, consider including example tables in vignettes where readers can see exactly what the functions produce. Think about scope: a package focused on a specific domain (financial reporting, academic publishing, healthcare analytics) will be more useful than a grab bag of unrelated utilities. Even a small collection of utilities that serve your own needs justifies the packaging effort; the discipline of creating a package improves the code itself.
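As a sketch of what such documentation looks like in practice, here is the overview function from earlier with standard roxygen2 tags (the descriptive text is illustrative):

#' Create a dataset overview table
#'
#' Summarizes each column of a data frame (type, missingness, unique values)
#' and returns a styled gt table.
#'
#' @param data A data frame or tibble to summarize.
#' @param title Optional table title; defaults to "Dataset Overview".
#'
#' @return A `gt_tbl` object that callers can modify further.
#'
#' @examples
#' create_data_overview(mtcars)
#'
#' @export
create_data_overview <- function(data, title = NULL) {
  # function body as defined earlier in this chapter
}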
14.5 Beyond basic extensions
The patterns explored in this chapter represent starting points rather than boundaries. More sophisticated extensions might integrate with external data sources, pulling data from databases or APIs and presenting it in formatted tables. They might generate multiple related tables from a single function call, producing suites of outputs for comprehensive reporting. They might provide interactive features for HTML outputs, adding user controls for filtering or sorting.
Some extensions might focus on specific output formats. A package designed for PDF reports might include functions optimized for print layout. A package for dashboards might emphasize compact, information-dense designs. A package for presentations might provide larger text sizes and simplified structures appropriate for projection.
Other extensions might introduce entirely new table components. While gt provides a rich vocabulary of table elements, specific domains may have conventions not directly supported. A package could define new structural elements and rendering logic for those conventions.
The key is identifying needs that arise repeatedly in your work and addressing them systematically. Each time you find yourself copying and modifying code from a previous project, that’s a signal that a function might be warranted. Each time you explain to a colleague how to format a certain type of table, that’s a signal that your explanation could be encoded in software.
Creating extensions for gt is an exercise in understanding both the package and your own requirements. It demands clarity about what you want to achieve and how gt can help achieve it. The resulting functions, when well designed, become multipliers of your effectiveness. They transform tasks that once required careful attention into operations that happen correctly by default.
Many successful gt extension packages began exactly this way. Someone recognized a gap, built functions to address it, and shared the result. Your extensions might start as personal utilities and grow into resources that benefit a wider community. Even if they remain private to your organization, they contribute to better, more consistent, more maintainable reporting. That contribution is the purpose of extending gt: not extension for its own sake but extension in service of clearer communication through better tables.
14.6 Summary
This chapter has explored how to extend gt by creating your own functions and packages. Whether addressing domain-specific needs or encoding organizational standards, extensions multiply the value of your table-making expertise.
The key approaches we’ve covered:
- Display table functions take data and produce complete, formatted tables. They encapsulate multi-step workflows (data transformation, table creation, formatting, styling) into single function calls with sensible defaults and useful customization options.
- Modifier functions accept existing gt tables and enhance them. They add consistent styling, domain-specific elements, or standard configurations without requiring users to remember every option.
- Data quality and diagnostic tools use gt's presentation capabilities to communicate about data itself: completeness reports, validation summaries, and structural overviews.
- Packaging considerations include clear documentation, meaningful examples, focused scope, and the discipline that formal packaging brings to code quality.
The patterns demonstrated here (dataset overviews, correlation matrices, data quality reports, modifier functions) represent starting points. Your extensions might address entirely different needs: financial reporting conventions, scientific publication requirements, corporate branding standards, or analytical workflows unique to your domain.
The underlying principle remains constant: identify repetitive table-making tasks, understand what varies and what stays constant, then encode the constants in functions while exposing the variations as parameters. Each extension you create reduces future effort while improving consistency.
This book has journeyed from gt's foundational concepts through formatting, styling, and advanced features to the creation of extensions. The destination isn't mastery of a package but rather the ability to communicate data effectively through well-crafted tables. gt provides the tools, while your understanding of your data and your audience provides the direction. Together, they enable tables that inform, clarify, and persuade: tables worthy of the information they present.