7  Summary rows and columns

Tables often need more than just raw data. They frequently require summary statistics that help readers understand the data at a glance: totals, averages, minimums, maximums, and other aggregations. In gt, we have powerful tools for adding these summaries both vertically (as summary rows) and horizontally (as summary columns).

The summary functions in gt address a common frustration in table-making: the disconnect between calculating summary statistics and presenting them elegantly. In many workflows, you might calculate means and totals in your data manipulation code, then manually position them in your table, style them differently from the data rows, and hope nothing breaks when the data changes. The gt approach is different. Summary rows and columns are computed from the table data itself, positioned automatically, and can be styled and formatted independently of the body data. When your underlying data changes, the summaries update automatically.

This chapter covers three primary functions: summary_rows() for adding summaries within row groups, grand_summary_rows() for adding summaries across the entire table, and summary_columns() for adding computed columns that aggregate across rows. We’ll also cover extract_summary(), which lets you pull out summary data for use elsewhere. Together, these functions provide a complete toolkit for building tables that don’t just present data but help readers understand it.

7.1 The role of summary rows in tables

Before diving into the mechanics of creating summary rows, it’s worth considering when and why you might use them. Summary rows serve several distinct purposes.

First, they provide context. A column of sales figures means more when you can see the total at the bottom. A list of response times becomes interpretable when you know the average. Summary rows transform raw data into information by providing reference points.

Second, they enable comparison. When data is organized into groups, summary rows for each group allow readers to compare not just individual values but aggregate performance. How does this quarter compare to last quarter? How does the East region compare to the West? Summary rows answer these questions at a glance.

Third, they support verification. In financial and accounting contexts, totals serve as check figures. Readers (and auditors) expect to see them, and their presence signals that the table has been carefully prepared.

Finally, they reduce cognitive load. Rather than requiring readers to mentally calculate averages or identify maximums, summary rows do the work for them. This is especially valuable when tables will be scanned quickly rather than studied carefully.

The gt package supports two types of summary rows: group summaries (created with summary_rows()) that appear within or alongside each row group, and grand summaries (created with grand_summary_rows()) that summarize the entire table. Both types can include multiple summary statistics, can be positioned at the top or bottom of their respective sections, and can be formatted independently of the body data.

7.2 Adding group-level summaries with summary_rows()

The summary_rows() function adds summary rows to row groups in your table. This requires that your table actually has row groups, which you typically create by specifying a groupname_col in gt() or by using tab_row_group(). The summary rows appear within each group, providing statistics calculated from just that group’s data.

Let’s start with a practical example. We’ll use the sp500 dataset, which contains daily stock market data, and create a table with weekly row groups:

sp500 |>
  filter(date >= "2015-01-05" & date <= "2015-01-16") |>
  arrange(date) |>
  mutate(week = paste0("W", strftime(date, format = "%V"))) |>
  select(-adj_close, -volume) |>
  gt(
    rowname_col = "date",
    groupname_col = "week"
  ) |>
  summary_rows(
    fns = list("min", "max", "mean"),
    fmt = ~ fmt_number(., decimals = 2, use_seps = FALSE)
  )
open high low close
W02
2015-01-05 2054.44 2054.44 2017.34 2020.58
2015-01-06 2022.15 2030.25 1992.44 2002.61
2015-01-07 2005.55 2029.61 2005.55 2025.90
2015-01-08 2030.61 2064.08 2030.61 2062.14
2015-01-09 2063.45 2064.43 2038.33 2044.81
min 2005.55 2029.61 1992.44 2002.61
max 2063.45 2064.43 2038.33 2062.14
mean 2035.24 2048.56 2016.85 2031.21
W03
2015-01-12 2046.13 2049.30 2022.58 2028.26
2015-01-13 2031.58 2056.93 2008.25 2023.03
2015-01-14 2018.40 2018.40 1988.44 2011.27
2015-01-15 2013.75 2021.35 1991.47 1992.67
2015-01-16 1992.25 2020.46 1988.12 2019.42
min 1992.25 2018.40 1988.12 1992.67
max 2046.13 2056.93 2022.58 2028.26
mean 2020.42 2033.29 1999.77 2014.93

This table shows daily stock prices grouped by week, with minimum, maximum, and mean values calculated for each week. The summary rows appear at the bottom of each group by default, clearly labeled with the function names.

7.2.1 Specifying aggregation functions

The fns argument is where you tell summary_rows() what statistics to calculate. There are several ways to specify aggregation functions, ranging from simple to highly customized.

The simplest approach is to provide function names as quoted strings:

sp500 |>
  filter(date >= "2015-01-05" & date <= "2015-01-09") |>
  select(date, open, high, low, close) |>
  gt(rowname_col = "date") |>
  tab_row_group(label = "First Week", rows = everything()) |>
  summary_rows(
    fns = list("min", "max", "sum"),
    fmt = ~ fmt_number(., decimals = 2)
  )
open high low close
First Week
2015-01-09 2063.45 2064.43 2038.33 2044.81
2015-01-08 2030.61 2064.08 2030.61 2062.14
2015-01-07 2005.55 2029.61 2005.55 2025.90
2015-01-06 2022.15 2030.25 1992.44 2002.61
2015-01-05 2054.44 2054.44 2017.34 2020.58
min 2,005.55 2,029.61 1,992.44 2,002.61
max 2,063.45 2,064.43 2,038.33 2,062.14
sum 10,176.20 10,242.81 10,084.27 10,156.04

For convenience, gt automatically adds na.rm = TRUE to common aggregation functions when you specify them as strings. This applies to "min", "max", "mean", "median", "sd", and "sum". This saves you from writing out the full expression every time.

When you need more control, you can use formula syntax. The right-hand side (RHS) of the formula contains the aggregation expression, where . represents the column data:

sp500 |>
  filter(date >= "2015-01-05" & date <= "2015-01-09") |>
  select(date, open, close) |>
  gt(rowname_col = "date") |>
  tab_row_group(label = "First Week", rows = everything()) |>
  summary_rows(
    fns = list(
      range = ~ max(.) - min(.),
      avg = ~ mean(., na.rm = TRUE)
    ),
    fmt = ~ fmt_number(., decimals = 2)
  )
open close
First Week
2015-01-09 2063.45 2044.81
2015-01-08 2030.61 2062.14
2015-01-07 2005.55 2025.90
2015-01-06 2022.15 2002.61
2015-01-05 2054.44 2020.58
range 57.90 59.53
avg 2,035.24 2,031.21

The named list elements become both the ID and the label for each summary row. In this example, we calculate a custom range statistic (the difference between maximum and minimum) alongside the average.

For maximum flexibility, you can provide a double-sided formula where the left-hand side (LHS) specifies the ID and label:

sp500 |>
  filter(date >= "2015-01-05" & date <= "2015-01-09") |>
  select(date, open, close) |>
  gt(rowname_col = "date") |>
  tab_row_group(label = "First Week", rows = everything()) |>
  summary_rows(
    fns = list(
      list(id = "daily_range", label = md("**Daily Range**")) ~ max(.) - min(.),
      list(id = "average", label = md("*Average*")) ~ mean(., na.rm = TRUE)
    ),
    fmt = ~ fmt_number(., decimals = 2)
  )
open close
First Week
2015-01-09 2063.45 2044.81
2015-01-08 2030.61 2062.14
2015-01-07 2005.55 2025.90
2015-01-06 2022.15 2002.61
2015-01-05 2054.44 2020.58
Daily Range 57.90 59.53
Average 2,035.24 2,031.21

This approach lets you use different values for the row’s ID (used for targeting in other functions like tab_style()) and its label (displayed in the table). The label can include Markdown formatting via md() or HTML via html().

7.2.2 Targeting specific columns

By default, summary_rows() calculates aggregations for all columns. This is rarely what you want, especially when your table includes non-numeric columns or columns where aggregation doesn’t make sense.

The columns argument lets you specify which columns should receive summary calculations:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum", "mean"),
    fmt = ~ fmt_number(., decimals = 1)
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum 480.1 65,169.3
mean 120.0 16,292.3
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
sum 9,662,550.0 1,339.5
mean 3,220,850.0 446.5

You can use any tidyselect expression in columns:

exibble |>
  select(group, num, currency, char) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = where(is.numeric),
    fns = list("sum"),
    fmt = ~ fmt_number(., decimals = 1)
  )
num currency char
grp_a
1.111e-01 49.950 apricot
2.222e+00 17.950 banana
3.333e+01 1.390 coconut
4.444e+02 65100.000 durian
sum 480.1 65,169.3
grp_b
5.550e+03 1325.810 NA
NA 13.255 fig
7.770e+05 NA grapefruit
8.880e+06 0.440 honeydew
sum 9,662,550.0 1,339.5

Using where(is.numeric) ensures that only numeric columns receive summary calculations, even if the table structure changes later.

7.2.3 Targeting specific groups

Sometimes you only want summary rows for certain groups, not all of them. The groups argument controls which row groups receive summaries:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    groups = "grp_a",
    columns = c(num, currency),
    fns = list("sum"),
    fmt = ~ fmt_number(., decimals = 1)
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum 480.1 65,169.3
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440

This adds a sum row only to grp_a, leaving grp_b without summaries. The groups argument accepts group IDs as strings, vectors of strings, or tidyselect helpers like starts_with() or matches().

7.2.4 Positioning summary rows

By default, summary rows appear at the bottom of each group. The side argument lets you place them at the top instead:

countrypops |>
  filter(country_code_2 %in% c("BR", "RU", "IN", "CN")) |>
  filter(year %in% c(2000, 2010, 2020)) |>
  select(country_name, year, population) |>
  tidyr::pivot_wider(names_from = year, values_from = population) |>
  gt(rowname_col = "country_name") |>
  tab_row_group(label = "BRIC Nations", rows = everything()) |>
  summary_rows(
    fns = list(label = md("**TOTAL**"), id = "total", fn = "sum"),
    fmt = ~ fmt_integer(.),
    side = "top"
  )
2000 2010 2020
BRIC Nations
TOTAL 2,641,182,884 2,917,737,961 3,167,623,685
Brazil 174018282 193701929 208660842
China 1262645000 1337705000 1411100000
India 1057922733 1243481564 1402617695
Russia 146596869 142849468 145245148

Placing totals at the top is common in financial reporting, where readers want to see the bottom line first. The summary row appears before the detail rows, inviting readers to drill down into the components.

7.2.5 Formatting summary values

The fmt argument accepts formatting expressions that control how summary values appear. You can use any gt formatting function:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum", "mean"),
    fmt = list(
      ~ fmt_number(., decimals = 0, use_seps = TRUE),
      ~ fmt_currency(., currency = "USD")
    )
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum $480.06 $65,169.29
mean $120.02 $16,292.32
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
sum $9,662,550.00 $1,339.50
mean $3,220,850.00 $446.50

When you provide a list of formatting expressions, they’re applied in order to the summary rows. In this example, the sum row uses fmt_number() and the mean row uses fmt_currency().

You can also target specific columns within the formatting expression:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum"),
    fmt = ~ fmt_number(., columns = num, decimals = 1) |>
           fmt_currency(columns = currency, currency = "EUR")
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum 480.1 €65,169.29
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
sum 9,662,550.0 €1,339.50

For group-specific formatting, use a two-sided formula where the LHS identifies the group:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum"),
    fmt = list(
      "grp_a" ~ fmt_number(., decimals = 0),
      "grp_b" ~ fmt_number(., decimals = 2)
    )
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum 480 65,169
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
sum 9,662,550.00 1,339.50

This technique is useful when groups have different precision requirements. Perhaps one group contains whole-number counts while another contains measurements that warrant decimal places. The two-sided formula syntax keeps these formatting rules close to the summary definition, making the code easier to maintain.

7.2.6 Handling missing values

When aggregation produces NA values (for example, when all values in a column are NA), the missing_text argument controls what appears in the cell:

dplyr::tibble(
  group = c("A", "A", "B", "B"),
  value = c(1, 2, NA, NA)
) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = value,
    fns = list("mean"),
    missing_text = "No data"
  )
value
A
1
2
mean 1.5
B
NA
NA
mean No data

The default missing_text is "---", which provides a visual placeholder that clearly indicates the absence of a calculable value (an em dash).

7.3 Adding table-level summaries with grand_summary_rows()

While summary_rows() operates on individual row groups, grand_summary_rows() calculates aggregations across all data in the table, regardless of grouping. Grand summary rows typically appear at the very bottom (or top) of the table, providing overall totals or averages.

sp500 |>
  filter(date >= "2015-01-05" & date <= "2015-01-16") |>
  arrange(date) |>
  mutate(week = paste0("W", strftime(date, format = "%V"))) |>
  select(-adj_close, -volume) |>
  gt(
    rowname_col = "date",
    groupname_col = "week"
  ) |>
  summary_rows(
    fns = list("min", "max"),
    fmt = ~ fmt_number(., decimals = 2, use_seps = FALSE)
  ) |>
  grand_summary_rows(
    columns = c(open, high, low, close),
    fns = list(
      list(id = "grand_min", label = "Overall Min") ~ min(.),
      list(id = "grand_max", label = "Overall Max") ~ max(.),
      list(id = "grand_avg", label = "Overall Avg") ~ mean(.)
    ),
    fmt = ~ fmt_number(., decimals = 2, use_seps = FALSE)
  )
open high low close
W02
2015-01-05 2054.44 2054.44 2017.34 2020.58
2015-01-06 2022.15 2030.25 1992.44 2002.61
2015-01-07 2005.55 2029.61 2005.55 2025.90
2015-01-08 2030.61 2064.08 2030.61 2062.14
2015-01-09 2063.45 2064.43 2038.33 2044.81
min 2005.55 2029.61 1992.44 2002.61
max 2063.45 2064.43 2038.33 2062.14
W03
2015-01-12 2046.13 2049.30 2022.58 2028.26
2015-01-13 2031.58 2056.93 2008.25 2023.03
2015-01-14 2018.40 2018.40 1988.44 2011.27
2015-01-15 2013.75 2021.35 1991.47 1992.67
2015-01-16 1992.25 2020.46 1988.12 2019.42
min 1992.25 2018.40 1988.12 1992.67
max 2046.13 2056.93 2022.58 2028.26
Overall Min 1992.25 2018.40 1988.12 1992.67
Overall Max 2063.45 2064.43 2038.33 2062.14
Overall Avg 2027.83 2040.92 2008.31 2023.07

This table now has both group-level summaries (min and max for each week) and grand summaries (overall min, max, and average for the entire period). The grand summary rows span all groups, summarizing the complete dataset.

7.3.1 Arguments and usage

The grand_summary_rows() function shares most of its arguments with summary_rows():

  • columns: Which columns to aggregate (defaults to all)
  • fns: Aggregation expressions (same syntax as summary_rows())
  • fmt: Formatting expressions
  • side: "bottom" (default) or "top"
  • missing_text: Text for NA values (default: "---")

The key difference is that grand_summary_rows() has no groups argument, since it always operates on all data.

countrypops |>
  filter(country_code_2 %in% c("BE", "NL", "LU")) |>
  filter(year %% 10 == 0) |>
  select(country_name, year, population) |>
  tidyr::pivot_wider(names_from = year, values_from = population) |>
  gt(rowname_col = "country_name") |>
  tab_header(title = "Populations of the Benelux Countries") |>
  tab_spanner(columns = everything(), label = "Year") |>
  fmt_integer() |>
  grand_summary_rows(
    fns = list(label = "TOTALS", id = "totals", fn = "sum"),
    fmt = ~ fmt_integer(.),
    side = "top"
  ) |>
  tab_style(
    locations = cells_grand_summary(),
    style = cell_fill(color = "lightblue")
  )
Populations of the Benelux Countries
Year
1960 1970 1980 1990 2000 2010 2020
TOTALS 20,954,090 23,033,246 24,373,192 25,300,739 26,613,063 28,017,933 29,610,523
Belgium 9,153,489 9,655,549 9,859,242 9,967,379 10,251,250 10,895,586 11,538,604
Luxembourg 313,970 339,171 364,150 381,850 436,300 506,953 630,419
Netherlands 11,486,631 13,038,526 14,149,800 14,951,510 15,925,513 16,615,394 17,441,500

In this example, the grand summary row appears at the top of the table (a common convention for totals) and is styled with a light blue background to distinguish it from the data rows.

7.3.2 Referencing other columns in aggregations

A powerful but often overlooked feature of summary rows is the ability to reference other columns within an aggregation formula. This is particularly useful when you need to compute a summary that depends on values from multiple columns, such as a weighted average or a ratio that must be calculated from totals rather than averaged directly.

Let’s consider a sales dataset where each row shows units sold and total profit for a product. If you want the grand summary to show the overall profit per unit, you cannot simply average the per-unit profits from each row (that would give incorrect results if the rows have different unit volumes). Instead, you need to sum all profits and divide by the sum of all units. The fns argument allows you to write aggregation expressions that reference any column in the table data, not just the column being summarized.

dplyr::tibble(
  product = c("Widget", "Gadget", "Sprocket", "Gizmo", "Doodad"),
  units = c(150, 280, 95, 420, 175),
  profit = c(4500, 11200, 3325, 12600, 6125)
) |>
  dplyr::mutate(profit_per_unit = profit / units) |>
  gt(rowname_col = "product") |>
  fmt_integer(columns = units) |>
  fmt_currency(columns = c(profit, profit_per_unit)) |>
  cols_label(
    units = "Units Sold",
    profit = "Total Profit",
    profit_per_unit = "Profit/Unit"
  ) |>
  grand_summary_rows(
    columns = units,
    fns = list(label = md("**TOTAL**"), id = "totals", fn = "sum"),
    fmt = ~ fmt_integer(.)
  ) |>
  grand_summary_rows(
    columns = c(profit, profit_per_unit),
    fns = list(label = md("**TOTAL**"), id = "totals", fn = "sum"),
    fmt = ~ fmt_currency(.)
  ) |>
  grand_summary_rows(
    columns = profit_per_unit,
    fns = list(label = md("**TOTAL**"), id = "totals") ~ sum(profit) / sum(units),
    fmt = ~ fmt_currency(.)
  ) |>
  tab_style(
    locations = cells_grand_summary(),
    style = cell_fill(color = "lightgray")
  )
Units Sold Total Profit Profit/Unit
Widget 150 $4,500.00 $30.00
Gadget 280 $11,200.00 $40.00
Sprocket 95 $3,325.00 $35.00
Gizmo 420 $12,600.00 $30.00
Doodad 175 $6,125.00 $35.00
TOTAL 1,120 $37,750.00 $33.71

This example demonstrates several techniques worth noting. First, we call grand_summary_rows() three times, but all calls use the same id = "totals". This produces a single summary row in the table, not three separate rows. By splitting the aggregation across multiple calls, we can apply different logic to different columns while maintaining a unified row label.

Second, each call uses its own formatting: fmt_integer() for the units column, and fmt_currency() for the profit and profit-per-unit columns. This flexibility is essential when a single summary row must display values with fundamentally different formats.

Third, the profit_per_unit summary uses a custom formula that references the profit and units columns directly. The expression sum(profit) / sum(units) calculates the correct weighted profit per unit across all products (totaling "$37,750" profit divided by 1,120 units gives "$33.71" per unit), rather than averaging the individual ratios (which would incorrectly give "$33.04").

This technique works in both summary_rows() and grand_summary_rows(). Within a group summary, the column references resolve to just that group’s data. Within a grand summary, they resolve to the entire table’s data. This makes it straightforward to compute derived statistics like weighted averages, ratios of totals, or any calculation that requires multiple columns.

7.3.3 Combining group and grand summaries

Many real-world tables require multiple levels of aggregation. A financial report might show subtotals for each department and a grand total for the entire organization. A sales analysis might present regional summaries alongside company-wide figures. A scientific study might report group means as well as overall statistics. In these situations, you need both summary_rows() and grand_summary_rows() working together.

The key to making multi-level summaries work is visual differentiation. When readers encounter two or three types of summary rows in a single table, they need clear cues to distinguish subtotals from grand totals from data rows. This typically means using different background colors, text weights, or indentation levels for each type. Without such differentiation, the hierarchy becomes confusing and the summaries lose their communicative value.

The following example demonstrates this pattern with population data for two groups of countries:

countrypops |>
  filter(
    country_code_2 %in% c("BR", "RU", "IN", "CN", "FR", "DE", "IT", "GB")
  ) |>
  filter(year %% 10 == 0) |>
  select(country_name, year, population) |>
  tidyr::pivot_wider(names_from = year, values_from = population) |>
  gt(rowname_col = "country_name") |>
  tab_row_group(
    label = md("*BRIC*"),
    rows = c("Brazil", "Russia", "India", "China"),
    id = "bric"
  ) |>
  tab_row_group(
    label = md("*Big Four*"),
    rows = c("France", "Germany", "Italy", "United Kingdom"),
    id = "big4"
  ) |>
  row_group_order(groups = c("bric", "big4")) |>
  tab_stub_indent(rows = everything()) |>
  tab_header(title = "Populations of the BRIC and Big Four Countries") |>
  tab_spanner(columns = everything(), label = "Year") |>
  fmt_number(n_sigfig = 3, suffixing = TRUE) |>
  summary_rows(
    fns = list(label = md("**Subtotal**"), id = "subtotal", fn = "sum"),
    fmt = ~ fmt_number(., n_sigfig = 3, suffixing = TRUE),
    side = "bottom"
  ) |>
  grand_summary_rows(
    fns = list(label = md("**GRAND TOTAL**"), id = "grand_total", fn = "sum"),
    fmt = ~ fmt_number(., n_sigfig = 3, suffixing = TRUE)
  ) |>
  tab_style(
    locations = cells_summary(),
    style = cell_fill(color = "lightgray")
  ) |>
  tab_style(
    locations = cells_grand_summary(),
    style = list(
      cell_fill(color = "steelblue"),
      cell_text(color = "white")
    )
  )
Populations of the BRIC and Big Four Countries
Year
1960 1970 1980 1990 2000 2010 2020
BRIC
Brazil 72.4M 95.4M 121M 149M 174M 194M 209M
China 667M 818M 981M 1.14B 1.26B 1.34B 1.41B
India 436M 546M 687M 865M 1.06B 1.24B 1.40B
Russia 120M 130M 139M 148M 147M 143M 145M
Subtotal 1.30B 1.59B 1.93B 2.30B 2.64B 2.92B 3.17B
Big Four
Germany 72.8M 78.2M 78.3M 79.4M 82.2M 81.8M 83.2M
France 47.4M 52.0M 55.3M 58.3M 60.9M 65.0M 67.6M
United Kingdom 52.4M 55.7M 56.3M 57.2M 58.9M 62.8M 66.7M
Italy 50.2M 53.8M 56.4M 56.7M 56.9M 59.8M 59.4M
Subtotal 223M 240M 246M 252M 259M 269M 277M
GRAND TOTAL 1.52B 1.83B 2.18B 2.55B 2.90B 3.19B 3.44B

This table shows populations for two groups of countries (BRIC and Big Four), with subtotals for each group and a grand total for all countries combined. The styling distinguishes the different types of summary rows, making the table’s structure immediately clear.

7.4 Adding summary columns with summary_columns()

While summary rows aggregate data vertically (down columns), summary columns aggregate data horizontally (across rows). The summary_columns() function computes row-wise aggregations and adds them as new columns to your table. This is useful when you want to show totals, averages, or other statistics that combine values from multiple columns within each row.

exibble |>
  select(row, num, currency) |>
  gt(rowname_col = "row") |>
  summary_columns(
    columns = c(num, currency),
    fns = ~ sum(., na.rm = TRUE),
    new_col_names = "total",
    new_col_labels = md("**Total**")
  )
num currency Total
row_1 1.111e-01 49.950 5.00611e+01
row_2 2.222e+00 17.950 2.01720e+01
row_3 3.333e+01 1.390 3.47200e+01
row_4 4.444e+02 65100.000 6.55444e+04
row_5 5.550e+03 1325.810 6.87581e+03
row_6 NA 13.255 1.32550e+01
row_7 7.770e+05 NA 7.77000e+05
row_8 8.880e+06 0.440 8.88000e+06

This adds a total column that sums the num and currency values for each row. The new column appears on the right side of the table by default and can be formatted, styled, and manipulated like any other column.

7.4.1 Specifying aggregation functions

The fns argument works similarly to the summary row functions, but the aggregation happens across columns rather than down rows. This horizontal aggregation is particularly useful for time-series data where columns represent periods (months, quarters, years) and you want to show totals or averages for each row. It’s also valuable when columns represent categories or components that naturally sum to a meaningful whole.

Unlike summary rows, which aggregate many observations into fewer summary values, summary columns preserve the row structure while adding derived values. Each row in your original data still appears in the output; you’re simply enriching it with computed statistics. This makes summary columns ideal for dashboards and reports where readers want to see both the details and the row-level summaries side by side.

dplyr::tibble(
  product = c("Widget", "Gadget", "Sprocket"),
  q1_sales = c(150, 200, 175),
  q2_sales = c(180, 190, 160),
  q3_sales = c(200, 210, 190),
  q4_sales = c(220, 205, 210)
) |>
  gt(rowname_col = "product") |>
  summary_columns(
    columns = ends_with("_sales"),
    fns = list(
      ~ sum(.),
      ~ mean(.)
    ),
    new_col_names = c("annual_total", "quarterly_avg"),
    new_col_labels = c("Annual Total", "Quarterly Avg")
  ) |>
  fmt_number(columns = quarterly_avg, decimals = 1)
q1_sales q2_sales q3_sales q4_sales Annual Total Quarterly Avg
Widget 150 180 200 220 750 187.5
Gadget 200 190 210 205 805 201.2
Sprocket 175 160 190 210 735 183.8

You can add multiple summary columns at once by providing a list of functions. Each function generates a separate column, and you can specify names and labels for each.

7.4.2 Custom aggregation expressions

Simple aggregations like sums and means cover many use cases, but sometimes you need calculations that involve multiple columns in more complex ways. Perhaps you want to compute a percentage change between two columns, calculate a ratio, or apply a formula that references specific column positions. The bracket notation in summary_columns() makes this possible.

When you write .[1], .[2], and so on in your formula, you’re referring to the columns in the order they appear in your columns argument. This positional reference system lets you build arbitrarily complex expressions. The trade-off is readability: bracket notation is concise but can be cryptic if you’re not careful. For complex calculations, consider adding a comment in your code explaining what each position represents.

countrypops |>
  arrange(country_name, year) |>
  group_by(country_name, country_code_2) |>
  summarize(
    pop_1960 = first(population),
    pop_2024 = last(population),
    years = n(),
    .groups = "drop"
  ) |>
  filter(!is.na(pop_1960), !is.na(pop_2024)) |>
  slice_max(pop_2024 - pop_1960, n = 5) |>
  select(country_name, pop_1960, pop_2024, years) |>
  gt(rowname_col = "country_name") |>
  summary_columns(
    columns = c(pop_1960, pop_2024, years),
    fns = ~ (.[2] - .[1]) / (.[3] - 1),
    new_col_names = "avg_annual_change",
    new_col_labels = md("**Avg. Annual Change**")
  ) |>
  fmt_integer() |>
  cols_label(
    pop_1960 = "1960 Pop.",
    pop_2024 = "2024 Pop.",
    years = "Years"
  )
1960 Pop. 2024 Pop. Years Avg. Annual Change
India 435,990,338 1,450,935,791 65 15,858,523
China 667,070,000 1,408,975,000 65 11,592,266
Pakistan 45,709,310 251,269,164 65 3,211,873
Indonesia 88,296,070 283,487,931 65 3,049,873
Nigeria 45,053,782 232,679,478 65 2,931,652

In this expression, .[1] refers to the first selected column (pop_1960), .[2] to the second (pop_2024), and .[3] to the third (years). This allows complex calculations that reference multiple columns.

7.4.3 Formatting summary columns

Summary columns often require different formatting than the source columns they’re derived from. A column of quarterly sales figures might use integer formatting, but the average of those quarters might warrant one decimal place. A column showing year-over-year change might need percentage formatting even when the source columns are raw numbers. The fmt argument lets you apply formatting to the new column immediately, without a separate formatting step.

This inline formatting is particularly convenient because it keeps the summary column’s definition self-contained. Everything about the column (what it calculates, what it’s called, and how it’s displayed) is specified in a single function call. This makes the code easier to read and maintain, especially when you’re creating multiple summary columns with different formatting requirements.

dplyr::tibble(
  item = c("Revenue", "Expenses", "Profit"),
  jan = c(10000, 8000, 2000),
  feb = c(12000, 8500, 3500),
  mar = c(11000, 8200, 2800)
) |>
  gt(rowname_col = "item") |>
  summary_columns(
    columns = c(jan, feb, mar),
    fns = ~ mean(.),
    new_col_names = "avg",
    new_col_labels = "Q1 Average",
    fmt = ~ fmt_currency(., currency = "USD", decimals = 0)
  ) |>
  fmt_currency(columns = c(jan, feb, mar), currency = "USD", decimals = 0)
jan feb mar Q1 Average
Revenue $10,000 $12,000 $11,000 $11,000
Expenses $8,000 $8,500 $8,200 $8,233
Profit $2,000 $3,500 $2,800 $2,767

The summary column displays the Q1 average for each row, formatted as currency to match the source columns. By handling the formatting within summary_columns(), the new column integrates seamlessly with the rest of the table. Readers see consistent dollar formatting across all columns, whether they contain original data or computed summaries.

7.5 Styling summary rows

Summary rows can be styled like any other table element using tab_style(). The location helper functions cells_summary() and cells_grand_summary() target summary rows specifically.

7.5.1 Styling all summary rows

Summary rows serve a different purpose than data rows, and that difference should be visible. Readers scanning a table need to immediately distinguish totals and averages from individual observations. Without visual differentiation, summary rows blend into the data, and readers may misinterpret aggregated values as additional data points.

The most common styling choices for summary rows include background fills (light gray is a classic choice), bold text, borders above or below the row, and italic styling for certain types of summaries like averages. The goal is distinction without distraction: the styling should set summary rows apart without overwhelming the table’s visual hierarchy.

To apply uniform styling to all summary rows across all groups, use cells_summary() without any filtering arguments:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum", "mean"),
    fmt = ~ fmt_number(., decimals = 1)
  ) |>
  tab_style(
    locations = cells_summary(),
    style = list(
      cell_fill(color = "lightgray"),
      cell_text(weight = "bold")
    )
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum 480.1 65,169.3
mean 120.0 16,292.3
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
sum 9,662,550.0 1,339.5
mean 3,220,850.0 446.5

The gray background and bold text make the summary rows immediately identifiable. This blanket styling approach works well when all summary rows serve the same purpose and deserve equal visual weight.

7.5.2 Targeting specific summary rows

You can target specific summary rows by their ID using the rows argument in cells_summary():

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list(
      list(id = "total", label = "Total") ~ sum(., na.rm = TRUE),
      list(id = "average", label = "Average") ~ mean(., na.rm = TRUE)
    ),
    fmt = ~ fmt_number(., decimals = 1)
  ) |>
  tab_style(
    locations = cells_summary(rows = "total"),
    style = cell_fill(color = "lightblue")
  ) |>
  tab_style(
    locations = cells_summary(rows = "average"),
    style = cell_text(style = "italic")
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
Total 480.1 65,169.3
Average 120.0 16,292.3
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
Total 9,662,550.0 1,339.5
Average 3,220,850.0 446.5

By assigning IDs to summary rows in the fns argument, you gain precise control over their styling. Here, totals receive a blue background to signal their importance as definitive figures, while averages use italic text to suggest they’re derived statistics rather than absolute values. This kind of semantic styling helps readers interpret the numbers correctly.

7.5.3 Targeting specific groups and columns

The cells_summary() function accepts groups and columns arguments for precise targeting:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum"),
    fmt = ~ fmt_number(., decimals = 1)
  ) |>
  tab_style(
    locations = cells_summary(
      groups = "grp_a",
      columns = currency
    ),
    style = cell_fill(color = "lightgreen")
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
sum 480.1 65,169.3
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
sum 9,662,550.0 1,339.5

This level of precision is useful when you want to highlight specific values (perhaps a particularly important group, a column that exceeded targets, or a combination that warrants the reader’s attention). The cells_summary() helper accepts any combination of groups, columns, and rows arguments, giving you fine-grained control over exactly which cells receive styling.

7.5.4 Styling grand summary rows

Grand summary rows use the cells_grand_summary() location helper:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  grand_summary_rows(
    columns = c(num, currency),
    fns = list(
      list(id = "grand_total", label = md("**Grand Total**")) ~ sum(., na.rm = TRUE)
    ),
    fmt = ~ fmt_number(., decimals = 1)
  ) |>
  tab_style(
    locations = cells_grand_summary(),
    style = list(
      cell_fill(color = "navy"),
      cell_text(color = "white", weight = "bold")
    )
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
Grand Total 9,663,030.1 66,508.8

The dark background with white text creates strong visual emphasis appropriate for a grand total (the single most important number in many tables). Like cells_summary(), the cells_grand_summary() helper accepts columns and rows arguments for more targeted styling when needed.

7.6 Adding footnotes to summary rows

Footnotes can be attached to summary rows using tab_footnote() with the appropriate location helpers:

exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list(
      list(id = "avg", label = "Average") ~ mean(., na.rm = TRUE)
    ),
    fmt = ~ fmt_number(., decimals = 2)
  ) |>
  tab_footnote(
    footnote = "Arithmetic mean excluding missing values",
    locations = cells_summary(
      groups = everything(),
      columns = num,
      rows = "avg"
    )
  )
num currency
grp_a
1.111e-01 49.950
2.222e+00 17.950
3.333e+01 1.390
4.444e+02 65100.000
Average 120.021 16,292.32
grp_b
5.550e+03 1325.810
NA 13.255
7.770e+05 NA
8.880e+06 0.440
Average 3,220,850.001 446.50
1 Arithmetic mean excluding missing values

The footnote appears on every “Average” row across all groups, clarifying the calculation method. This is particularly important for summary statistics where multiple calculation methods exist (arithmetic vs. geometric mean, sample vs. population standard deviation) or where the handling of missing values affects interpretation. Footnotes on summary rows serve the same documentary purpose as footnotes elsewhere in the table, providing essential context without cluttering the display.

7.7 Extracting summary data with extract_summary()

Sometimes you need to access the computed summary values programmatically, rather than just displaying them in a table. The extract_summary() function retrieves summary data from a gt table as a list of data frames.

gt_table <- exibble |>
  select(group, num, currency) |>
  gt(groupname_col = "group") |>
  summary_rows(
    columns = c(num, currency),
    fns = list("sum", "mean"),
    fmt = ~ fmt_number(., decimals = 2)
  ) |>
  grand_summary_rows(
    columns = c(num, currency),
    fns = list("sum"),
    fmt = ~ fmt_number(., decimals = 2)
  )

extract_summary(gt_table)
$summary_df_data_list
$summary_df_data_list$grp_a
# A tibble: 2 × 7
  group_id row_id rowname group   num currency `::rowname::`
  <chr>    <chr>  <chr>   <dbl> <dbl>    <dbl> <chr>        
1 grp_a    sum    sum        NA  480.   65169. sum          
2 grp_a    mean   mean       NA  120.   16292. mean         

$summary_df_data_list$grp_b
# A tibble: 2 × 7
  group_id row_id rowname group     num currency `::rowname::`
  <chr>    <chr>  <chr>   <dbl>   <dbl>    <dbl> <chr>        
1 grp_b    sum    sum        NA 9662550    1340. sum          
2 grp_b    mean   mean       NA 3220850     447. mean         

$summary_df_data_list$`::GRAND_SUMMARY`
# A tibble: 1 × 7
  group_id        row_id rowname group      num currency `::rowname::`
  <chr>           <chr>  <chr>   <dbl>    <dbl>    <dbl> <chr>        
1 ::GRAND_SUMMARY sum    sum        NA 9663030.   66509. sum          

The returned list contains one data frame per row group (including a "::grand_summary" element for grand summaries if present). Each data frame includes the row names, group information, and the formatted summary values.

This is useful for:

  • Reusing summary calculations in other analyses
  • Creating custom visualizations of summary data
  • Validating that summaries are calculated correctly
  • Building composite tables from multiple sources

7.8 Practical examples

The techniques covered in this chapter come together most clearly in real-world scenarios. The following examples demonstrate how summary rows and columns can be combined to create tables that serve specific professional purposes. Each example illustrates different aspects of the summary system and shows how formatting and styling choices reinforce the table’s communicative goals.

7.8.1 Financial statement with subtotals

Financial statements are perhaps the most natural application for summary rows. Accountants and financial analysts expect to see subtotals for logical groupings (revenue, expenses, assets, liabilities) along with grand totals that tie everything together. The following example creates a simplified income statement that demonstrates these conventions.

dplyr::tibble(
  category = c(
    "Revenue", "Revenue", "Revenue",
    "Expenses", "Expenses", "Expenses", "Expenses"
  ),
  item = c(
    "Product Sales", "Service Revenue", "Licensing",
    "Cost of Goods", "Salaries", "Marketing", "Operations"
  ),
  amount = c(500000, 150000, 75000, -180000, -220000, -45000, -80000)
) |>
  gt(
    rowname_col = "item",
    groupname_col = "category"
  ) |>
  fmt_currency(columns = amount, currency = "USD", accounting = TRUE) |>
  summary_rows(
    fns = list(
      list(id = "subtotal", label = md("*Subtotal*")) ~ sum(.)
    ),
    fmt = ~ fmt_currency(., currency = "USD", accounting = TRUE)
  ) |>
  grand_summary_rows(
    fns = list(
      list(id = "net_income", label = md("**Net Income**")) ~ sum(.)
    ),
    fmt = ~ fmt_currency(., currency = "USD", accounting = TRUE)
  ) |>
  tab_style(
    locations = cells_summary(),
    style = cell_text(style = "italic")
  ) |>
  tab_style(
    locations = cells_grand_summary(),
    style = list(
      cell_fill(color = "lightgray"),
      cell_text(weight = "bold")
    )
  ) |>
  tab_header(
    title = "Income Statement",
    subtitle = "Fiscal Year 2024"
  )
Income Statement
Fiscal Year 2024
amount
Revenue
Product Sales $500,000.00
Service Revenue $150,000.00
Licensing $75,000.00
Subtotal $725,000.00
Expenses
Cost of Goods ($180,000.00)
Salaries ($220,000.00)
Marketing ($45,000.00)
Operations ($80,000.00)
Subtotal ($525,000.00)
Net Income $200,000.00

Notice how accounting notation (parentheses for negative values) is applied consistently to both the data rows and the summary rows. The subtotals use italic styling to distinguish them from line items, while the grand total (Net Income) receives bold text and a background fill to mark it as the bottom line. This visual hierarchy guides readers from detail to summary.

7.8.2 Sales report with multiple statistics

Business reports often need to show both detailed breakdowns and summary statistics. This example combines summary_columns() for row-wise totals and averages with grand_summary_rows() for column-wise company totals. The result is a table that answers multiple questions at once: How did each region perform? What were the quarterly trends? What’s the company-wide picture?

dplyr::tibble(
  region = rep(c("North", "South", "East", "West"), each = 3),
  quarter = rep(c("Q1", "Q2", "Q3"), 4),
  sales = c(
    120, 135, 142,
    98, 105, 118,
    156, 148, 162,
    88, 95, 102
  )
) |>
  tidyr::pivot_wider(names_from = quarter, values_from = sales) |>
  gt(rowname_col = "region") |>
  summary_columns(
    columns = c(Q1, Q2, Q3),
    fns = list(~ sum(.), ~ mean(.)),
    new_col_names = c("total", "avg"),
    new_col_labels = c("Total", "Avg"),
    fmt = ~ fmt_number(., decimals = 0)
  ) |>
  grand_summary_rows(
    columns = c(Q1, Q2, Q3, total, avg),
    fns = list(
      list(id = "company_total", label = "Company Total") ~ sum(.)
    ),
    fmt = ~ fmt_number(., decimals = 0)
  ) |>
  tab_spanner(label = "Quarters", columns = c(Q1, Q2, Q3)) |>
  tab_spanner(label = "Summary", columns = c(total, avg)) |>
  tab_header(title = "Regional Sales Report")
Regional Sales Report
Quarters
Summary
Q1 Q2 Q3 Total Avg
North 120 135 142 397 132
South 98 105 118 321 107
East 156 148 162 466 155
West 88 95 102 285 95
Company Total 462 483 524 1,469 490

The use of column spanners ("Quarters" and "Summary") helps readers understand the table’s structure at a glance. The summary columns on the right provide immediate context for each region’s performance, while the grand summary row at the bottom gives the company-wide view. A manager scanning this table can quickly identify that East is the top-performing region and that sales have been trending upward across quarters.

7.8.3 Scientific data with statistical summaries

Scientific publications typically require summary statistics like means, standard deviations, and sample sizes for experimental data. This example shows how summary_rows() can present multiple statistics for each treatment group, formatted appropriately for each measure.

set.seed(23)

dplyr::tibble(
  treatment = rep(c("Control", "Treatment A", "Treatment B"), each = 4),
  replicate = rep(1:4, 3),
  response = c(
    rnorm(4, mean = 10, sd = 2),
    rnorm(4, mean = 15, sd = 2),
    rnorm(4, mean = 13, sd = 2)
  )
) |>
  gt(
    rowname_col = "replicate",
    groupname_col = "treatment"
  ) |>
  fmt_number(columns = response, decimals = 2) |>
  summary_rows(
    columns = response,
    fns = list(
      list(id = "mean", label = "Mean") ~ mean(.),
      list(id = "sd", label = "SD") ~ sd(.),
      list(id = "n", label = "n") ~ length(.)
    ),
    fmt = list(
      ~ fmt_number(., decimals = 2),
      ~ fmt_number(., decimals = 2),
      ~ fmt_integer(.)
    )
  ) |>
  tab_style(
    locations = cells_summary(),
    style = cell_fill(color = "lightyellow")
  ) |>
  tab_header(
    title = "Experimental Results",
    subtitle = "Response variable by treatment group"
  )
Experimental Results
Response variable by treatment group
response
Control
1 10.39
2 9.13
3 11.83
4 13.59
Mean 11
SD 2
n 4
Treatment A
1 16.99
2 17.21
3 14.44
4 17.04
Mean 16
SD 1
n 4
Treatment B
1 13.09
2 16.15
3 13.44
4 10.91
Mean 13
SD 2
n 4

Each treatment group receives the same three summary statistics, formatted according to their nature: means and standard deviations show two decimal places, while sample sizes appear as integers. The light yellow background on summary rows distinguishes them from the raw data, making it easy for readers to find the statistics they need. This format is common in scientific papers and lab reports, where reviewers expect to see both the underlying data and its statistical summary.

These three examples illustrate the versatility of gt’s summary system. Whether you’re preparing financial reports, business analyses, or scientific publications, the same core functions adapt to meet domain-specific conventions and reader expectations.

7.9 Summary

The summary functions in gt provide a comprehensive system for adding aggregated data to your tables:

  • summary_rows() adds summary rows within each row group, useful for subtotals and group-level statistics
  • grand_summary_rows() adds summary rows that aggregate across the entire table, useful for overall totals and grand statistics
  • summary_columns() adds computed columns that aggregate across rows, useful for row totals and row-wise calculations
  • extract_summary() retrieves summary data for use outside the table

All three summary functions share a consistent interface with the fns argument for specifying aggregations, the fmt argument for formatting results, and the side argument for positioning. Summary elements can be styled and annotated using the standard gt functions with the cells_summary() and cells_grand_summary() location helpers.

By separating the calculation of summaries from their presentation, gt ensures that your tables remain reproducible and maintainable. When data changes, summaries update automatically. When you need different formatting, you can adjust it without recalculating. This approach embodies the gt philosophy: express your intent clearly, and let the package handle the details.