2  Table components

It can be useful to identify the components that make up a table before getting into the nitty-gritty of generating tables. Why? This will give us a language for talking about table composition. In doing so, we’ll discover the merits of the different components, see how they work together, and finally understand how to make effective table visualizations when the time comes.

Tabular presentation can serve several purposes. We may want to present raw data values in unvarnished form, so that others can perform analyses from that data. We could also prepare a set of summarized results and tabulate that; this provides the reader with the results of an analysis. We could even produce a table with varied information on a particular topic and present it in a visually appealing way. With creativity and an eye to aesthetics, the reader will linger for a bit and explore the presented data, perhaps leaving with heightened understanding of the topic at hand and a feeling of edification.

The gt package can make all of this possible. The general principle is that we combine table components together and refine the presentation bit by bit. We’ve prepared a basic diagram here, showing how the main components of a table (and their subcomponents) fit together:

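Here is a listing of the table components (from top to bottom):

the table header (optional; with a title and possibly a subtitle)
the stub and the stub head (optional; contains row labels, optionally within row groups having row group labels)
the column labels (contains column labels, optionally under spanner labels)
the table body (contains columns and rows of cells)
the table footer (optional; possibly with footnotes and source notes)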

Once we have the input data for the table, we need to decide which of these components should be used. This chapter will show you how to introduce the input data to gt and how to add the various components. Generally, the functions that produce or modify these table components have names of the form tab_*(), and a component is only displayed when there is content for it (e.g., there won’t be a table footer unless you use a function that adds content to the footer). Now let’s get to making gt tables by starting with the very important entry-point function: gt().
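As a small preview of that last point, the sketch below builds the same table twice: once with no extra components, and once with a header and a source note added through tab_header() and tab_source_note(). The title and note text are placeholders of our own choosing, and searching the output of as_raw_html() is just one convenient way to confirm that the footer only exists when something has been placed in it.

```r
library(gt)

# A bare table: no header or footer content has been supplied
plain_tbl <- exibble |> gt()

# The same table with a header and a footer (via a source note);
# both text strings are placeholders for illustration
decorated_tbl <-
  exibble |>
  gt() |>
  tab_header(title = "The exibble dataset") |>
  tab_source_note(source_note = "Source: the gt package.")

# Render both tables to HTML strings
plain_html <- as_raw_html(plain_tbl)
decorated_html <- as_raw_html(decorated_tbl)

# The note text should only be found in the decorated table
grepl("Source: the gt package.", plain_html, fixed = TRUE)
grepl("Source: the gt package.", decorated_html, fixed = TRUE)
```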

2.1 Making a gt table: start with gt()

When one provides table data to the gt() function, it generates a gt table object. This function is the initial step in a typical gt workflow. Once you have the gt table object, you can apply styling transformations before rendering it as a display table in one of several output formats.

Here is the function’s signature:

gt(
  data,
  rowname_col = NULL,
  groupname_col = dplyr::group_vars(data),
  omit_na_group = FALSE,
  process_md = FALSE,
  caption = NULL,
  rownames_to_stub = FALSE,
  row_group_as_column = FALSE,
  auto_align = TRUE,
  id = NULL,
  locale = getOption("gt.locale"),
  row_group.sep = getOption("gt.row_group.sep", " - ")
)

Let’s use the exibble dataset for the next few examples, where we’ll learn how to make simple gt tables with the gt() function. The most basic thing to do is to just use gt() with the dataset as the input.

exibble |> gt()
num char fctr date time datetime currency row group
1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 row_1 grp_a
2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 row_2 grp_a
3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 row_3 grp_a
4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 row_4 grp_a
5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 row_5 grp_b
NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 row_6 grp_b
7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA row_7 grp_b
8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 row_8 grp_b

From this, we get a very simple table with column labels and all of the body cells below. This is the simplest form of a gt table: it doesn’t restructure the data at all and closely resembles what you’d see when printing a tibble or data frame in the R console. The key difference is that you now have a presentation-ready table that can be rendered in HTML, PDF, or other formats. While this basic output is functional, the gt() function offers several arguments that let you add structure and context to your table right from the start.
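To make the idea of a table object a little more concrete, here is a minimal sketch that stores the result of gt() in a variable (tbl is our own name), inspects its class, and renders it to an HTML string with as_raw_html(). This is only one of several ways to get output from a gt table; it is used here because it is easy to inspect in the console.

```r
library(gt)

# Store the table object rather than printing it right away
tbl <- exibble |> gt()

# The object carries the "gt_tbl" class
class(tbl)

# Render the table to a single HTML string
html_text <- as_raw_html(tbl)

# The rendered text contains an HTML <table> element
grepl("<table", html_text, fixed = TRUE)
```

If a file is wanted instead of a string, the gtsave() function can write the table out, with the output format inferred from the file extension.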

This dataset has the row and group columns. The former contains unique values that are ideal for labeling rows, and this often happens in what is called the ‘stub’ (a reserved area that serves to label rows). With the gt() function, we can immediately place the contents of the row column into the stub column. To do this, we use the rowname_col argument with the name of the column to use in quotes.

exibble |> gt(rowname_col = "row")
num char fctr date time datetime currency group
row_1 1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 grp_a
row_2 2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 grp_a
row_3 3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 grp_a
row_4 4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 grp_a
row_5 5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 grp_b
row_6 NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 grp_b
row_7 7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA grp_b
row_8 8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 grp_b

This sets up a table with a stub: the row labels are placed within the stub column, and a vertical dividing line has been placed on its right-hand side.

The group column can be used to divide the rows into discrete groups. Within that column, we see repetitions of the values "grp_a" and "grp_b". These serve both as ID values and the initial label for the groups. With the groupname_col argument in gt(), we can set up the row groups immediately upon creation of the table.

exibble |>
  gt(
    rowname_col = "row",
    groupname_col = "group"
  )
num char fctr date time datetime currency
grp_a
row_1 1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950
row_2 2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950
row_3 3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390
row_4 4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000
grp_b
row_5 5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810
row_6 NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255
row_7 7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA
row_8 8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440

If you’d rather perform the set up of row groups later (i.e., not in the gt() call), this is possible with use of the tab_row_group() function (and row_group_order() can help with the arrangement of row groups).
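As a rough sketch of that later-stage approach, the code below creates the table with only a stub and then defines the two groups with tab_row_group(), using a condition on the group column to pick the rows. The labels "Group A" and "Group B" are our own choices (they also serve as the group IDs by default), and hiding the group column with cols_hide() is optional tidying rather than a required step.

```r
library(gt)

grouped_later <-
  exibble |>
  gt(rowname_col = "row") |>
  # Define each row group from a condition on the `group` column
  tab_row_group(label = "Group B", rows = group == "grp_b") |>
  tab_row_group(label = "Group A", rows = group == "grp_a") |>
  # Make the ordering explicit by listing the group IDs
  row_group_order(groups = c("Group A", "Group B")) |>
  # The `group` column is now redundant, so hide it
  cols_hide(columns = group)

grouped_later
```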

One more thing to consider with row groups is their layout. By default, row group labels reside in separate rows that appear above each group. However, we can use the row_group_as_column = TRUE option to put the row group labels within a secondary column within the table stub.

exibble |>
  gt(
    rowname_col = "row",
    groupname_col = "group",
    row_group_as_column = TRUE
  )
num char fctr date time datetime currency
grp_a row_1 1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950
row_2 2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950
row_3 3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390
row_4 4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000
grp_b row_5 5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810
row_6 NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255
row_7 7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA
row_8 8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440

This could be done later if need be, and using tab_options(row_group.as_column = TRUE) would be the way to do it outside of the gt() call.
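Here is a short sketch of that alternative, where the table is created with the default row group layout and the layout is changed afterward (the variable name is ours):

```r
library(gt)

# Create the table with row groups in their default layout, then
# switch to the column layout through `tab_options()`
groups_in_column <-
  exibble |>
  gt(
    rowname_col = "row",
    groupname_col = "group"
  ) |>
  tab_options(row_group.as_column = TRUE)

groups_in_column
```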

2.1.1 Multi-column stubs for hierarchical row labels

When your data has natural hierarchical structure, you can create a multi-column stub by passing a vector of column names to rowname_col. This feature is particularly useful for financial reports with account hierarchies, clinical trial tables with multiple levels of categorization, or any situation where rows have parent-child relationships.

Let’s create a dataset with a two-level hierarchy (region and category) and display it with a multi-column stub:

sales_data <- dplyr::tibble(
  region = c("North", "North", "North", "South", "South", "South"),
  category = c("Electronics", "Clothing", "Food", "Electronics", "Clothing", "Food"),
  Q1 = c(45000, 32000, 28000, 38000, 41000, 35000),
  Q2 = c(48000, 35000, 30000, 42000, 39000, 37000),
  Q3 = c(52000, 38000, 32000, 45000, 43000, 39000),
  Q4 = c(58000, 42000, 35000, 48000, 46000, 41000)
)

sales_data |>
  gt(rowname_col = c("region", "category")) |>
  tab_header(
    title = "Quarterly Sales by Region and Category",
    subtitle = "All values in USD"
  ) |>
  fmt_currency(columns = everything(), currency = "USD", decimals = 0) |>
  tab_stubhead(label = c("Region", "Category")) |>
  tab_style(
    style = cell_fill(color = "gray95"),
    locations = cells_stub()
  )
Quarterly Sales by Region and Category
All values in USD
Region Category Q1 Q2 Q3 Q4
North Electronics $45,000 $48,000 $52,000 $58,000
Clothing $32,000 $35,000 $38,000 $42,000
Food $28,000 $30,000 $32,000 $35,000
South Electronics $38,000 $42,000 $45,000 $48,000
Clothing $41,000 $39,000 $43,000 $46,000
Food $35,000 $37,000 $39,000 $41,000

The multi-column stub creates a clean visual hierarchy. Notice that repeating values in the first stub column (the region) are automatically consolidated, making it clear which categories belong to each region. The tab_stubhead() function also accepts a vector of labels, one for each level of the hierarchy.

This feature works seamlessly with formatting and styling functions. The stub columns are treated as a unit, allowing you to apply styles to the entire stub area while still being able to reference individual stub columns when needed.

2.1.2 Additional gt() options

Some datasets have rownames built in (mtcars famously has the car model names as the rownames). To use those rownames as row labels in the stub, the rownames_to_stub = TRUE option will prove to be useful.

head(mtcars, 10) |> gt(rownames_to_stub = TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4

By default, values in the body of a gt table (and their column labels) are automatically aligned. The alignment is governed by the types of values in a column. If you’d like to disable this form of auto-alignment, the auto_align = FALSE option can be taken.

exibble |> gt(rowname_col = "row", auto_align = FALSE)
num char fctr date time datetime currency group
row_1 1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 grp_a
row_2 2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 grp_a
row_3 3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 grp_a
row_4 4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 grp_a
row_5 5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 grp_b
row_6 NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 grp_b
row_7 7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA grp_b
row_8 8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 grp_b

What you’ll get from that is center-alignment of all table body values and all column labels. Note that row labels in the stub are still left-aligned (though it’s hard to see that in the previous example); auto_align has no effect on alignment within the table stub. It’s generally not recommended to use auto_align = FALSE since the automatic alignment choices are quite reasonable for most tables.
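If only one or two columns need a different alignment, a lighter-touch option is to keep auto-alignment on and override specific columns with cols_align(). The sketch below is just an illustration: the choice of columns and of left-alignment for currency is arbitrary.

```r
library(gt)

aligned_tbl <-
  exibble |>
  dplyr::select(row, num, char, currency) |>
  # Auto-alignment stays on (the default)
  gt(rowname_col = "row") |>
  # Override the alignment for a single column only
  cols_align(align = "left", columns = currency)

aligned_tbl
```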

Whichever way you generate the initial gt table object, you can use it with a huge variety of functions in the package to further customize the presentation. Formatting body cells is commonly done with the family of formatting functions (e.g., fmt_number(), fmt_date(), etc.). The package supports formatting with internationalization (‘i18n’ features) and so locale-aware functions come with a locale argument. To avoid having to use that argument repeatedly, the gt() function has its own locale argument. Setting a locale there makes it the default for every locale-aware function applied to that table. Here’s an example of how that works in practice when setting locale = "fr" in gt() and using formatting functions:

exibble |>
  gt(
    rowname_col = "row",
    groupname_col = "group",
    locale = "fr"
  ) |>
  fmt_number() |>
  fmt_date(
    columns = date,
    date_style = "yMEd"
  ) |>
  fmt_datetime(
    columns = datetime,
    format = "EEEE, MMMM d, y",
    locale = "en"
  )
num char fctr date time datetime currency
grp_a
row_1 0,11 apricot one jeu. 15/01/2015 13:35 Monday, January 1, 2018 49,95
row_2 2,22 banana two dim. 15/02/2015 14:40 Friday, February 2, 2018 17,95
row_3 33,33 coconut three dim. 15/03/2015 15:45 Saturday, March 3, 2018 1,39
row_4 444,40 durian four mer. 15/04/2015 16:50 Wednesday, April 4, 2018 65 100,00
grp_b
row_5 5 550,00 NA five ven. 15/05/2015 17:55 Saturday, May 5, 2018 1 325,81
row_6 NA fig six lun. 15/06/2015 NA Wednesday, June 6, 2018 13,26
row_7 777 000,00 grapefruit seven NA 19:10 Saturday, July 7, 2018 NA
row_8 8 880 000,00 honeydew eight sam. 15/08/2015 20:20 NA 0,44

In this example, the fmt_number() and fmt_date() functions understand that the locale for this table is "fr" (French), so the appropriate formatting for that locale is apparent in the num, currency, and date columns. However, in the fmt_datetime() call, we explicitly use the "en" (English) locale. This overrides the "fr" default set for this table, and the end result is dates formatted with the English locale in the datetime column.

The process_md argument controls whether the contents of rowname_col and groupname_col should be interpreted as Markdown. By default (FALSE), the text appears literally. When set to TRUE, gt will render Markdown syntax in your row labels and row group labels. This is useful when your stub or grouping data contains formatted text.

dplyr::tibble(
  item = c("**Premium** Widget", "*Standard* Widget", "Basic Widget"),
  category = c("**Featured**", "**Featured**", "Regular"),
  price = c(99.99, 49.99, 19.99)
) |>
  gt(rowname_col = "item", groupname_col = "category", process_md = TRUE) |>
  tab_header(title = "Product Catalog") |>
  fmt_currency(columns = price)
Product Catalog
price
Featured
Premium Widget $99.99
Standard Widget $49.99
Regular
Basic Widget $19.99

Without process_md = TRUE, you would see the literal Markdown syntax (e.g., **Premium** Widget instead of Premium Widget) in the stub and row group labels. Note that process_md specifically affects the rowname_col and groupname_col content; for Markdown in other parts of the table (like column labels or headers), use the md() helper function. If you need Markdown text in the table body cells to be rendered, use the fmt_markdown() formatting function, which is covered in Chapter 4.
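To illustrate those other two routes, here is a small sketch with made-up data: md() is used for the title and for a column label, and fmt_markdown() renders Markdown text held in a body column. The note column and all of the text values are hypothetical.

```r
library(gt)

md_tbl <-
  dplyr::tibble(
    item = c("Premium Widget", "Basic Widget"),
    note = c("**New** this season", "*While supplies last*"),
    price = c(99.99, 19.99)
  ) |>
  gt(rowname_col = "item") |>
  # Markdown in the header and in a column label goes through `md()`
  tab_header(title = md("**Product** Catalog")) |>
  cols_label(price = md("*Price*")) |>
  # Markdown in body cells is rendered with `fmt_markdown()`
  fmt_markdown(columns = note) |>
  fmt_currency(columns = price)

md_tbl
```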

2.4 Grouping together column labels with spanners

Column spanners are horizontal labels that stretch across multiple columns, grouping them under a common heading. They create visual hierarchy in the boxhead (the part of the table containing column labels) and help readers understand relationships between columns. For example, columns showing different years of population data might be grouped under a "Population" spanner, while density columns might share a "Density" spanner.

More precisely, the boxhead contains, at a minimum, the column labels and, optionally, spanner labels. A spanner occupies space over any number of contiguous column labels. The mapping of a spanner to the columns beneath it can be defined by column names, existing spanner ID values, or a mixture of both.

2.4.1 tab_spanner()

With the tab_spanner() function, you can insert a spanner above column labels or existing spanners in the boxhead part of a gt table.

Here is the function’s signature:

tab_spanner(
  data,
  label,
  columns = NULL,
  spanners = NULL,
  level = NULL,
  id = label,
  gather = TRUE,
  replace = FALSE
)

The spanners are placed in the order in which tab_spanner() is called, so if a later call uses the same columns in its definition (or even a subset) as an earlier one, the later spanner will be overlaid atop the earlier one. Options exist for forcibly inserting a spanner underneath others (with level, as space permits) and for full or partial spanner replacement (with replace).

Let’s create a gt table using a small portion of the gtcars dataset. Over several columns (hp, hp_rpm, trq, trq_rpm, mpg_c, mpg_h) we’ll use tab_spanner() to add a spanner with the label "performance". This effectively groups together several columns related to car performance under a unifying label.

gtcars |>
  dplyr::select(
    year, model, bdy_style, starts_with(c("hp", "trq", "mpg")), msrp
  ) |>
  dplyr::slice(1:8) |>
  gt(rowname_col = "model") |>
  tab_spanner(
    label = "performance",
    columns = starts_with(c("hp", "trq", "mpg"))
  )
performance [spanner over hp, hp_rpm, trq, trq_rpm, mpg_c, mpg_h]
year bdy_style hp hp_rpm trq trq_rpm mpg_c mpg_h msrp
GT 2017 coupe 647 6250 550 5900 11 18 447000
458 Speciale 2015 coupe 597 9000 398 6000 13 17 291744
458 Spider 2015 convertible 562 9000 398 6000 13 17 263553
458 Italia 2014 coupe 562 9000 398 6000 13 17 233509
488 GTB 2016 coupe 661 8000 561 3000 15 22 245400
California 2015 convertible 553 7500 557 4750 16 23 198973
GTC4Lusso 2017 coupe 680 8250 514 5750 12 17 298000
FF 2015 coupe 652 8000 504 6000 11 16 295000

Notice that in the above table code, we used the starts_with() selection helper in both the dplyr select() statement and in the gt tab_spanner() statement. Such use of tidyselect selection helpers is incredibly helpful for shortening the amount of code supplied in the columns argument across many gt functions.

With the default gather = TRUE option, columns selected for a particular spanner will be moved so that there is no separation between them. This can be seen with the example below that uses a subset of the towny dataset. The starting column order is name, latitude, longitude, population_2016, density_2016, population_2021, and density_2021. The first two uses of tab_spanner() deal with making separate spanners for the two population and two density columns. After their use, the columns are moved to this new ordering: name, latitude, longitude, population_2016, population_2021, density_2016, and density_2021. The third and final call of tab_spanner() doesn’t further affect the ordering of columns.

towny |>
  dplyr::arrange(dplyr::desc(population_2021)) |>
  dplyr::slice_head(n = 5) |>
  dplyr::select(
    name, latitude, longitude,
    ends_with("2016"), ends_with("2021")
  ) |>
  gt() |>
  tab_spanner(
    columns = starts_with("pop"),
    label = "Population"
  ) |>
  tab_spanner(
    columns = starts_with("den"),
    label = "Density"
  ) |>
  tab_spanner(
    columns = ends_with("itude"),
    label = md("*Location*"),
    id = "loc"
  )
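Location [spanner over latitude, longitude] | Population [spanner over population_2016, population_2021] | Density [spanner over density_2016, density_2021]
name latitude longitude population_2016 population_2021 density_2016 density_2021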
Toronto 43.74167 -79.37333 2731571 2794356 4328.27 4427.75
Ottawa 45.42472 -75.69500 934243 1017449 335.07 364.91
Mississauga 43.60000 -79.65000 721599 717961 2464.98 2452.56
Brampton 43.68833 -79.76083 593638 656480 2232.65 2468.99
Hamilton 43.25667 -79.86917 536917 569353 480.11 509.12

While columns are moved, it is only the minimal amount of moving required (pulling in columns from the right) to ensure that columns are gathered under the appropriate spanners. With the last call, there are two more things to note: (1) label values can use the md() (or html()) helper functions to help create styled text, and (2) an id value may be supplied for reference later (e.g., for styling with tab_style() or applying footnotes with tab_footnote()).
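As a sketch of how such an ID can be put to use, the code below keeps just the location columns, creates the spanner with id = "loc", and then targets it through cells_column_spanners() for both a style and a footnote. The bold styling and the footnote wording are our own illustrative choices.

```r
library(gt)

loc_tbl <-
  towny |>
  dplyr::arrange(dplyr::desc(population_2021)) |>
  dplyr::slice_head(n = 5) |>
  dplyr::select(name, latitude, longitude) |>
  gt() |>
  tab_spanner(
    columns = ends_with("itude"),
    label = md("*Location*"),
    id = "loc"
  ) |>
  # Refer to the spanner by its ID when styling...
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_spanners(spanners = "loc")
  ) |>
  # ...and when attaching a footnote
  tab_footnote(
    footnote = "Coordinates are in decimal degrees.",
    locations = cells_column_spanners(spanners = "loc")
  )

loc_tbl
```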

It’s possible to stack multiple spanners atop each other with consecutive calls of tab_spanner(). It’s a bit like playing Tetris: putting a spanner down anywhere there is another spanner (i.e., there are one or more shared columns) means that the second spanner will reside a level above the prior one. Let’s look at a few examples of how this works, and we’ll also explore a few lesser-known placement tricks. Let’s use a cut-down version of exibble for this, set up a few level-one spanners, and then place a level-two spanner over two of them.

exibble_narrow <- exibble |> dplyr::slice_head(n = 3)

exibble_narrow |>
  gt() |>
  tab_spanner(
    label = "Row Information",
    columns = c(row, group)
  ) |>
  tab_spanner(
    label = "Numeric Values",
    columns = where(is.numeric),
    id = "num_spanner"
  ) |>
  tab_spanner(
    label = "Text Values",
    columns = c(char, fctr),
    id = "text_spanner"
  ) |>
  tab_spanner(
    label = "Numbers and Text",
    spanners = c("num_spanner", "text_spanner")
  )
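Numbers and Text [level 2 spanner over the Numeric Values and Text Values spanners]
Numeric Values [level 1 spanner over num, currency] | Text Values [level 1 spanner over char, fctr] | Row Information [level 1 spanner over row, group]
num currency char fctr date time datetime row group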
0.1111 49.95 apricot one 2015-01-15 13:35 2018-01-01 02:22 row_1 grp_a
2.2220 17.95 banana two 2015-02-15 14:40 2018-02-02 14:33 row_2 grp_a
33.3300 1.39 coconut three 2015-03-15 15:45 2018-03-03 03:44 row_3 grp_a

In the above example, we used the spanners argument to define where the "Numbers and Text"-labeled spanner should reside. For that, we supplied the "num_spanner" and "text_spanner" ID values for the two spanners associated with the num, currency, char, and fctr columns. Alternatively, we could have given those column names to the columns argument and achieved the same result. You could actually use a combination of spanners and columns to define where the spanner should be placed. Here is an example of just that:

exibble_narrow_gt <-
  exibble_narrow |>
  gt() |>
  tab_spanner(
    label = "Numeric Values",
    columns = where(is.numeric),
    id = "num_spanner"
  ) |>
  tab_spanner(
    label = "Text Values",
    columns = c(char, fctr),
    id = "text_spanner"
  ) |>
  tab_spanner(
    label = "Text, Dates, Times, Datetimes",
    columns = contains(c("date", "time")),
    spanners = "text_spanner"
  )
  
exibble_narrow_gt
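Text, Dates, Times, Datetimes [level 2 spanner over char, fctr, date, time, datetime]
Numeric Values [level 1 spanner over num, currency] | Text Values [level 1 spanner over char, fctr]
num currency char fctr date time datetime row group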
0.1111 49.95 apricot one 2015-01-15 13:35 2018-01-01 02:22 row_1 grp_a
2.2220 17.95 banana two 2015-02-15 14:40 2018-02-02 14:33 row_2 grp_a
33.3300 1.39 coconut three 2015-03-15 15:45 2018-03-03 03:44 row_3 grp_a

And, again, we could have supplied all of the column names to columns instead of using this hybrid approach, but it is interesting to express the definition of spanners with this flexible combination. What if you wanted to extend the above example and place a spanner above the date, time, and datetime columns? If you try that in the manner shown above, the spanner will be placed in the third level of spanners:

exibble_narrow_gt |>
  tab_spanner(
    label = "Date and Time Columns",
    columns = contains(c("date", "time")),
    id = "date_time_spanner"
  )
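Date and Time Columns [level 3 spanner over date, time, datetime]
Text, Dates, Times, Datetimes [level 2 spanner over char, fctr, date, time, datetime]
Numeric Values [level 1 spanner over num, currency] | Text Values [level 1 spanner over char, fctr]
num currency char fctr date time datetime row group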
0.1111 49.95 apricot one 2015-01-15 13:35 2018-01-01 02:22 row_1 grp_a
2.2220 17.95 banana two 2015-02-15 14:40 2018-02-02 14:33 row_2 grp_a
33.3300 1.39 coconut three 2015-03-15 15:45 2018-03-03 03:44 row_3 grp_a

Remember that the approach taken by tab_spanner() is to keep stacking atop existing spanners. But there is space next to the "Text Values" spanner on the first level. You can either revise the order of tab_spanner() calls or use the level argument to force the spanner into that level (so long as there is space).

exibble_narrow_gt |>
  tab_spanner(
    label = "Date and Time Columns",
    columns = contains(c("date", "time")),
    level = 1,
    id = "date_time_spanner"
  )
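Text, Dates, Times, Datetimes [level 2 spanner over char, fctr, date, datetime, time]
Numeric Values [level 1 spanner over num, currency] | Text Values [level 1 spanner over char, fctr] | Date and Time Columns [level 1 spanner over date, datetime, time]
num currency char fctr date datetime time row group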
0.1111 49.95 apricot one 2015-01-15 2018-01-01 02:22 13:35 row_1 grp_a
2.2220 17.95 banana two 2015-02-15 2018-02-02 14:33 14:40 row_2 grp_a
33.3300 1.39 coconut three 2015-03-15 2018-03-03 03:44 15:45 row_3 grp_a

That puts the spanner in the intended level. If there aren’t free locations available in the level specified you’ll get an error stating which columns cannot be used for the new spanner (this can be circumvented, if necessary, with the replace = TRUE option). If you choose a level higher than the maximum occupied, then the spanner will be dropped down. Again, these behaviors are indicative of Tetris-like rules though they tend to work well for the application of spanners.
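The occupied-level case can be sketched as follows. This is a minimal illustration under the behavior described above, not a definitive account of the error message: we ask for level 1 above a column that already sits under a level-one spanner, capture the resulting condition with tryCatch(), and then repeat the call with replace = TRUE. The "Just num" label and the variable names are ours.

```r
library(gt)

# Start with one level-one spanner over the two numeric columns
base_tbl <-
  exibble |>
  dplyr::slice_head(n = 3) |>
  gt() |>
  tab_spanner(
    label = "Numeric Values",
    columns = c(num, currency),
    id = "num_spanner"
  )

# Level 1 is already occupied above `num`, so this attempt is expected
# to fail; tryCatch() captures the condition instead of halting
attempt <- tryCatch(
  base_tbl |>
    tab_spanner(label = "Just num", columns = num, level = 1),
  error = function(e) e
)

inherits(attempt, "error")

# With `replace = TRUE`, the new spanner may take over that space
replaced_tbl <-
  base_tbl |>
  tab_spanner(
    label = "Just num",
    columns = num,
    level = 1,
    replace = TRUE
  )

replaced_tbl
```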

2.4.2 tab_spanner_delim()

The tab_spanner_delim() function can take specially-crafted column names and generate one or more spanner column labels (along with relabeling the column labels).

Here is the function’s signature:

tab_spanner_delim(
  data,
  delim,
  columns = everything(),
  split = c("last", "first"),
  limit = NULL,
  reverse = FALSE
)

This is done by splitting the column name by a specified delimiter character (this is the delim) and placing the fragments from top to bottom (i.e., higher-level spanners to the column labels). Furthermore, the neighboring text fragments on different spanner levels will be coalesced together to put the span back into spanner. For instance, having the three side-by-side column names rating_1, rating_2, and rating_3 will (in the default case at least) result in a spanner with the label "rating" above columns with the labels "1", "2", and "3".

If we take a hypothetical table that includes the column names province.NL_ZH.pop, province.NL_ZH.gdp, province.NL_NH.pop, and province.NL_NH.gdp, we can see that we have a naming system that has a well-defined structure. We start with the more general to the left ("province") and move to the more specific on the right ("pop"). If the columns are in the table in this exact order, then things are in an ideal state as the eventual spanner column labels will form from this neighboring. When using tab_spanner_delim() here with delim set as "." we get the following text fragments:

province.NL_ZH.pop -> "province", "NL_ZH", "pop"

province.NL_ZH.gdp -> "province", "NL_ZH", "gdp"

province.NL_NH.pop -> "province", "NL_NH", "pop"

province.NL_NH.gdp -> "province", "NL_NH", "gdp"

This gives us the following arrangement of column labels and spanner labels:

--------- `"province"` ---------- <- level 2 spanner
---`"NL_ZH"`--- | ---`"NL_NH"`--- <- level 1 spanners
`"pop"`|`"gdp"` | `"pop"`|`"gdp"` <- column labels
---------------------------------
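To try this out, we can build a one-row table with exactly those four column names and hand it to tab_spanner_delim(). The numeric values below are arbitrary placeholders (they are not real statistics); only the column names matter for the spanner structure.

```r
library(gt)

province_tbl <-
  dplyr::tibble(
    # Placeholder values, included only so the table has a row
    province.NL_ZH.pop = 1,
    province.NL_ZH.gdp = 2,
    province.NL_NH.pop = 3,
    province.NL_NH.gdp = 4
  ) |>
  gt() |>
  # Split every column name at each period
  tab_spanner_delim(delim = ".")

province_tbl
```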

There might be situations where the same delimiter is used throughout but only the last instance requires a splitting. With a pair of column names like north_holland_pop and north_holland_area you would only want "pop" and "area" to be column labels underneath a single spanner ("north_holland"). To achieve this, the split and limit arguments are used and the values for each need to be split = "last" and limit = 1. This will give us the following arrangement:

--`"north_holland"`-- <- level 1 spanner
 `"pop"`  |  `"area"` <- column labels
---------------------
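A minimal sketch of that call, again with placeholder values in a one-row table, looks like this:

```r
library(gt)

north_holland_tbl <-
  dplyr::tibble(
    # Placeholder values, included only so the table has a row
    north_holland_pop = 1,
    north_holland_area = 2
  ) |>
  gt() |>
  # Split only once, starting from the last underscore
  tab_spanner_delim(delim = "_", split = "last", limit = 1)

north_holland_tbl
```

Dropping the limit argument here would let every underscore act as a split point, which is not what we want for names like these.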

With a subset of the towny dataset, we can create a gt table and then use the tab_spanner_delim() function to automatically generate column spanner labels. In this case we have some column names in the form population_<year>. The underscore character is the delimiter that separates a common word "population" and a year value. In this default way of splitting, fragments to the right are lowest (really they become new column labels) and moving left we get spanners. Let’s have a look at how tab_spanner_delim() handles these column names:

towny_subset_gt <-
  towny |>
  dplyr::select(name, starts_with("population")) |>
  dplyr::filter(grepl("^F", name)) |>
  gt() |>
  tab_spanner_delim(delim = "_") |>
  fmt_integer()

towny_subset_gt
population [spanner over 1996, 2001, 2006, 2011, 2016, 2021]
name 1996 2001 2006 2011 2016 2021
Faraday 1,638 1,581 1,578 1,468 1,401 1,612
Fauquier-Strickland 684 678 568 530 536 467
Fort Erie 27,183 28,143 29,925 29,960 30,710 32,901
Fort Frances 8,790 8,315 8,103 7,952 7,739 7,466
French River 2,847 2,810 2,659 2,442 2,662 2,828
Front of Yonge 2,530 2,639 2,803 2,752 2,602 2,595
Frontenac Islands 1,661 1,638 1,862 1,864 1,760 1,930

The spanner created through this use of tab_spanner_delim() is automatically given an ID value by gt. Because it’s hard to know what the ID value is, we can use tab_info() to inspect the table’s indices and ID values.

towny_subset_gt |> tab_info()
Information on ID and Label Values
ID Idx Lvl Label
Columns
name 1 name
population_1996 2 1996
population_2001 3 2001
population_2006 4 2006
population_2011 5 2011
population_2016 6 2016
population_2021 7 2021
Rows
<< Index values 1 to 7 >>

Spanners
spanner-population_1996 1 population

From this informational table, we see that the ID for the spanner is "spanner-population_1996". Also, the columns are still accessible by the original column names (tab_spanner_delim() did change their labels though). Let’s use tab_style() to add some styles to the towny_subset_gt table.

towny |>
  dplyr::select(name, starts_with("population")) |>
  dplyr::filter(grepl("^F", name)) |>
  gt() |>
  tab_spanner_delim(delim = "_") |>
  fmt_integer() |>
  tab_style(
    style = cell_fill(color = "aquamarine"),
    locations = cells_body(columns = population_2021)
  ) |>
  tab_style(
    style = cell_text(transform = "capitalize"),
    locations = cells_column_spanners(spanners = "spanner-population_1996")
  )
population [spanner over 1996, 2001, 2006, 2011, 2016, 2021]
name 1996 2001 2006 2011 2016 2021
Faraday 1,638 1,581 1,578 1,468 1,401 1,612
Fauquier-Strickland 684 678 568 530 536 467
Fort Erie 27,183 28,143 29,925 29,960 30,710 32,901
Fort Frances 8,790 8,315 8,103 7,952 7,739 7,466
French River 2,847 2,810 2,659 2,442 2,662 2,828
Front of Yonge 2,530 2,639 2,803 2,752 2,602 2,595
Frontenac Islands 1,661 1,638 1,862 1,864 1,760 1,930

We can plan ahead a bit and refashion the column names with dplyr before introducing the table to gt() and tab_spanner_delim(). Here the column labels have underscore delimiters where splitting is not wanted (so a period or space character is used instead). The usage of tab_spanner_delim() gives two levels of spanners. We can further touch up the labels after that with cols_label_with() and text_transform().

towny |>
  dplyr::arrange(dplyr::desc(population_2021)) |>
  dplyr::slice_head(n = 5) |>
  dplyr::select(name, ends_with("pct")) |>
  dplyr::rename_with(
    .fn = function(x) {
      x |>
        gsub("(.*?)_(\\d{4})", "\\1.\\2", x = _) |>
        gsub("pop_change", "Population Change", x = _)
    }
  ) |>
  gt(rowname_col = "name") |>
  tab_spanner_delim(delim = "_") |>
  fmt_number(decimals = 1, scale_by = 100) |>
  cols_label_with(
    fn = function(x) gsub("pct", "%", x)
  ) |>
  text_transform(
    fn = function(x) gsub("\\.", " - ", x),
    locations = cells_column_spanners()
  ) |>
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style = "padding-right: 36px;",
    locations = cells_body()
  )
Population Change [level 2 spanner over all columns]
1996 - 2001 | 2001 - 2006 | 2006 - 2011 | 2011 - 2016 | 2016 - 2021 [level 1 spanners, one per column]
% % % % %
Toronto 4.0 0.9 4.5 4.5 2.3
Ottawa 7.3 4.9 8.8 5.8 8.9
Mississauga 12.6 9.1 6.7 1.1 −0.5
Brampton 21.3 33.3 20.8 13.3 10.6
Hamilton 4.8 2.9 3.0 3.3 6.0

With a summarized, filtered, and pivoted version of the pizzaplace dataset, we can create another gt table and then use the tab_spanner_delim() function with the same delimiter/separator that was used in the tidyr pivot_wider() call. We can also process the generated column labels with cols_label_with().

pizzaplace |>
  dplyr::select(name, date, type, price) |>
  dplyr::group_by(name, date, type) |>
  dplyr::summarize(revenue = sum(price), sold = dplyr::n(), .groups = "drop") |>
  dplyr::filter(date %in% c("2015-01-01", "2015-01-02", "2015-01-03")) |>
  dplyr::filter(type %in% c("classic", "veggie")) |>
  tidyr::pivot_wider(
    names_from = date,
    names_sep = ".",
    values_from = c(revenue, sold),
    values_fn = sum,
    names_sort = TRUE
  ) |>
  gt(rowname_col = "name", groupname_col = "type") |>
  tab_spanner_delim(delim = ".") |>
  sub_missing(missing_text = "") |>
  fmt_currency(columns = starts_with("revenue")) |>
  data_color(
    columns = starts_with("revenue"),
    method = "numeric",
    palette = c("white", "lightgreen")
  ) |>
  cols_label_with(
    fn = function(x) {
      paste0(x, " (", vec_fmt_datetime(x, format = "E"), ")")
    }
  )
              revenue                                               sold
              2015-01-01 (Thu)  2015-01-02 (Fri)  2015-01-03 (Sat)  2015-01-01 (Thu)  2015-01-02 (Fri)  2015-01-03 (Sat)
classic
big_meat      $60.00            $96.00            $96.00            5                 8                 8
classic_dlx   $156.50           $93.00            $80.50            10                6                 5
hawaiian      $50.75            $137.75           $113.50           4                 10                8
ital_cpcllo   $150.50           $121.50           $48.50            8                 7                 3
napolitana    $32.50            $106.00           $92.50            2                 6                 6
pep_msh_pep   $82.50            $11.00            $22.00            6                 1                 2
pepperoni     $77.75            $150.00           $102.75           6                 12                8
the_greek     $73.50            $113.50           $146.00           5                 7                 7
veggie
five_cheese   $129.50           $111.00           $74.00            7                 6                 4
four_cheese   $98.10            $50.65            $119.25           6                 3                 7
green_garden  $96.25            $12.00            $16.00            7                 1                 1
ital_veggie   $25.50            $63.25            $58.75            2                 4                 3
mediterraneo  $52.25            $20.25            $48.25            3                 1                 3
mexicana      $165.50           $92.75            $72.50            9                 5                 4
spin_pesto    $33.25            $33.25            $103.75           2                 2                 6
spinach_fet   $40.50            $64.50            $36.25            2                 4                 2
veggie_veg    $44.25            $97.00            $116.50           3                 5                 7

This example combines pivoting, delimiter-based spanners, and programmatic label generation. The pivot_wider() call creates columns such as revenue.2015-01-01 and sold.2015-01-01, which tab_spanner_delim() splits into spanners (revenue, sold) and column labels (the dates). The cols_label_with() call then appends the abbreviated day of the week to each date label, producing labels like "2015-01-01 (Thu)". The data_color() call shades the revenue cells from white to light green, so higher-revenue days stand out at a glance.
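The splitting behavior is easier to see on a tiny table. The following is a minimal sketch, assuming a made-up data frame (toy_sales and its values are hypothetical) whose column names follow a "<metric>.<day>" pattern:

```r
library(gt)

# Hypothetical toy data; column names follow a "<metric>.<day>" pattern
toy_sales <- data.frame(
  item = c("widget", "gadget"),
  revenue.mon = c(10.5, 20.0),
  revenue.tue = c(15.0, 25.5),
  sold.mon = c(1L, 2L),
  sold.tue = c(3L, 4L)
)

toy_tbl <-
  toy_sales |>
  gt(rowname_col = "item") |>
  # Split each name at "."; the first part becomes a spanner,
  # the second part becomes the column label
  tab_spanner_delim(delim = ".") |>
  # Column names are unchanged, so selection helpers still match on them
  fmt_currency(columns = starts_with("revenue"))

toy_tbl
```

Because only the labels change, later calls can keep selecting columns by their original names, as the fmt_currency() call does here.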

2.5 The stub and row groups

The stub is a special column (or set of columns) on the left side of the table that holds row labels. When present, the stub serves as an identifier for each row, similar to how column labels identify columns. Row groups take this organization further by dividing rows into named sections, each with its own header row. Together, the stub and row groups create vertical structure in a table, making it easier to navigate and understand large datasets.

The stub is created when you designate a column for row labels using rowname_col in gt(). Once a stub exists, you can add a stubhead label (a header for the stub column itself) and organize rows into groups. Row groups appear as labeled sections that visually separate different categories of data.

2.5.1 tab_row_group()

Create a row group with a collection of rows. This requires specification of the rows to be included, either by supplying row labels, row indices, or through use of a select helper function like starts_with().

Here is the function’s signature:

tab_row_group(
  data,
  label,
  rows,
  id = label
)

To set a default row group label for any rows not formally placed in a row group, we can use a separate call to tab_options(row_group.default_label = <label>). If this is not done and there are rows that haven’t been placed into a row group (where one or more row groups already exist), those rows will be automatically placed into a row group without a label. To restore labels for row groups not explicitly assigned a group, tab_options(row_group.default_label = "") can be used.

Using a subset of the gtcars dataset, let’s create a simple gt table with row labels (from the model column) inside of a stub. This eight-row table begins with no row groups at all but with a single use of the tab_row_group() function, we can specify a row group that will contain any rows where the car model begins with a number.

gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:8) |>
  gt(rowname_col = "model") |>
  tab_row_group(
    label = "numbered",
    rows = matches("^[0-9]")
  )
year hp trq
numbered
458 Speciale 2015 597 398
458 Spider 2015 562 398
458 Italia 2014 562 398
488 GTB 2016 661 561
GT 2017 647 550
California 2015 553 557
GTC4Lusso 2017 680 514
FF 2015 652 504

This actually makes two row groups since there are row labels that don’t begin with a number. That second row group is a catch-all NA group, and it doesn’t display a label at all. Rather, it is set off from the other group with a double line. This may be a preferable way to display the arrangement of one distinct group and an ‘others’ or default group. If that’s the case but you’d like the order reversed, the row_group_order() function can be used for that.

gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:8) |>
  gt(rowname_col = "model") |>
  tab_row_group(
    label = "numbered",
    rows = matches("^[0-9]")
  ) |>
  row_group_order(groups = c(NA, "numbered"))
year hp trq
GT 2017 647 550
California 2015 553 557
GTC4Lusso 2017 680 514
FF 2015 652 504
numbered
458 Speciale 2015 597 398
458 Spider 2015 562 398
458 Italia 2014 562 398
488 GTB 2016 661 561

Two more options include: (1) setting a default label for the ‘others’ group (done through tab_options()), and (2) creating row groups until there are no more unaccounted-for rows. Let’s try the first option in the next example:

gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:8) |>
  gt(rowname_col = "model") |>
  tab_row_group(
    label = "numbered",
    rows = matches("^[0-9]")
  ) |>
  row_group_order(groups = c(NA, "numbered")) |>
  tab_options(row_group.default_label = "others")
year hp trq
others
GT 2017 647 550
California 2015 553 557
GTC4Lusso 2017 680 514
FF 2015 652 504
numbered
458 Speciale 2015 597 398
458 Spider 2015 562 398
458 Italia 2014 562 398
488 GTB 2016 661 561

The above use of the row_group.default_label in tab_options() gets the job done and provides a default label. One drawback is that the default/NA group doesn’t have an ID, so it can’t as easily be styled with tab_style(); however, row groups have indices and the index for the "others" group here is 1.

gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:8) |>
  gt(rowname_col = "model") |>
  tab_row_group(
    label = "numbered",
    rows = matches("^[0-9]")
  ) |>
  row_group_order(groups = c(NA, "numbered")) |>
  tab_options(row_group.default_label = "others") |>
  tab_style(
    style = cell_fill(color = "bisque"),
    locations = cells_row_groups(groups = 1)
  ) |>
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_row_groups(groups = "numbered")
  )
year hp trq
others
GT 2017 647 550
California 2015 553 557
GTC4Lusso 2017 680 514
FF 2015 652 504
numbered
458 Speciale 2015 597 398
458 Spider 2015 562 398
458 Italia 2014 562 398
488 GTB 2016 661 561

Another way to handle rows with NA values in the grouping column is through the omit_na_group argument in gt(). By default (FALSE), rows with NA in the groupname_col are assigned to a group labeled "NA". Setting omit_na_group = TRUE causes those rows to appear as ungrouped rows in the table body instead. This is useful when you want certain rows to stand apart from any row group, perhaps as header rows or separators.

Let’s see how this works. First, we’ll create a dataset where some rows have NA for the group column:

data_with_na_group <- 
  dplyr::tibble(
    item = c("Category A Items", "Widget", "Gadget", "Category B Items", "Sprocket", "Cog"),
    group = c(NA, "A", "A", NA, "B", "B"),
    value = c(NA, 100, 150, NA, 200, 175)
  )

data_with_na_group |>
  gt(rowname_col = "item", groupname_col = "group")
value
NA
Category A Items NA
Category B Items NA
A
Widget 100
Gadget 150
B
Sprocket 200
Cog 175

With the default behavior, the rows with NA in the group column are placed in an "NA" group. Now let’s use omit_na_group = TRUE to have those rows appear outside of any group:

data_with_na_group |>
  gt(
    rowname_col = "item",
    groupname_col = "group",
    omit_na_group = TRUE
  )
value
Category A Items NA
Category B Items NA
A
Widget 100
Gadget 150
B
Sprocket 200
Cog 175

The rows that had NA for their group now appear as ungrouped rows, visually distinct from the grouped content. This pattern is particularly useful when you want to include descriptive header rows or section dividers that logically shouldn’t belong to any data group.

Now let’s try using tab_row_group() with our gtcars-based table such that all rows are formally assigned to different row groups. We’ll define two row groups with the (Markdown-infused) labels "**Powerful Cars**" and "**Super Powerful Cars**". The distinction between the groups is whether hp is below 600 or at least 600 (this is governed by the expressions provided to the rows argument).

gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:8) |>
  gt(rowname_col = "model") |>
  tab_row_group(
    label = md("**Powerful Cars**"),
    rows = hp < 600,
    id = "powerful"
  ) |>
  tab_row_group(
    label = md("**Super Powerful Cars**"),
    rows = hp >= 600,
    id = "v_powerful"
  ) |>
  tab_style(
    style = cell_fill(color = "gray85"),
    locations = cells_row_groups(groups = "powerful")
  ) |>
  tab_style(
    style = list(
      cell_fill(color = "gray95"),
      cell_text(size = "larger")
    ),
    locations = cells_row_groups(groups = "v_powerful")
  )
year hp trq
Super Powerful Cars
GT 2017 647 550
488 GTB 2016 661 561
GTC4Lusso 2017 680 514
FF 2015 652 504
Powerful Cars
458 Speciale 2015 597 398
458 Spider 2015 562 398
458 Italia 2014 562 398
California 2015 553 557

Setting the id values for each of the row groups makes things easier since you will have clean, markup-free ID values to reference in later calls (as was done with the tab_style() invocations in the example above). The use of the md() helper function makes it so that any Markdown provided for the label of a row group is faithfully rendered.

2.5.2 row_group_order()

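Each call to tab_row_group() places its new group above any existing ones, so by default the most recently created group appears first (in the previous example, "Super Powerful Cars" was defined last but is displayed at the top). The row_group_order() function lets you rearrange row groups into any sequence you prefer.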

Here is the function’s signature:

row_group_order(
  data,
  groups
)

The groups argument takes a vector of row group ID values in the desired order. If a group was created without an explicit id, its label serves as the ID. The special value NA represents the default/unnamed group (rows not explicitly assigned to any group).

gtcars |>
  dplyr::select(model, mfr, year, hp, msrp) |>
  dplyr::filter(
    mfr %in% c("Audi", "Porsche", "Maserati", "Ford")
  ) |>
  gt(rowname_col = "model") |>
  tab_row_group(
    label = "German",
    rows = mfr %in% c("Audi", "Porsche"),
    id = "german"
  ) |>
  tab_row_group(
    label = "Italian",
    rows = mfr == "Maserati",
    id = "italian"
  ) |>
  row_group_order(groups = c("italian", "german", NA)) |>
  cols_hide(columns = mfr)
year hp msrp
Italian
Granturismo 2016 454 132825
Quattroporte 2016 404 99900
Ghibli 2016 345 70600
German
R8 2015 430 115900
RS 7 2016 560 108900
S6 2016 450 70900
S7 2016 450 82900
S8 2016 520 114900
718 Boxster 2017 300 56000
718 Cayman 2017 300 53900
911 2016 350 84300
Panamera 2016 310 78100
GT 2017 647 447000

Here, the Italian group appears first, followed by the German group; the one remaining row (the Ford GT) sits in the unlabeled group at the bottom. The cols_hide() call removes the mfr column since that information is now largely conveyed by the row group labels.

2.5.3 tab_stubhead()

Add a label to the stubhead of a gt table. The stubhead is the lone element that is positioned left of the column labels, and above the stub. If a stub does not exist, then there is no stubhead (so no change will be made when using this function in that case). We have the flexibility to use Markdown formatting for the stubhead label. Furthermore, if the table is intended for HTML output, we can use HTML for the stubhead label.

Here is the signature for tab_stubhead():

tab_stubhead(
  data,
  label
)

Using a small subset of the gtcars dataset, we can create a gt table with row labels. Since we have row labels in the stub (via use of rowname_col = "model" in the gt() function call), we also have a stubhead, so let’s add a stubhead label ("car") with the tab_stubhead() function to describe what’s in the stub.

gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:5) |>
  gt(rowname_col = "model") |>
  tab_stubhead(label = "car")
car year hp trq
GT 2017 647 550
458 Speciale 2015 597 398
458 Spider 2015 562 398
458 Italia 2014 562 398
488 GTB 2016 661 561

The stubhead label "car" now appears above the stub column, clarifying that the row labels represent car models. Without a stubhead, readers might need to infer this from context. For tables with many rows or complex stub content, a clear stubhead label improves navigability.
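As noted earlier, the stubhead label can also carry Markdown. A minimal sketch of that variation, reusing the same gtcars subset (the particular label text is just an illustration):

```r
library(gt)

stubhead_tbl <-
  gtcars |>
  dplyr::select(model, year, hp, trq) |>
  dplyr::slice(1:5) |>
  gt(rowname_col = "model") |>
  # md() marks the label as Markdown so the emphasis is rendered
  tab_stubhead(label = md("**car** *(model)*"))

stubhead_tbl
```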

2.5.4 tab_stub_indent()

Indentation of row labels is an effective way to establish structure in a table stub. The tab_stub_indent() function allows for fine control over row label indentation in the stub. We can either set an explicit indentation level or use an indentation directive given as a keyword.

Here is the function’s signature:

tab_stub_indent(
  data,
  rows,
  indent = "increase"
)

Let’s use a summarized version of the pizzaplace dataset to create a gt table with row groups and row labels. With the summary_rows() function, we’ll generate summary rows at the top of each row group. With tab_stub_indent() we can add indentation to the row labels in the stub.

pizzaplace |>
  dplyr::group_by(type, size) |>
  dplyr::summarize(
    sold = dplyr::n(),
    income = sum(price),
    .groups = "drop"
  ) |>
  gt(rowname_col = "size", groupname_col = "type") |>
  tab_header(title = "Pizzas Sold in 2015") |>
  fmt_integer(columns = sold) |>
  fmt_currency(columns = income) |>
  summary_rows(
    fns = list(label = "All Sizes", fn = "sum"),
    side = "top",
    fmt = list(
      ~ fmt_integer(., columns = sold),
      ~ fmt_currency(., columns = income)
    )
  ) |>
  tab_options(
    summary_row.background.color = "gray95",
    row_group.background.color = "#FFEFDB",
    row_group.as_column = TRUE
  ) |>
  tab_stub_indent(
    rows = everything(),
    indent = 2
  )
Pizzas Sold in 2015
sold income
chicken All Sizes 11,050 $195,919.50
L 4,932 $102,339.00
M 3,894 $65,224.50
S 2,224 $28,356.00
classic All Sizes 14,888 $220,053.10
L 4,057 $74,518.50
M 4,112 $60,581.75
S 6,139 $69,870.25
XL 552 $14,076.00
XXL 28 $1,006.60
supreme All Sizes 11,987 $208,197.00
L 4,564 $94,258.50
M 4,046 $66,475.00
S 3,377 $47,463.50
veggie All Sizes 11,649 $193,690.45
L 5,403 $104,202.70
M 3,583 $57,101.00
S 2,663 $32,386.75

The indent argument accepts either a numeric value (0 through 5) or the keywords "increase" or "decrease". When using numeric values, 0 means no indentation and 5 is the maximum. The keywords adjust indentation relative to the current level, which is useful when building tables programmatically.

exibble |>
  dplyr::select(row, group, num, currency) |>
  gt(rowname_col = "row", groupname_col = "group") |>
  tab_stub_indent(rows = c("row_1", "row_5"), indent = 1) |>
  tab_stub_indent(rows = c("row_2", "row_6"), indent = 2) |>
  tab_stub_indent(rows = c("row_3", "row_7"), indent = 3) |>
  tab_stub_indent(rows = c("row_4", "row_8"), indent = 4)
num currency
grp_a
row_1 1.111e-01 49.950
row_2 2.222e+00 17.950
row_3 3.333e+01 1.390
row_4 4.444e+02 65100.000
grp_b
row_5 5.550e+03 1325.810
row_6 NA 13.255
row_7 7.770e+05 NA
row_8 8.880e+06 0.440

Progressive indentation creates a visual hierarchy within each group, useful for showing parent-child relationships or levels of detail.
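The keyword directives can be combined with an explicit level. The sketch below, again using exibble, first sets every row label to level 2 and then nudges selected rows relative to that baseline (the choice of rows is arbitrary and only for illustration):

```r
library(gt)

indent_tbl <-
  exibble |>
  dplyr::select(row, group, num) |>
  gt(rowname_col = "row", groupname_col = "group") |>
  # Start all row labels at an explicit indentation level of 2
  tab_stub_indent(rows = everything(), indent = 2) |>
  # Move the first row of each group one level back (2 -> 1)
  tab_stub_indent(rows = c("row_1", "row_5"), indent = "decrease") |>
  # Move the last row of each group one level further (2 -> 3)
  tab_stub_indent(rows = c("row_4", "row_8"), indent = "increase")

indent_tbl
```

Relative adjustments like these avoid hard-coding the final level, which helps when the baseline indentation is decided elsewhere in a pipeline.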

2.6 Column labels

Column labels appear at the top of each column and identify the data within. While gt uses column names from your data as default labels, you’ll often want to provide cleaner, more descriptive labels for presentation. Several functions help manage column labels: cols_label() for setting labels directly, cols_label_with() for applying transformations, and the cols_move() family of functions for reordering.

2.6.1 cols_label()

The cols_label() function assigns display labels to columns. These labels appear in the table while the underlying column names remain unchanged (useful for referencing columns in subsequent gt function calls).

Here’s the signature of cols_label():

cols_label(
  .data,
  ...,
  .list = list2(...),
  .fn = NULL,
  .process_md = FALSE,
  .process_units = FALSE
)

The following example relabels every column of a small gtcars table:

gtcars |>
  dplyr::select(mfr, model, year, hp, mpg_c, mpg_h) |>
  dplyr::slice_head(n = 5) |>
  gt() |>
  cols_label(
    mfr = "Manufacturer",
    model = "Model",
    year = "Year",
    hp = "Horsepower",
    mpg_c = "City MPG",
    mpg_h = "Highway MPG"
  )
Manufacturer Model Year Horsepower City MPG Highway MPG
Ford GT 2017 647 11 18
Ferrari 458 Speciale 2015 597 13 17
Ferrari 458 Spider 2015 562 13 17
Ferrari 458 Italia 2014 562 13 17
Ferrari 488 GTB 2016 661 15 22
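
Since relabeling leaves the column names untouched, later calls still refer to the original names. A short sketch of this (the styling and formatting choices are arbitrary and only for illustration):

```r
library(gt)

relabel_tbl <-
  gtcars |>
  dplyr::select(model, year, hp, msrp) |>
  dplyr::slice_head(n = 5) |>
  gt() |>
  cols_label(hp = "Horsepower", msrp = "MSRP") |>
  # The columns are still addressed as `msrp` and `hp`, not by their labels
  fmt_currency(columns = msrp, decimals = 0) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body(columns = hp, rows = hp > 600)
  )

relabel_tbl
```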

The labels can include Markdown or HTML formatting when wrapped with the appropriate helper functions:

towny |>
  dplyr::select(name, population_2021, density_2021, land_area_km2) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "name") |>
  fmt_integer(columns = c(population_2021, density_2021)) |>
  fmt_number(columns = land_area_km2, decimals = 1) |>
  cols_label(
    population_2021 = md("**Population**"),
    density_2021 = md("Density *(per km²)*"),
    land_area_km2 = html("Area (km<sup>2</sup>)")
  )
Population Density (per km²) Area (km²)
Addington Highlands 2,534 2 1,294.0
Adelaide Metcalfe 3,011 9 331.1
Adjala-Tosorontio 10,989 30 371.5
Admaston/Bromley 2,995 6 519.6
Ajax 126,666 1,901 66.6

The md() helper renders Markdown syntax, making "Population" bold and adding italics to the unit in "Density". The html() helper allows raw HTML, which we use here to create a proper superscript for the squared unit in "Area". Mixing these approaches gives you flexibility: Markdown for simple formatting and HTML when you need precise control over the output.

2.6.1.1 Incorporating units with gt’s units notation

Measurement units frequently appear in column labels, and it’s often clearer to include them in the label itself rather than using other methods to convey unit information. While the cols_units() function provides one approach, gt also supports a built-in units notation system that allows you to define units directly within column labels.

To use this notation, surround the portion of text representing the units with {{ and }}. This tells gt to interpret that text as a units definition and render it appropriately.

The units notation uses a succinct, ASCII-friendly syntax for writing measurement units. While it may feel somewhat familiar, it’s specifically designed for this purpose. Each component (unit names, parentheses, symbols) is treated as a separate entity, and you can flexibly add subscripts and exponents. Here are the key rules and examples:

Basic units and division:

  • "m/s" and "m / s" both render as “m/s”
  • spaces on both sides of the slash are optional
  • "m /s" (a space before the slash only) renders like "m s^-1", since "/<unit>" is equivalent to "<unit>^-1"

Exponents:

  • "m s^-1" displays with the “-1” as a proper exponent
  • "t_i^2.5" shows a t with an “i” subscript and a “2.5” exponent
  • exponents are specified with the ^ character

Subscripts:

  • "E_h" renders as E with an “h” subscript
  • use the _ character for subscripts
  • "m[_0^2]" uses brackets with overstriking to set both subscript and superscript vertically aligned

Chemical formulas:

  • "g/L %C6H12O6%" encloses a chemical formula in % characters
  • numbers in formulas are automatically subscripted (e.g., C₆H₁₂O₆)
  • useful for biochemistry and chemistry tables

Automatic symbol conversions:

  • the letter “u” in "ug", "um", "uL", and "umol" converts to the Greek mu symbol (µ)
  • "degC" and "degF" render with a proper degree symbol (°C, °F)
  • these shortcuts make typing common units easier

Greek letters:

  • enclose Greek letter names in colons (e.g., :beta:, :sigma:)
  • lowercase: :alpha:, :beta:, :gamma:, :delta:, etc.
  • uppercase: :Alpha:, :Beta:, :Gamma:, :Delta:, etc.
  • works for the full Greek alphabet

Special symbols:

  • shorthand names enclosed in colons convert to proper symbols
  • examples: :angstrom:, :ohm:, :micro:, :degree:
  • provides access to scientific and mathematical symbols

Text formatting:

  • surround text with * for italics: "*m*/s" renders the “m” in italics
  • surround text with ** for bold: "**kg**" renders “kg” in bold
  • can be applied to unit names, subscripts, or exponents partially or fully
  • useful for emphasizing specific components
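
One way to try these rules out is to place a few notation strings in a column and let gt render them. This is a minimal sketch, assuming a gt version that provides fmt_units(); the units_demo data frame is hypothetical and the strings are taken from the rules above:

```r
library(gt)

# Hypothetical demo data: the same strings appear in both columns
units_demo <- data.frame(
  notation = c(
    "m/s", "m s^-1", "t_i^2.5", "E_h",
    "g/L %C6H12O6%", "umol/L", "degC", ":beta:"
  ),
  stringsAsFactors = FALSE
)
units_demo$rendered <- units_demo$notation

units_tbl <-
  units_demo |>
  gt() |>
  # Interpret the `rendered` column as units notation;
  # `notation` stays as plain text for comparison
  fmt_units(columns = rendered)

units_tbl
```

Viewing the raw and rendered strings side by side makes it easier to check how a particular notation will appear before using it in a column label.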

We can use units notation to cleanly express measurement units in column labels. By enclosing units in double braces ({{ and }}), gt automatically formats them with proper typography:

towny |>
  dplyr::select(name, land_area_km2, density_2021) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "name") |>
  fmt_number(columns = land_area_km2, decimals = 1) |>
  fmt_integer(columns = density_2021) |>
  cols_label(
    land_area_km2 = "Land Area {{km^2}}",
    density_2021 = "Density {{people/km^2}}"
  )
Land Area km² Density people/km²
Addington Highlands 1,294.0 2
Adelaide Metcalfe 331.1 9
Adjala-Tosorontio 371.5 30
Admaston/Bromley 519.6 6
Ajax 66.6 1,901

This example shows the units notation in action. The {{km^2}} portion renders with a proper superscript for the squared kilometer unit, while {{people/km^2}} renders as “people/km²”, with the exponent raised. The column labels end up with correctly typeset units without any manual HTML.

Here’s a more complex example showing various features of units notation:

data.frame(
  measurement = c("Sample A", "Sample B", "Sample C"),
  velocity = c(15.2, 18.7, 12.4),
  energy = c(4.5, 5.2, 3.8),
  concentration = c(2.3, 3.1, 2.7),
  temperature = c(25, 30, 22)
) |>
  gt(rowname_col = "measurement") |>
  cols_label(
    velocity = "Velocity {{m/s}}",
    energy = "Energy {{E_h}}",
    concentration = "Concentration {{umol/L}}",
    temperature = "Temperature {{degC}}"
  ) |>
  fmt_number(columns = c(velocity, energy, concentration), decimals = 1) |>
  fmt_number(columns = temperature, decimals = 0)
Velocity m/s Energy Eₕ Concentration µmol/L Temperature °C
Sample A 15.2 4.5 2.3 25
Sample B 18.7 5.2 3.1 30
Sample C 12.4 3.8 2.7 22

This notation system makes it straightforward to include properly formatted units without needing to manually construct HTML or Unicode characters.

2.6.2 cols_label_with()

When you need to transform many column labels programmatically, cols_label_with() applies a function to generate labels:

cols_label_with(
  data,
  columns = everything(),
  fn
)

Rather than manually specifying labels for each column with cols_label(), this function applies a transformation function to column names to automatically generate readable labels. This is especially valuable when working with datasets that have systematic naming conventions.

Let’s see this in action with a subset of the towny dataset, which contains columns with underscored names like population_2021, density_2021, and land_area_km2. We’ll use cols_label_with() to automatically convert these technical column names into proper display labels:

towny |>
  dplyr::select(name, population_2021, density_2021, land_area_km2) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "name") |>
  cols_label_with(
    fn = ~ gsub("_", " ", .x) |> tools::toTitleCase()
  )
Population 2021 Density 2021 Land Area Km2
Addington Highlands 2534 1.96 1293.99
Adelaide Metcalfe 3011 9.09 331.11
Adjala-Tosorontio 10989 29.58 371.53
Admaston/Bromley 2995 5.76 519.59
Ajax 126666 1900.75 66.64

In this example, the transformation function does two things: first, gsub("_", " ", .x) replaces all underscores with spaces, converting "population_2021" to "population 2021". Then tools::toTitleCase() applies title case formatting, resulting in clean labels like "Population 2021", "Density 2021", and "Land Area Km2". The function is applied to all columns by default (you can limit it with the columns argument if needed).

This approach is particularly useful when column names follow a consistent pattern that can be transformed into readable labels. It saves considerable time compared to manually labeling each column, especially in tables with many columns that follow naming conventions.
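
To restrict the transformation to certain columns, the columns argument can be supplied. A minimal sketch, assuming we only want to relabel the columns whose names end in "2021" (the uppercase transformation is arbitrary):

```r
library(gt)

partial_labels_tbl <-
  towny |>
  dplyr::select(name, population_2021, density_2021, land_area_km2) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "name") |>
  cols_label_with(
    # Only the two *_2021 columns are relabeled; land_area_km2 keeps its name
    columns = ends_with("2021"),
    fn = function(x) toupper(gsub("_", " ", x))
  )

partial_labels_tbl
```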

2.6.3 Hiding columns with cols_hide()

Sometimes you need columns for calculations or row grouping but don’t want them displayed. The cols_hide() function removes columns from the visual output while keeping them accessible for other gt operations:

cols_hide(
  data,
  columns
)

The following example uses cols_hide() in a small gtcars table:

gtcars |>
  dplyr::select(mfr, model, year, hp) |>
  dplyr::slice_head(n = 6) |>
  gt(rowname_col = "model", groupname_col = "mfr") |>
  cols_hide(columns = mfr) |>
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_body(columns = hp, rows = hp > 500)
  )
year hp
Ford
GT 2017 647
Ferrari
458 Speciale 2015 597
458 Spider 2015 562
458 Italia 2014 562
488 GTB 2016 661
California 2015 553

The mfr column is hidden but still serves as the grouping variable. Hidden columns can be referenced in tab_style(), fmt_*() functions, and other operations.
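
A hidden column can also drive the styling of visible ones. In this sketch, msrp is hidden but still used in the rows expression (the 250000 threshold and the fill color are arbitrary choices for illustration):

```r
library(gt)

hidden_tbl <-
  gtcars |>
  dplyr::select(model, year, hp, msrp) |>
  dplyr::slice_head(n = 6) |>
  gt(rowname_col = "model") |>
  # msrp is removed from the display only; its values remain available
  cols_hide(columns = msrp) |>
  tab_style(
    style = cell_fill(color = "lightyellow"),
    # Highlight hp for the cars whose (hidden) msrp exceeds the threshold
    locations = cells_body(columns = hp, rows = msrp > 250000)
  )

hidden_tbl
```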

2.7 Inspecting table structure with tab_info()

As tables grow complex with multiple spanners, row groups, and customizations, it becomes helpful to inspect their structure. The tab_info() function generates a summary table showing column names, indices, and IDs for all table elements:

tab_info(data)

The following example builds a table with a spanner and a row group, then inspects it:

gtcars |>
  dplyr::select(model, mfr, year, hp, trq) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "model") |>
  tab_spanner(
    label = "Power Metrics",
    columns = c(hp, trq),
    id = "power"
  ) |>
  tab_row_group(
    label = "High Performance",
    rows = hp > 500,
    id = "high_perf"
  ) |>
  tab_info()
Information on ID and Label Values
ID Idx Lvl Label
model 1 model
mfr 2 mfr
year 3 year
hp 4 hp
trq 5 trq
Rows
GT 1 GT
458 Speciale 2 458 Speciale
458 Spider 3 458 Spider
458 Italia 4 458 Italia
488 GTB 5 488 GTB
Spanners
power 1 Power Metrics

The output reveals:

  • column indices and their current names
  • spanner IDs (useful for styling or footnotes)
  • row group IDs and their indices
  • other structural metadata

This information is invaluable when you need to reference specific elements in tab_style(), tab_footnote(), or other location-based functions.
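
For instance, the spanner ID reported by tab_info() ("power" in the table above) can be passed to a location helper. A minimal sketch, with an arbitrary bold style:

```r
library(gt)

spanner_style_tbl <-
  gtcars |>
  dplyr::select(model, mfr, year, hp, trq) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "model") |>
  tab_spanner(
    label = "Power Metrics",
    columns = c(hp, trq),
    id = "power"
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    # Target the spanner by its ID rather than by its label
    locations = cells_column_spanners(spanners = "power")
  )

spanner_style_tbl
```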

2.8 Putting it all together

Let’s combine the components covered in this chapter to create a well-structured table:

pizzaplace |>
  dplyr::filter(type %in% c("classic", "veggie")) |>
  dplyr::group_by(type, size) |>
  dplyr::summarize(
    orders = dplyr::n(),
    revenue = sum(price),
    avg_price = mean(price),
    .groups = "drop"
  ) |>
  gt(rowname_col = "size", groupname_col = "type") |>
  tab_header(
    title = md("**Pizza Sales Summary**"),
    subtitle = "Classic and Veggie varieties, 2015"
  ) |>
  tab_spanner(
    label = "Sales Metrics",
    columns = c(orders, revenue)
  ) |>
  tab_spanner(
    label = "Pricing",
    columns = avg_price
  ) |>
  cols_label(
    orders = "Orders",
    revenue = "Revenue",
    avg_price = "Avg. Price"
  ) |>
  fmt_integer(columns = orders) |>
  fmt_currency(columns = c(revenue, avg_price)) |>
  tab_stubhead(label = "Size") |>
  tab_source_note(source_note = md("Data from the `pizzaplace` dataset in **gt**")) |>
  tab_stub_indent(rows = everything(), indent = 1) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    row_group.background.color = "#FFF8E7",
    column_labels.font.weight = "bold"
  )
Pizza Sales Summary
Classic and Veggie varieties, 2015
Size  Sales Metrics        Pricing
      Orders  Revenue      Avg. Price
classic
L     4,057   $74,518.50   $18.37
M     4,112   $60,581.75   $14.73
S     6,139   $69,870.25   $11.38
XL    552     $14,076.00   $25.50
XXL   28      $1,006.60    $35.95
veggie
L     5,403   $104,202.70  $19.29
M     3,583   $57,101.00   $15.94
S     2,663   $32,386.75   $12.16
Data from the pizzaplace dataset in gt

This table demonstrates:

  • a header with title and subtitle
  • row groups (pizza types) with a stub (sizes)
  • a stubhead label
  • column spanners grouping related metrics
  • custom column labels
  • formatted values (integers and currency)
  • a source note in the footer
  • stub indentation for visual polish
  • table options for styling

2.9 Conclusion

In this chapter, we’ve covered the essential structural components that form the foundation of every gt table. These building blocks (from the basic gt() function to headers, footers, spanners, stubs, and row groups) provide the scaffolding upon which all table presentation rests.

Understanding these components is crucial because they establish the logical organization of your data before any formatting or styling is applied. The header gives context, the stub and row groups create vertical structure, column labels and spanners organize the horizontal dimension, and the footer provides additional information. Each component serves a specific purpose in making data more accessible and interpretable to your readers.

We’ve explored many functions such as gt(), tab_header(), tab_source_note(), tab_spanner(), tab_row_group(), cols_label(), and others. They’ll appear repeatedly throughout your gt workflow. They form the vocabulary you’ll use to describe and build table structure, whether you’re creating simple data displays or complex analytical presentations.

As you progress through subsequent chapters, you’ll see how these structural foundations support more advanced capabilities. Chapter 3 and Chapter 4 cover formatting functions for numeric and non-numeric data, building upon the column organization you establish here. Chapter 5 introduces substitution and text transformation, completing the three-stage rendering pipeline. Chapter 6 and Chapter 7 address column modifications and summary rows, extending the structural concepts introduced in this chapter. The styling techniques in Chapter 8 leverage the component structure to apply visual enhancements precisely where needed, while Chapter 9 shows how to add footnotes that reference structural elements. Chapter 10 demonstrates nanoplots for embedding visualizations within cells. The advanced topics in Chapter 11 and Chapter 12 cover table groups and output formats, and Chapter 13 and Chapter 14 explore how to extend gt through external packages and your own extensions. All of these capabilities depend on the solid structural foundation established by the basic components covered here.

Master these fundamentals now, and the more sophisticated table-building techniques ahead will feel like natural extensions of what you already know. The time invested in understanding table structure will pay dividends in every gt table you create going forward.