| Manufacturer | Model | Year | Horsepower | MPG (City) | Price (USD) |
|---|---|---|---|---|---|
| Ford | GT | 2017 | 647 | 11 | 447000 |
| Ferrari | 458 Speciale | 2015 | 597 | 13 | 291744 |
| Ferrari | 458 Spider | 2015 | 562 | 13 | 263553 |
| Ferrari | 458 Italia | 2014 | 562 | 13 | 233509 |
| Ferrari | 488 GTB | 2016 | 661 | 15 | 245400 |
6 Modifying columns
While formatting functions operate on the values within cells, sometimes we need to modify the structure and arrangement of the columns themselves. The gt package provides a comprehensive set of cols_*() functions that allow us to relabel columns, set their widths, control alignment, reorder their positions, hide and reveal them, and even merge data from multiple columns into a single column. These operations are essential for transforming raw data into polished, presentation-ready tables.
In this chapter, we’ll explore each of these column modification functions in detail. We’ll see how cols_label() and cols_label_with() allow us to create human-readable column headers, how cols_width() and cols_align() give us control over the visual presentation, how the cols_move*() family lets us rearrange columns, how cols_hide() and cols_unhide() manage column visibility, and finally how the cols_merge*() functions allow us to combine data from multiple columns into single, information-rich cells.
6.1 Labels, widths, alignments
The column labels in a gt table serve as the reader’s guide to understanding the data. By default, gt uses the column names from your data frame, which are often abbreviated, in snake_case, or otherwise not ideal for presentation purposes. Similarly, the default column widths and alignments may not always be optimal for your particular data. This section covers the functions that give you fine-grained control over these presentational aspects.
6.1.1 cols_label()
The cols_label() function is your primary tool for assigning human-readable labels to columns. It accepts a set of column-label pairs, where you provide the column name (or use tidyselect) and the label you’d like to assign.
Here is the function’s signature:
cols_label(
data,
...,
.list = list2(...),
.fn = NULL,
.process_units = TRUE
)Let’s create a table using a subset of the gtcars dataset and give the columns more descriptive labels:
The resulting table is immediately more readable. Where mfr might leave readers guessing, “Manufacturer” communicates clearly. The transformation from mpg_c to “MPG (City)” not only spells out the abbreviation but also clarifies that this is the city fuel economy figure (as opposed to highway). These small touches make a substantial difference in how quickly readers can interpret the data.
You can also use the md() and html() helper functions to add formatting to your labels. This is particularly useful for adding units, subscripts, or other stylistic touches:
| Date | Open | High | Low | Close | Volume (shares) |
|---|---|---|---|---|---|
| 2015-12-31 | 2060.59 | 2062.54 | 2043.62 | 2043.94 | 2655330000 |
| 2015-12-30 | 2077.34 | 2077.34 | 2061.97 | 2063.36 | 2367430000 |
| 2015-12-29 | 2060.54 | 2081.56 | 2060.54 | 2078.36 | 2542000000 |
| 2015-12-28 | 2057.77 | 2057.77 | 2044.20 | 2056.50 | 2492510000 |
| 2015-12-24 | 2063.52 | 2067.36 | 2058.73 | 2060.99 | 1411860000 |
In this table, we’ve used Markdown formatting to make the “Open” and “Close” labels bold (arguably the most important prices for traders), while “High” and “Low” appear in italics. The volume label demonstrates the power of using raw HTML: the <br> creates a line break and <small> renders the unit clarification in a smaller font size. This kind of rich formatting in column labels can guide the reader’s attention and provide additional context without cluttering the data cells themselves.
For columns with measurement units, gt supports a convenient notation using curly braces. When .process_units = TRUE (the default), text within {...} is automatically formatted as units:
| Value m/s2 | Amount USD |
|---|---|
| 0.1111 | 49.95 |
| 2.2220 | 17.95 |
| 33.3300 | 1.39 |
| 444.4000 | 65100.00 |
The units notation provides a clean, consistent way to display measurement units. Notice how m/s^2 is rendered with proper superscript formatting for the exponent (this happens automatically when using the double-curly-brace syntax). The result is scientifically accurate notation without requiring you to write HTML or Markdown.
6.1.2 cols_label_with()
When you need to apply a transformation function to multiple column labels at once, cols_label_with() is the more efficient approach. Rather than specifying each label individually, you provide a function that will be applied to the selected column names.
Here is the function’s signature:
cols_label_with(
data,
columns = everything(),
fn
)A common use case is converting snake_case column names to Title Case:
towny |>
dplyr::select(name, population_2021, density_2021, land_area_km2) |>
dplyr::slice_head(n = 5) |>
gt() |>
cols_label_with(
columns = everything(),
fn = ~ gsub("_", " ", .x) |> tools::toTitleCase()
)| Name | Population 2021 | Density 2021 | Land Area Km2 |
|---|---|---|---|
| Addington Highlands | 2534 | 1.96 | 1293.99 |
| Adelaide Metcalfe | 3011 | 9.09 | 331.11 |
| Adjala-Tosorontio | 10989 | 29.58 | 371.53 |
| Admaston/Bromley | 2995 | 5.76 | 519.59 |
| Ajax | 126666 | 1900.75 | 66.64 |
The anonymous function (specified using the ~ syntax) performs two operations: first, gsub("_", " ", .x) replaces all underscores with spaces, and then tools::toTitleCase() capitalizes the first letter of each word. The result transforms population_2021 into “Population 2021” and land_area_km2 into “Land Area Km2”. This approach is far more efficient than manually specifying labels for each column, especially when dealing with datasets that have many columns following consistent naming conventions.
You can also be selective about which columns to transform. Here, we use starts_with() to only modify certain columns:
| Manufacturer | Model | HP | HP RPM | TRQ | TRQ RPM |
|---|---|---|---|---|---|
| Ford | GT | 647 | 6250 | 550 | 5900 |
| Ferrari | 458 Speciale | 597 | 9000 | 398 | 6000 |
| Ferrari | 458 Spider | 562 | 9000 | 398 | 6000 |
| Ferrari | 458 Italia | 562 | 9000 | 398 | 6000 |
| Ferrari | 488 GTB | 661 | 8000 | 561 | 3000 |
Here we combine two approaches: explicit labeling with cols_label() for the mfr and model columns (which need specific human-friendly names), and programmatic labeling with cols_label_with() for the performance-related columns. The starts_with() selector targets only columns beginning with “hp” or “trq”, converting them to uppercase with spaces. This results in labels like “HP”, “HP RPM”, “TRQ”, and “TRQ RPM”, which are appropriate for technical specifications where abbreviations are standard.
The cols_label_with() function is particularly powerful when combined with custom functions. For example, you could create a function that looks up labels from a data dictionary or applies domain-specific formatting rules.
6.1.3 cols_width()
The cols_width() function allows you to explicitly set the widths of one or more columns. This is especially useful when you want consistent column sizing across tables, when dealing with long text that might cause awkward wrapping, or when certain columns deserve more visual prominence.
Here is the function’s signature:
cols_width(
data,
...,
.list = list2(...)
)The width values can be specified using the px() helper for pixel values or pct() for percentages. Let’s see a basic example:
| char | fctr | date | time |
|---|---|---|---|
| apricot | one | 2015-01-15 | 13:35 |
| banana | two | 2015-02-15 | 14:40 |
| coconut | three | 2015-03-15 | 15:45 |
| durian | four | 2015-04-15 | 16:50 |
By setting explicit pixel widths, we ensure that the columns maintain consistent sizing regardless of the content they contain. The char column receives the most space at 150 pixels since character data often varies in length, while fctr and time are narrower at 100 pixels. This explicit control prevents the table from looking unbalanced when some cells contain much more content than others.
Using percentage-based widths ensures that columns scale proportionally with the table’s overall width:
| Manufacturer | Model | Year | HP | Price |
|---|---|---|---|---|
| Ford | GT | 2017 | 647 | 447000 |
| Ferrari | 458 Speciale | 2015 | 597 | 291744 |
| Ferrari | 458 Spider | 2015 | 562 | 263553 |
| Ferrari | 458 Italia | 2014 | 562 | 233509 |
| Ferrari | 488 GTB | 2016 | 661 | 245400 |
With percentage-based widths totaling 100%, the columns will proportionally fill the available table width. The model column receives the largest share (35%) since car model names tend to be longer, while year and hp get just 10% each since they contain short numeric values. This approach is particularly valuable for responsive layouts where the table width may vary depending on the viewing context.
You can use tidyselect helpers to apply widths to multiple columns at once:
towny |>
dplyr::select(name, starts_with("population"), starts_with("density")) |>
dplyr::slice_head(n = 5) |>
gt() |>
cols_width(
name ~ px(200),
starts_with("population") ~ px(120),
starts_with("density") ~ px(100)
)| name | population_1996 | population_2001 | population_2006 | population_2011 | population_2016 | population_2021 | density_1996 | density_2001 | density_2006 | density_2011 | density_2016 | density_2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Addington Highlands | 2429 | 2402 | 2512 | 2517 | 2318 | 2534 | 1.88 | 1.86 | 1.94 | 1.95 | 1.79 | 1.96 |
| Adelaide Metcalfe | 3128 | 3149 | 3135 | 3028 | 2990 | 3011 | 9.45 | 9.51 | 9.47 | 9.14 | 9.03 | 9.09 |
| Adjala-Tosorontio | 9359 | 10082 | 10695 | 10603 | 10975 | 10989 | 25.19 | 27.14 | 28.79 | 28.54 | 29.54 | 29.58 |
| Admaston/Bromley | 2837 | 2824 | 2716 | 2844 | 2935 | 2995 | 5.46 | 5.44 | 5.23 | 5.47 | 5.65 | 5.76 |
| Ajax | 64430 | 73753 | 90167 | 109600 | 119677 | 126666 | 966.84 | 1106.74 | 1353.05 | 1644.66 | 1795.87 | 1900.75 |
The starts_with() helper elegantly handles all population columns and all density columns with single rules, assigning 120 pixels to each population column and 100 pixels to each density column. This is considerably more maintainable than specifying widths for each column individually, and it automatically accommodates any new columns that might be added to the dataset later (provided they follow the same naming convention).
6.1.4 cols_align()
By default, gt automatically aligns columns based on their data types: numeric columns are right-aligned and text columns are left-aligned. The cols_align() function lets you override this behavior when needed.
Here is the function’s signature:
cols_align(
data,
align = c("auto", "left", "center", "right"),
columns = everything()
)Let’s see how to center-align specific columns:
| char | num | currency | date |
|---|---|---|---|
| apricot | 0.1111 | 49.95 | 2015-01-15 |
| banana | 2.2220 | 17.95 | 2015-02-15 |
| coconut | 33.3300 | 1.39 | 2015-03-15 |
| durian | 444.4000 | 65100.00 | 2015-04-15 |
In this table, the char and date columns are now center-aligned, while num and currency retain their default right-alignment. Notice how the center alignment affects only the body cells; the column labels remain in their default positions. Centering text columns can create a more balanced visual appearance, though it may reduce readability for longer text values (so use this technique judiciously).
You might want to center all columns for a more symmetrical appearance, particularly in summary tables:
gtcars |>
dplyr::count(mfr, name = "n_models") |>
dplyr::arrange(desc(n_models)) |>
dplyr::slice_head(n = 8) |>
gt() |>
cols_label(
mfr = "Manufacturer",
n_models = "Number of Models"
) |>
cols_align(
align = "center",
columns = everything()
)| Manufacturer | Number of Models |
|---|---|
| Ferrari | 9 |
| Audi | 5 |
| BMW | 5 |
| Aston Martin | 4 |
| Porsche | 4 |
| Lamborghini | 3 |
| Maserati | 3 |
| Mercedes-Benz | 2 |
For this summary table showing model counts by manufacturer, center alignment creates a clean, symmetrical presentation. Since the table has only two columns and relatively short values in each, center alignment works well. The everything() selector applies the alignment to all columns in one call, keeping the code concise.
For numeric values that benefit from decimal alignment, gt offers cols_align_decimal(). This function aligns values along their decimal points, making it easier to compare values across rows:
| item | price | quantity |
|---|---|---|
| Widget A | 1.50 | 100 |
| Widget B | 24.99 | 50 |
| Widget C | 199.00 | 5 |
| Total | 225.49 | 155 |
The price column now has its values aligned on the decimal point, making it easy to compare prices at a glance. Without decimal alignment, the varying number of digits before the decimal would make visual comparison more difficult. This is especially valuable in financial tables, scientific data, or any context where precise numeric comparison matters.
6.2 Moving columns
The order in which columns appear can significantly impact how readers interpret and navigate a table. While you could reorder columns in your source data before passing it to gt(), the cols_move*() functions provide a convenient way to rearrange columns as part of your table-building pipeline. This is especially useful when the order that makes sense for analysis differs from the order that works best for presentation.
6.2.1 cols_move_to_start()
The cols_move_to_start() function moves one or more columns to the beginning of the column series (i.e., to the left side of the table).
Here is the function’s signature:
cols_move_to_start(
data,
columns
)Let’s say we have a dataset where we want to emphasize certain columns by placing them first:
| msrp | mfr | model | year | bdy_style | hp |
|---|---|---|---|---|---|
| 447000 | Ford | GT | 2017 | coupe | 647 |
| 291744 | Ferrari | 458 Speciale | 2015 | coupe | 597 |
| 263553 | Ferrari | 458 Spider | 2015 | convertible | 562 |
| 233509 | Ferrari | 458 Italia | 2014 | coupe | 562 |
| 245400 | Ferrari | 488 GTB | 2016 | coupe | 661 |
The msrp column (which originally appeared last in our selection) now leads the table. If price is the primary point of interest for our readers, placing it first ensures they see that information immediately. The remaining columns maintain their relative order, simply shifted one position to the right.
You can move multiple columns at once. They will appear in the order you specify:
| date | time | datetime | num | char | fctr | currency |
|---|---|---|---|---|---|---|
| 2015-01-15 | 13:35 | 2018-01-01 02:22 | 0.1111 | apricot | one | 49.95 |
| 2015-02-15 | 14:40 | 2018-02-02 14:33 | 2.2220 | banana | two | 17.95 |
| 2015-03-15 | 15:45 | 2018-03-03 03:44 | 33.3300 | coconut | three | 1.39 |
| 2015-04-15 | 16:50 | 2018-04-04 15:55 | 444.4000 | durian | four | 65100.00 |
All three temporal columns now appear at the beginning of the table, in the order we specified: date, then time, then datetime. This reorganization groups related information together, which is often more logical for the reader than the arbitrary order that might result from the original data structure.
6.2.2 cols_move_to_end()
Conversely, cols_move_to_end() moves columns to the end of the column series (i.e., to the right side of the table). This is often useful for placing less important columns or summary columns at the end.
Here is the function’s signature:
cols_move_to_end(
data,
columns
)In the following example, we move the identifying columns to the end so that the numeric data appears first:
| open | high | low | close | volume | date |
|---|---|---|---|---|---|
| 2060.59 | 2062.54 | 2043.62 | 2043.94 | 2655330000 | 2015-12-31 |
| 2077.34 | 2077.34 | 2061.97 | 2063.36 | 2367430000 | 2015-12-30 |
| 2060.54 | 2081.56 | 2060.54 | 2078.36 | 2542000000 | 2015-12-29 |
| 2057.77 | 2057.77 | 2044.20 | 2056.50 | 2492510000 | 2015-12-28 |
| 2063.52 | 2067.36 | 2058.73 | 2060.99 | 1411860000 | 2015-12-24 |
By moving date to the end, the numeric trading data (open, high, low, close, volume) takes center stage. This arrangement might be preferred when the focus is on price movements rather than the specific dates (perhaps in a context where readers already know the time period being examined).
Multiple columns can be moved together:
| year | hp | trq | msrp | mfr | model |
|---|---|---|---|---|---|
| 2017 | 647 | 550 | 447000 | Ford | GT |
| 2015 | 597 | 398 | 291744 | Ferrari | 458 Speciale |
| 2015 | 562 | 398 | 263553 | Ferrari | 458 Spider |
| 2014 | 562 | 398 | 233509 | Ferrari | 458 Italia |
| 2016 | 661 | 561 | 245400 | Ferrari | 488 GTB |
Now the numeric specifications (year, hp, trq, msrp) appear first, followed by the identifying information (mfr, model). This layout prioritizes the data over the labels, a presentation choice that might suit analytical contexts where readers are comparing specifications across vehicles.
6.2.3 cols_move()
For more precise control over column placement, cols_move() allows you to position columns after a specified reference column.
Here is the function’s signature:
cols_move(
data,
columns,
after
)This function is particularly useful when you want to group related columns together:
| mfr | model | msrp | year | hp | hp_rpm | trq | trq_rpm |
|---|---|---|---|---|---|---|---|
| Ford | GT | 447000 | 2017 | 647 | 6250 | 550 | 5900 |
| Ferrari | 458 Speciale | 291744 | 2015 | 597 | 9000 | 398 | 6000 |
| Ferrari | 458 Spider | 263553 | 2015 | 562 | 9000 | 398 | 6000 |
| Ferrari | 458 Italia | 233509 | 2014 | 562 | 9000 | 398 | 6000 |
| Ferrari | 488 GTB | 245400 | 2016 | 661 | 8000 | 561 | 3000 |
The msrp column is now positioned immediately after model, giving readers the price information right alongside the vehicle identification. The after argument specifies the reference column; the moved column(s) will appear directly to its right. This flexibility is valuable when you need to interleave columns rather than simply push them to the start or end.
Here’s another example where we reorder columns to group performance metrics together:
| mfr | model | year | hp | trq | mpg_c | mpg_h |
|---|---|---|---|---|---|---|
| Ford | GT | 2017 | 647 | 550 | 11 | 18 |
| Ferrari | 458 Speciale | 2015 | 597 | 398 | 13 | 17 |
| Ferrari | 458 Spider | 2015 | 562 | 398 | 13 | 17 |
| Ferrari | 458 Italia | 2014 | 562 | 398 | 13 | 17 |
| Ferrari | 488 GTB | 2016 | 661 | 561 | 15 | 22 |
Both hp and trq have been moved to follow year, grouping the power-related metrics together. The fuel economy columns (mpg_c, mpg_h) now appear at the end. By thoughtfully arranging columns, you help readers understand the logical structure of your data. In this case, suggesting a grouping of identity information, power metrics, and efficiency metrics.
Using tidyselect helpers with cols_move() can be quite powerful for organizing tables with many columns:
towny |>
dplyr::select(name, latitude, longitude, starts_with("population"), starts_with("density")) |>
dplyr::slice_head(n = 5) |>
gt() |>
cols_move(
columns = starts_with("density"),
after = name
)| name | density_1996 | density_2001 | density_2006 | density_2011 | density_2016 | density_2021 | latitude | longitude | population_1996 | population_2001 | population_2006 | population_2011 | population_2016 | population_2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Addington Highlands | 1.88 | 1.86 | 1.94 | 1.95 | 1.79 | 1.96 | 45.00000 | -77.25000 | 2429 | 2402 | 2512 | 2517 | 2318 | 2534 |
| Adelaide Metcalfe | 9.45 | 9.51 | 9.47 | 9.14 | 9.03 | 9.09 | 42.95000 | -81.70000 | 3128 | 3149 | 3135 | 3028 | 2990 | 3011 |
| Adjala-Tosorontio | 25.19 | 27.14 | 28.79 | 28.54 | 29.54 | 29.58 | 44.13333 | -79.93333 | 9359 | 10082 | 10695 | 10603 | 10975 | 10989 |
| Admaston/Bromley | 5.46 | 5.44 | 5.23 | 5.47 | 5.65 | 5.76 | 45.52917 | -76.89694 | 2837 | 2824 | 2716 | 2844 | 2935 | 2995 |
| Ajax | 966.84 | 1106.74 | 1353.05 | 1644.66 | 1795.87 | 1900.75 | 43.85833 | -79.03639 | 64430 | 73753 | 90167 | 109600 | 119677 | 126666 |
The starts_with("density") selector captures all density-related columns and positions them immediately after name. The original column order placed population columns before density columns, but now density comes first. When working with wide datasets that have many similarly-named columns, tidyselect helpers make reorganization straightforward and your code more resilient to future changes in the underlying data.
6.3 Hiding columns
There are situations where you want to retain columns in your data for reference or for use in calculations, but you don’t want them to appear in the rendered table. The cols_hide() function makes columns invisible while keeping them available for other gt operations. This is particularly useful when you need a column’s data for conditional formatting, merging operations, or calculations, but displaying that column would clutter the table.
6.3.1 cols_hide()
The cols_hide() function takes one or more columns and hides them from the table display. The columns remain part of the underlying gt object and can still be referenced in other function calls.
Here is the function’s signature:
cols_hide(
data,
columns
)Let’s create a table where we use a column for sorting but don’t want to display it:
| mfr | model | hp | msrp |
|---|---|---|---|
| Ferrari | LaFerrari | 949 | 1416362 |
| Ferrari | F12Berlinetta | 731 | 319995 |
| Lamborghini | Aventador | 700 | 397500 |
| Ferrari | GTC4Lusso | 680 | 298000 |
| Ferrari | 488 GTB | 661 | 245400 |
| Ferrari | FF | 652 | 295000 |
| Chevrolet | Corvette | 650 | 88345 |
| Ford | GT | 647 | 447000 |
The table displays the top 8 cars by horsepower, but the year column is no longer visible. We retain the data in our pipeline (and could use it for other operations like conditional formatting), but it doesn’t appear in the rendered output. This is cleaner than removing the column entirely from the source data, especially when the column might be needed for other purposes.
A common use case is hiding columns that have been merged into other columns. When you use cols_merge() (discussed later), the source columns remain in the table unless explicitly hidden:
| char | num |
|---|---|
| apricot | 0.1111 |
| banana | 2.2220 |
| coconut | 33.3300 |
| durian | 444.4000 |
Only char and num remain visible; fctr and currency are hidden. The data is still present in the gt object, but it simply isn’t rendered. This technique is particularly useful when you want to keep auxiliary data available for reference while presenting a simplified view to the reader.
You can use tidyselect helpers to hide multiple related columns:
| mfr | model | year | hp | trq | mpg_c | mpg_h |
|---|---|---|---|---|---|---|
| Ford | GT | 2017 | 647 | 550 | 11 | 18 |
| Ferrari | 458 Speciale | 2015 | 597 | 398 | 13 | 17 |
| Ferrari | 458 Spider | 2015 | 562 | 398 | 13 | 17 |
| Ferrari | 458 Italia | 2014 | 562 | 398 | 13 | 17 |
| Ferrari | 488 GTB | 2016 | 661 | 561 | 15 | 22 |
The ends_with("_rpm") selector hides both hp_rpm and trq_rpm in a single statement. The table shows horsepower and torque values without the RPM figures (perhaps the RPM data is too technical for a general audience, or it’s being reserved for a more detailed companion table). Either way, the data remains accessible within the gt object should you need it.
6.3.2 cols_unhide()
The cols_unhide() function reverses the hiding operation, making previously hidden columns visible again. This is useful in programmatic contexts where you might conditionally show or hide columns, or when you’ve inherited a gt table object and want to reveal certain columns.
Here is the function’s signature:
cols_unhide(
data,
columns
)Let’s demonstrate a workflow where we hide and then selectively unhide columns:
| mfr | model | hp | msrp |
|---|---|---|---|
| Ford | GT | 647 | 447000 |
| Ferrari | 458 Speciale | 597 | 291744 |
| Ferrari | 458 Spider | 562 | 263553 |
| Ferrari | 458 Italia | 562 | 233509 |
| Ferrari | 488 GTB | 661 | 245400 |
We first hide three columns (year, hp, trq), then selectively reveal hp. The result is a table showing mfr, model, hp, and msrp (with year and trq remaining hidden). This pattern demonstrates the composability of gt operations: you can hide broadly and then unhide specifically, which can be useful when building tables programmatically or when adapting templates for different contexts.
The cols_unhide() function is particularly valuable when working with functions that return gt tables. You can receive a table, inspect which columns are hidden, and then unhide specific ones as needed. This follows the general gt philosophy that table construction operations should be reversible and composable.
6.4 Merging column data together
One of the more powerful column modification capabilities in gt is the ability to merge data from multiple columns into a single column. This is achieved through the cols_merge*() family of functions. Merging is useful when you want to present related information together in a compact format (for example, combining a value with its uncertainty, showing a range from two values, or displaying a count alongside its percentage).
It’s important to understand that when columns are merged, the data from the source columns is combined into a target column. The source columns are not automatically hidden; you typically want to use cols_hide() afterward if you don’t want the source columns to remain visible. However, some of the specialized merge functions handle this automatically.
6.4.1 cols_merge()
The cols_merge() function provides a flexible way to combine values from multiple columns using a pattern string. The pattern uses column indices (e.g., {1}, {2}) to reference the columns being merged.
Here is the function’s signature:
cols_merge(
data,
columns,
rows = everything(),
hide_columns = columns[-1],
pattern = NULL,
sep = " "
)Let’s combine manufacturer and model into a single column:
| Vehicle | year | hp | msrp |
|---|---|---|---|
| Ford GT | 2017 | 647 | 447000 |
| Ferrari 458 Speciale | 2015 | 597 | 291744 |
| Ferrari 458 Spider | 2015 | 562 | 263553 |
| Ferrari 458 Italia | 2014 | 562 | 233509 |
| Ferrari 488 GTB | 2016 | 661 | 245400 |
The manufacturer and model names are now combined into a single “Vehicle” column, with values like “Ford GT” and “Ferrari 458 Speciale”. The {1} and {2} placeholders in the pattern refer to the first and second columns specified, respectively. Notice that the model column has been automatically hidden (the default behavior), leaving us with a cleaner four-column table.
The pattern string is quite flexible. You can include any literal text between the placeholders:
| mfr | Model (Year) | hp | msrp |
|---|---|---|---|
| Ford | GT (2017) | 647 | 447000 |
| Ferrari | 458 Speciale (2015) | 597 | 291744 |
| Ferrari | 458 Spider (2015) | 562 | 263553 |
| Ferrari | 458 Italia (2014) | 562 | 233509 |
| Ferrari | 488 GTB (2016) | 661 | 245400 |
Now each cell in the model column contains the model name followed by the year in parentheses, such as "GT (2017)" or "458 Speciale (2015)". The parentheses are literal characters in the pattern, demonstrating how you can embed any formatting you need. This is a common convention for displaying names with supplementary information.
You can merge more than two columns:
| Vehicle Information | hp | msrp |
|---|---|---|
| Ford GT, 2017 | 647 | 447000 |
| Ferrari 458 Speciale, 2015 | 597 | 291744 |
| Ferrari 458 Spider, 2015 | 562 | 263553 |
| Ferrari 458 Italia, 2014 | 562 | 233509 |
| Ferrari 488 GTB, 2016 | 661 | 245400 |
All three identifying pieces of information (manufacturer, model, and year) are now combined into a single column, producing entries like “Ford GT, 2017”. This aggressive consolidation reduces the table from five columns to three, creating a more compact presentation when the identifying information is secondary to the numeric data.
Note that by default, cols_merge() hides all columns except the first one specified in columns. If you want to retain visibility of certain columns, you can adjust the hide_columns argument.
6.4.2 cols_merge_uncert()
When presenting scientific or statistical data, it’s common to show a value along with its uncertainty (e.g., error bounds). The cols_merge_uncert() function is specifically designed for this purpose, creating nicely formatted value ± uncertainty presentations.
Here is the function’s signature:
cols_merge_uncert(
data,
col_val,
col_uncert,
rows = everything(),
sep = " +/- ",
autohide = TRUE
)Let’s create a dataset with values and their uncertainties:
| sample | Measurement |
|---|---|
| A | 12.4 ± 0.3 |
| B | 8.7 ± 0.2 |
| C | 15.2 ± 0.5 |
| D | 10.1 ± 0.4 |
Each measurement now displays with its associated uncertainty using the conventional ± notation: “12.4 ± 0.3”, “8.7 ± 0.2”, and so on. This format is standard in scientific literature and immediately communicates both the measured value and its precision. The uncertainty column has been automatically hidden thanks to the autohide = TRUE default.
The function uses ± as the default separator, which is standard notation for uncertainties. By default, autohide = TRUE means the uncertainty column is automatically hidden after merging.
You can also use asymmetric uncertainties by providing two uncertainty columns in col_uncert:
asymmetric_data <- dplyr::tibble(
experiment = c("Trial 1", "Trial 2", "Trial 3"),
result = c(50.2, 48.9, 51.5),
lower_bound = c(2.1, 1.8, 2.5),
upper_bound = c(3.2, 2.4, 3.1)
)
asymmetric_data |>
gt() |>
cols_merge_uncert(
col_val = result,
col_uncert = c(lower_bound, upper_bound)
) |>
cols_label(result = "Result")| experiment | Result |
|---|---|
| Trial 1 | 50.2+3.2 −2.1 |
| Trial 2 | 48.9+2.4 −1.8 |
| Trial 3 | 51.5+3.1 −2.5 |
When uncertainties differ above and below the measured value, providing two uncertainty columns produces asymmetric notation like “50.2 (+3.2/−2.1)”. This format accurately represents situations where error bounds are not symmetric (common in fields like particle physics, astrophysics, and certain statistical analyses where confidence intervals are naturally asymmetric).
6.4.3 cols_merge_range()
The cols_merge_range() function combines two columns that represent the bounds of a range (e.g., minimum and maximum values) into a single column with appropriate range notation.
Here is the function’s signature:
cols_merge_range(
data,
col_begin,
col_end,
rows = everything(),
sep = NULL,
autohide = TRUE,
locale = NULL
)This function is exceedingly useful for displaying ranges like price ranges, temperature ranges, or date ranges:
price_ranges <- dplyr::tibble(
product = c("Basic", "Standard", "Premium", "Enterprise"),
min_price = c(9.99, 29.99, 79.99, 199.99),
max_price = c(19.99, 49.99, 149.99, 499.99)
)
price_ranges |>
gt() |>
cols_merge_range(
col_begin = min_price,
col_end = max_price
) |>
cols_label(min_price = "Price Range") |>
fmt_currency(columns = min_price, decimals = 2)| product | Price Range |
|---|---|
| Basic | $9.99–19.99 |
| Standard | $29.99–49.99 |
| Premium | $79.99–149.99 |
| Enterprise | $199.99–499.99 |
Each product tier now shows its price range in a single column with values like “$9.99–$19.99”. The en-dash (–) separator is the typographically correct character for ranges, and cols_merge_range() uses it by default. By applying fmt_currency() to the target column, both the minimum and maximum values in each cell receive currency formatting.
Note that when you apply formatting functions like fmt_currency(), you should apply them to the target column (the first column in the merge operation) for the formatting to take effect on the merged result.
Here’s another example using the gtcars dataset to show horsepower ranges by manufacturer:
gtcars |>
dplyr::group_by(mfr) |>
dplyr::summarize(
min_hp = min(hp),
max_hp = max(hp),
n_models = n(),
.groups = "drop"
) |>
dplyr::slice_head(n = 8) |>
gt() |>
cols_merge_range(
col_begin = min_hp,
col_end = max_hp
) |>
cols_label(
mfr = "Manufacturer",
min_hp = "Horsepower Range",
n_models = "Models"
)| Manufacturer | Horsepower Range | Models |
|---|---|---|
| Acura | 573–573 | 1 |
| Aston Martin | 430–608 | 4 |
| Audi | 430–560 | 5 |
| BMW | 315–560 | 5 |
| Bentley | 500–500 | 1 |
| Chevrolet | 650–650 | 1 |
| Dodge | 645–645 | 1 |
| Ferrari | 553–949 | 9 |
This summary table shows the range of horsepower values for each manufacturer’s lineup, presented as entries like “245–647” for Audi or “562–691” for Ferrari. At a glance, readers can see which manufacturers offer broader or narrower performance ranges. The compact range notation conveys the same information as two separate columns but in a more digestible format.
6.4.4 cols_merge_n_pct()
A very common table pattern is showing counts alongside their percentages. The cols_merge_n_pct() function automates this, creating entries like “150 (25.0%)” from separate count and percentage columns.
Here is the function’s signature:
cols_merge_n_pct(
data,
col_n,
col_pct,
rows = everything(),
autohide = TRUE
)Let’s create a frequency table showing counts and percentages:
| Body Style | Count (Percentage) |
|---|---|
| convertible | 5 (10.638298) |
| coupe | 32 (68.085106) |
| hatchback | 2 (4.255319) |
| sedan | 8 (17.021277) |
The table now shows entries like “coupe: 18 (34.62%)” and “sedan: 12 (23.08%)”, communicating both the raw count and its proportion of the total. This dual presentation helps readers understand both the absolute frequency and relative importance of each category. The function automatically formats the percentage and wraps it in parentheses, following the conventional notation used in statistical and research publications.
This function is particularly useful for survey results, categorical summaries, and any situation where you want to show both absolute and relative frequencies:
towny |>
dplyr::mutate(
size_category = dplyr::case_when(
population_2021 >= 100000 ~ "Large",
population_2021 >= 20000 ~ "Medium",
TRUE ~ "Small"
)
) |>
dplyr::count(size_category, name = "count") |>
dplyr::mutate(percentage = count / sum(count) * 100) |>
dplyr::arrange(desc(count)) |>
gt() |>
cols_merge_n_pct(
col_n = count,
col_pct = percentage
) |>
cols_label(
size_category = "Municipality Size",
count = "Count (Pct)"
)| Municipality Size | Count (Pct) |
|---|---|
| Small | 325 (78.502415) |
| Medium | 61 (14.734300) |
| Large | 28 (6.763285) |
This categorical summary of Ontario municipalities shows how many fall into each size category along with the percentage. We can immediately see that “Small” municipalities dominate the dataset while “Large” municipalities are relatively rare. The merged format prevents the table from becoming unwieldy with separate count and percentage columns, while still conveying the complete picture.
The cols_merge_n_pct() function handles the formatting of the percentage values and applies appropriate parentheses automatically, creating a clean and professional appearance for your frequency tables.
6.5 Adding columns and rows
While the previous sections focused on modifying existing columns, gt also provides functions to add entirely new columns and rows directly within the table-building pipeline. This allows you to perform calculations and add data without returning to dplyr or modifying your source data frame.
6.5.1 cols_add()
The cols_add() function works similarly to dplyr::mutate(), allowing you to create new columns based on expressions involving existing columns. This is particularly useful for calculated fields, unit conversions, or derived values.
Here is the function’s signature:
cols_add(
.data,
...,
.before = NULL,
.after = NULL
)Let’s create a table that calculates population density from existing columns:
towny |>
dplyr::select(name, population_2021, land_area_km2) |>
dplyr::filter(population_2021 > 100000) |>
dplyr::slice_max(population_2021, n = 8) |>
gt() |>
cols_add(
density = population_2021 / land_area_km2
) |>
fmt_integer(columns = population_2021) |>
fmt_number(columns = c(land_area_km2, density), decimals = 1) |>
cols_label(
name = "Municipality",
population_2021 = "Population",
land_area_km2 = "Area (km²)",
density = "Density (per km²)"
)| Municipality | Population | Area (km²) | Density (per km²) |
|---|---|---|---|
| Toronto | 2,794,356 | 631.1 | 4,427.8 |
| Ottawa | 1,017,449 | 2,788.2 | 364.9 |
| Mississauga | 717,961 | 292.7 | 2,452.6 |
| Brampton | 656,480 | 265.9 | 2,469.0 |
| Hamilton | 569,353 | 1,118.3 | 509.1 |
| London | 422,324 | 420.5 | 1,004.3 |
| Markham | 338,503 | 210.9 | 1,604.8 |
| Vaughan | 323,103 | 272.4 | 1,186.0 |
The density column is calculated directly in the gt pipeline, dividing population by land area. This keeps the calculation close to where it’s displayed, making the table code more self-contained and easier to understand.
You can add multiple columns in a single call, and use the .before or .after arguments to position them:
gtcars |>
dplyr::select(mfr, model, hp, mpg_c, mpg_h) |>
dplyr::slice(1:6) |>
gt() |>
cols_add(
mpg_avg = (mpg_c + mpg_h) / 2,
hp_per_mpg = hp / mpg_avg,
.after = mpg_h
) |>
fmt_number(columns = c(mpg_avg, hp_per_mpg), decimals = 1) |>
cols_label(
mfr = "Make",
model = "Model",
hp = "HP",
mpg_c = "City MPG",
mpg_h = "Hwy MPG",
mpg_avg = "Avg MPG",
hp_per_mpg = "HP/MPG"
)| Make | Model | HP | City MPG | Hwy MPG | Avg MPG | HP/MPG |
|---|---|---|---|---|---|---|
| Ford | GT | 647 | 11 | 18 | 14.5 | 44.6 |
| Ferrari | 458 Speciale | 597 | 13 | 17 | 15.0 | 39.8 |
| Ferrari | 458 Spider | 562 | 13 | 17 | 15.0 | 37.5 |
| Ferrari | 458 Italia | 562 | 13 | 17 | 15.0 | 37.5 |
| Ferrari | 488 GTB | 661 | 15 | 22 | 18.5 | 35.7 |
| Ferrari | California | 553 | 16 | 23 | 19.5 | 28.4 |
Here we calculate an average MPG from city and highway figures, then compute a horsepower-to-efficiency ratio. Both new columns are placed after the highway MPG column using .after = mpg_h.
The cols_add() function is also useful for adding constant values or flags:
exibble |>
dplyr::select(char, num, currency) |>
dplyr::slice(1:4) |>
gt() |>
cols_add(
status = c("Active", "Pending", "Active", "Closed"),
reviewed = TRUE,
.before = char
) |>
cols_label(
status = "Status",
reviewed = "Reviewed",
char = "Item",
num = "Value",
currency = "Amount"
) |>
fmt_tf(columns = reviewed, tf_style = "check-mark")| Status | Reviewed | Item | Value | Amount |
|---|---|---|---|---|
| Active | ✔ | apricot | 0.1111 | 49.95 |
| Pending | ✔ | banana | 2.2220 | 17.95 |
| Active | ✔ | coconut | 33.3300 | 1.39 |
| Closed | ✔ | durian | 444.4000 | 65100.00 |
You can even start with an empty table and build it up entirely through cols_add():
| product | price | quantity | total |
|---|---|---|---|
| Widget | $29.99 | 100 | $2,999.00 |
| Gadget | $49.99 | 50 | $2,499.50 |
| Sprocket | $19.99 | 200 | $3,998.00 |
This approach can be useful when building tables programmatically or when the table structure itself is part of the computation.
6.5.2 rows_add()
Just as cols_add() adds columns, rows_add() adds new rows to your table. This is useful for inserting summary rows, spacer rows, or additional data points directly in the gt pipeline.
Here is the function’s signature:
rows_add(
.data,
...,
.list = list2(...),
.before = NULL,
.after = NULL,
.n_empty = NULL
)Let’s add a row to a simple table:
| item | quantity | price |
|---|---|---|
| Apples | 50 | $1.20 |
| Oranges | 30 | $0.95 |
| Bananas | 45 | $0.65 |
| Grapes | 25 | $2.50 |
The new row is added at the bottom by default. You can use .before or .after to position it elsewhere:
| rank | name | score |
|---|---|---|
| 1 | Alice | 95 |
| 2 | New Entry | 90 |
| 2 | Bob | 87 |
| 3 | Carol | 82 |
The new row is inserted after row 1, pushing the original rows 2 and 3 down. Note that you’ll need to handle any renumbering or reordering logic yourself; rows_add() simply inserts the data.
You can add multiple rows at once by providing vectors:
| Category | Q4 Sales |
|---|---|
| Electronics | $15,000 |
| Clothing | $8,500 |
| Food | $12,000 |
| Books | $3,200 |
The .n_empty argument lets you add blank rows, which can be useful for visual spacing:
| Section | Page |
|---|---|
| Introduction | 1 |
| Methods | 15 |
| Results | 32 |
Like cols_add(), you can build a table from scratch using rows_add():
dplyr::tibble(
time = lubridate::POSIXct(),
event = character(0)
) |>
gt() |>
rows_add(
time = lubridate::ymd_hms("2024-01-15 09:30:00"),
event = "Meeting started"
) |>
rows_add(
time = lubridate::ymd_hms("2024-01-15 10:45:00"),
event = "Break"
) |>
rows_add(
time = lubridate::ymd_hms("2024-01-15 11:00:00"),
event = "Meeting resumed"
) |>
fmt_datetime(columns = time, date_style = "yMd", time_style = "Hm")| time | event |
|---|---|
| 1/15/2024 09:30 | Meeting started |
| 1/15/2024 10:45 | Break |
| 1/15/2024 11:00 | Meeting resumed |
This pattern of starting with an empty structure and incrementally adding rows is useful for building tables programmatically, such as logging events or accumulating results in an interactive application.
Both cols_add() and rows_add() integrate seamlessly with other gt functions. You can format, style, label, and otherwise manipulate the added columns and rows just like any other part of the table. This makes them powerful tools for keeping your table logic contained within the gt pipeline rather than scattered across data manipulation steps.
6.6 Summary
This chapter has covered the comprehensive set of functions that modify table structure at the column and row level. These functions transform raw data into polished, presentation-ready table elements.
The key capabilities we’ve explored:
- column labels:
cols_label()assigns human-readable labels, with support for Markdown, HTML, and units notation.cols_label_with()applies transformations programmatically across multiple columns. - column dimensions:
cols_width()controls column widths using pixels, percentages, or content-based sizing.cols_align()andcols_align_decimal()ensure proper alignment for readability. - column arrangement: the
cols_move*()family repositions columns, whilecols_hide()andcols_unhide()control visibility without removing data. - column merging:
cols_merge()combines multiple columns using flexible patterns. Specialized variants likecols_merge_uncert(),cols_merge_range(), andcols_merge_n_pct()handle common merge scenarios with appropriate formatting. - adding columns and rows:
cols_add()creates new columns from expressions, enabling calculated fields and derived values directly in the gt pipeline.rows_add()inserts new rows, useful for additional data points or spacer rows.
These structural modifications work alongside the formatting and styling functions you’ve already learned. A typical workflow might involve formatting values, then adjusting column labels, setting appropriate widths, reordering columns for logical flow, and merging related information.
The next chapter explores summary rows and columns, which extend table structure vertically and horizontally with computed aggregations. You’ll learn to add totals, averages, and other statistics that help readers understand your data at a glance.