5 Substitution and text transformation

After formatting functions have processed your data, two additional stages allow you to refine the presentation further: substitution and text transformation. These stages operate on formatted values, giving you fine-grained control over how specific values appear in your final table.

The three-stage rendering pipeline works as follows:

Formatting (fmt_*() functions) converts raw values to formatted strings
Substitution (sub_*() functions) replaces specific values with alternative text
Text transformation (text_*() functions) modifies the final string representation

Each stage builds on the previous one, allowing you to progressively refine your table’s presentation.

5.1 Substitution functions

Substitution functions replace specific values with alternative text or symbols. This is particularly useful for handling missing data, zero values, and extreme values that need special treatment.

5.1.1 `sub_missing()`

Missing values (NA) are common in real-world data, but displaying "NA" in a table is often undesirable. The sub_missing() function replaces missing values with more meaningful or visually appropriate text.

Here is the function’s signature:

sub_missing(
  data,
  columns = everything(),
  rows = everything(),
  missing_text = "---"
)

Let’s replace missing values in the exibble dataset with em-dashes:

exibble |>
  dplyr::select(num, char, currency) |>
  gt() |>
  sub_missing(columns = everything())

num	char	currency
1.111e-01	apricot	49.950
2.222e+00	banana	17.950
3.333e+01	coconut	1.390
4.444e+02	durian	65100.000
5.550e+03	—	1325.810
—	fig	13.255
7.770e+05	grapefruit	—
8.880e+06	honeydew	0.440

The NA values are now displayed as em-dashes (the "---" default for missing_text signifies an em dash), a common typographic convention for missing data that’s cleaner than displaying "NA".

You can customize the replacement text for different contexts:

exibble |>
  dplyr::select(num, char, currency) |>
  gt() |>
  sub_missing(columns = num, missing_text = "N/A") |>
  sub_missing(columns = char, missing_text = "(none)") |>
  sub_missing(columns = currency, missing_text = "---")

num	char	currency
1.111e-01	apricot	49.950
2.222e+00	banana	17.950
3.333e+01	coconut	1.390
4.444e+02	durian	65100.000
5.550e+03	(none)	1325.810
N/A	fig	13.255
7.770e+05	grapefruit	—
8.880e+06	honeydew	0.440

Different columns now display their missing values differently: numeric columns show "N/A", character columns show "(none)", and currency columns show an em-dash. This contextual approach can make your tables more intuitive.

The rows argument lets you target specific rows:

exibble |>
  dplyr::select(row, num, char) |>
  gt() |>
  sub_missing(
    columns = everything(),
    rows = row %in% c("row_1", "row_2"),
    missing_text = "pending"
  ) |>
  sub_missing(
    columns = everything(),
    missing_text = "---"
  )

row	num	char
row_1	1.111e-01	apricot
row_2	2.222e+00	banana
row_3	3.333e+01	coconut
row_4	4.444e+02	durian
row_5	5.550e+03	—
row_6	—	fig
row_7	7.770e+05	grapefruit
row_8	8.880e+06	honeydew

This applies different substitutions based on row context, with early rows showing "pending" and others showing an em-dash.

5.1.2 `sub_zero()`

Zero values sometimes represent “no data” or “not applicable” rather than a meaningful quantity. The sub_zero() function replaces zeros with alternative text.

Here is the function’s signature:

sub_zero(
  data,
  columns = everything(),
  rows = everything(),
  zero_text = "nil"
)

Let’s replace zero counts with a hyphen for clearer presentation:

dplyr::tibble(
  category = c("A", "B", "C", "D", "E"),
  count = c(42, 0, 17, 0, 8)
) |>
  gt() |>
  sub_zero(columns = count, zero_text = "–")

category	count
A	42
B	–
C	17
D	–
E	8

The zero counts are replaced with hyphens, making it immediately clear that these represent “none” rather than a meaningful zero value.

Different contexts might call for different zero representations:

dplyr::tibble(
  product = c("Widget A", "Widget B", "Widget C"),
  in_stock = c(150, 0, 75),
  on_order = c(0, 200, 0)
) |>
  gt() |>
  fmt_integer() |>
  sub_zero(columns = in_stock, zero_text = "Out of stock") |>
  sub_zero(columns = on_order, zero_text = "-")

product	in_stock	on_order
Widget A	150	-
Widget B	Out of stock	200
Widget C	75	-

The in_stock column shows a descriptive message for zeros, while on_order uses a simple dash. This helps readers quickly understand the significance of each zero.

5.1.3 `sub_small_vals()`

Very small values can be difficult to display meaningfully, especially when they approach the limits of measurement precision. The sub_small_vals() function replaces values below a threshold with “less than” statements.

Here is the function’s signature:

sub_small_vals(
  data,
  columns = everything(),
  rows = everything(),
  threshold = 0.01,
  small_pattern = "<{x}",
  sign = ""
)

Let’s apply this to concentration data where some measurements fall below the reliable detection limit. We’ll use fmt_partsper() to express concentrations in parts-per-million (ppm):

dplyr::tibble(
  measurement = c("Sample A", "Sample B", "Sample C", "Sample D"),
  concentration = c(0.5, 0.008, 0.0002, 0.15)
) |>
  gt() |>
  fmt_partsper(columns = concentration, to_units = "ppm") |>
  sub_small_vals(columns = concentration, threshold = 0.01)

measurement	concentration
Sample A	500,000.00 ppm
Sample B	<0.01
Sample C	<0.01
Sample D	150,000.00 ppm

Values below 0.01 are replaced with "<0.01", acknowledging their presence without implying false precision. This is standard practice in scientific and analytical reporting. The fmt_partsper() function formats the values with the appropriate "ppm" unit suffix.

The small_pattern argument customizes the replacement format. Here’s an example with parts-per-billion measurements, common in environmental testing:

dplyr::tibble(
  analyte = c("Lead", "Mercury", "Cadmium", "Arsenic"),
  level_ppb = c(2.5, 0.003, 0.0001, 1.2)
) |>
  gt() |>
  fmt_partsper(columns = level_ppb, to_units = "ppb") |>
  sub_small_vals(
    columns = level_ppb, 
    threshold = 0.01,
    small_pattern = "< LOD"
  )

analyte	level_ppb
Lead	2,500,000,000.00 ppb
Mercury	< LOD
Cadmium	< LOD
Arsenic	1,200,000,000.00 ppb

Here, values below the threshold are replaced with "< LOD" (i.e., below the limit of detection), which is a common term in analytical chemistry. The combination of fmt_partsper() with sub_small_vals() creates professional-looking scientific tables where detectable values show their ppb concentration and undetectable values display the appropriate notation.

Beyond scientific applications, sub_small_vals() is useful anywhere that very small numbers might confuse readers or imply misleading precision (such as rounding percentages near zero, financial reports with negligible amounts, or survey results with tiny response rates). The function helps maintain honesty about data precision while keeping tables readable.

5.1.4 `sub_large_vals()`

Similarly, extremely large values might need special handling, especially when they exceed a meaningful display range. The sub_large_vals() function caps values above a threshold.

Here is the function’s signature:

sub_large_vals(
  data,
  columns = everything(),
  rows = everything(),
  threshold = 1E12,
  large_pattern = ">={x}",
  sign = ""
)

Let’s create a table where we cap the display of values that exceed a meaningful threshold:

dplyr::tibble(
  metric = c("Normal", "High", "Very High", "Extreme"),
  value = c(500, 50000, 500000, 5000000)
) |>
  gt() |>
  fmt_number(columns = value) |>
  sub_large_vals(columns = value, threshold = 100000)

metric	value
Normal	500.00
High	50,000.00
Very High	≥1e+05
Extreme	≥1e+05

Values exceeding 100,000 are replaced with "≥1e+05" (scientific notation for the threshold), indicating they’re beyond the displayed scale. This is useful for tables where exact values above a certain point aren’t meaningful. Note that the threshold value in the replacement text uses scientific notation by default.

The large_pattern argument lets you customize the replacement text. This is particularly useful when scientific notation isn’t appropriate for your audience, or when domain-specific terminology (like “exceeds range” or “> max”) would be more meaningful:

dplyr::tibble(
  test = c("Control", "Test A", "Test B", "Test C"),
  response = c(85, 120, 9999, 145)
) |>
  gt() |>
  fmt_integer(columns = response) |>
  sub_large_vals(
    columns = response,
    threshold = 1000,
    large_pattern = "> max"
  )

test	response
Control	85
Test A	120
Test B	> max
Test C	145

Values above the threshold show “> max”, indicating they exceeded the measurement range. This approach is common in laboratory reports, sensor data displays, and any context where instruments have a defined upper limit.

The sub_large_vals() function is particularly useful for handling outliers that would otherwise distort a reader’s perception of the data, or for privacy purposes where exact large values shouldn’t be disclosed (e.g., displaying salaries above a threshold as “≥$200,000”). Combined with sub_small_vals(), you have complete control over how extreme values at both ends of the spectrum are presented.

5.1.5 `sub_values()`

For custom value replacements that go beyond missing, zero, or threshold-based substitutions, sub_values() provides flexible pattern matching. You can match specific values, use functions to identify cells for replacement, or work with vectors of values to replace. This is your go-to tool when the other substitution functions don’t cover your use case.

Here is the function’s signature:

sub_values(
  data,
  columns = everything(),
  rows = everything(),
  values = NULL,
  fn = NULL,
  replacement = NULL
)

There are several patterns for using this function, depending on whether you want to match exact values, apply conditional logic, or replace multiple values at once.

5.1.5.1 Replacing specific values

The simplest use of sub_values() is replacing exact value matches. You specify the values to look for and the replacement text to use. This is good when adding visual indicators or expanding abbreviations.

Let’s decorate status values with emoji indicators:

dplyr::tibble(
  status = c("active", "inactive", "pending", "archived"),
  count = c(10, 5, 3, 2)
) |>
  gt() |>
  sub_values(
    columns = status,
    values = "inactive",
    replacement = "⏸ inactive"
  ) |>
  sub_values(
    columns = status,
    values = "archived",
    replacement = "📁 archived"
  )

status	count
active	10
⏸ inactive	5
pending	3
📁 archived	2

The "inactive" and "archived" values now display with appropriate emoji indicators, making status categories immediately visually distinct. The "active" and "pending" values pass through unchanged since we didn’t specify replacements for them.

5.1.5.2 Using a function for conditional replacement

For more complex matching logic, you can supply a function via the fn argument. This function receives each cell value and should return TRUE for values that need replacement. This enables conditional logic that would be difficult to express with simple value matching.

Let’s flag scores that fall below an acceptable threshold:

dplyr::tibble(
  product = c("Alpha", "Beta", "Gamma", "Delta"),
  score = c(92, 45, 78, 38)
) |>
  gt() |>
  sub_values(
    columns = score,
    fn = function(x) x < 50,
    replacement = "⚠️ Below threshold"
  )

product	score
Alpha	92
Beta	⚠️ Below threshold
Gamma	78
Delta	⚠️ Below threshold

The function-based approach evaluates each cell value against the condition x < 50. Beta (45) and Delta (38) both fail this check, so they’re replaced with a warning indicator. This technique is invaluable for quality control dashboards or any table where certain values need to be called out based on thresholds.

5.1.5.3 Multiple value replacement

The values argument can accept a vector of values to match, all of which will be replaced with the same replacement text. This is useful for grouping related values or collapsing categories.

Let’s group weekend days together:

dplyr::tibble(
  day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
  hours = c(8, 8, 8, 8, 8, 0, 0)
) |>
  gt() |>
  sub_values(
    columns = day,
    values = c("Sat", "Sun"),
    replacement = "Weekend"
  )

day	hours
Mon	8
Tue	8
Wed	8
Thu	8
Fri	8
Weekend	0
Weekend	0

Both Saturday and Sunday are replaced with "Weekend", effectively collapsing these two categories into one. This technique is helpful when detailed values in your data need to be presented at a higher level of abstraction.

The sub_values() function is the most versatile of the substitution functions. While sub_missing(), sub_zero(), sub_small_vals(), and sub_large_vals() each handle specific common cases, sub_values() handles everything else. Use it to expand abbreviations, add visual indicators to specific categories, flag values that meet custom criteria, or recode categorical data for clearer presentation. The function-based matching via fn is particularly powerful as you can implement any logic that returns TRUE or FALSE, from simple threshold checks to complex pattern matching or even lookups against external reference data.

5.2 Text transformation functions

Text transformation functions operate on the final string representation of cell values, after formatting and substitution have been applied. They provide powerful tools for pattern-based modifications and conditional text changes.

5.2.1 `text_replace()`

For targeted string replacements using regular expressions, text_replace() finds and replaces patterns within cell text. Unlike the substitution functions that replace entire cell values, text_replace() can modify specific portions of text within a cell while leaving the rest intact.

Here is the function’s signature:

text_replace(
  data,
  locations,
  pattern,
  replacement
)

Let’s standardize product descriptions by replacing a prefix:

dplyr::tibble(
  description = c(
    "Model ABC-123",
    "Model DEF-456", 
    "Model GHI-789"
  )
) |>
  gt() |>
  text_replace(
    locations = cells_body(columns = description),
    pattern = "Model ",
    replacement = "Type: "
  )

description
Type: ABC-123
Type: DEF-456
Type: GHI-789

All occurrences of "Model " are replaced with "Type: ", transforming the presentation without modifying the source data. This is particularly useful when you need to adapt terminology for different audiences or standardize inconsistent naming conventions.

5.2.1.1 Using regular expressions

The pattern argument supports full regular expression syntax, enabling powerful pattern matching and capture groups for sophisticated text manipulations.

Let’s restructure product codes to make variant identifiers more readable:

dplyr::tibble(
  code = c("SKU-001-A", "SKU-002-B", "SKU-003-C"),
  name = c("Widget", "Gadget", "Sprocket")
) |>
  gt() |>
  text_replace(
    locations = cells_body(columns = code),
    pattern = "-([A-Z])$",
    replacement = " (Variant \\1)"
  )

code	name
SKU-001 (Variant A)	Widget
SKU-002 (Variant B)	Gadget
SKU-003 (Variant C)	Sprocket

The regex pattern "-([A-Z])$" matches a hyphen followed by a capital letter at the end of the string. The parentheses create a capture group around the letter. The replacement string uses \\1 to reference that captured group, transforming "SKU-001-A" into "SKU-001 (Variant A)". This technique is invaluable for reformatting coded data into human-readable labels.

5.2.1.2 Applying to multiple locations

The locations argument accepts any of gt’s cells_*() helper functions, allowing you to target specific parts of the table. You can apply text replacements to the body, stub, column labels, or other table components.

Let’s shorten the row labels in the stub column:

exibble |>
  dplyr::select(row, char) |>
  dplyr::slice(1:4) |>
  gt(rowname_col = "row") |>
  text_replace(
    locations = cells_stub(),
    pattern = "row_",
    replacement = "R"
  )

	char
R1	apricot
R2	banana
R3	coconut
R4	durian

The transformation applies to the stub column, shortening row labels from "row_1" to "R1". This makes the table more compact while maintaining the row identification. The same technique works with cells_column_labels(), cells_title(), and other location helpers.

5.2.2 `text_case_when()`

For conditional text replacements based on complex logic, text_case_when() evaluates expressions and applies different replacements based on the results. This function mirrors the logic of dplyr::case_when() but operates on the text representation of cell values after formatting.

Here is the function’s signature:

text_case_when(
  data,
  .default = NULL,
  .locations = cells_body(),
  ...
)

The function uses a formula syntax where the left side is a condition and the right side is the replacement. The variable x represents the current cell value. Conditions are evaluated in order, and the first matching condition determines the replacement.

Let’s decorate letter grades with star indicators:

dplyr::tibble(
  student = c("Alice", "Bob", "Carol", "David", "Eve"),
  grade = c("A", "B", "A+", "C", "F")
) |>
  gt() |>
  text_case_when(
    x == "A+" ~ "★★ A+",
    x == "A" ~ "★ A",
    x == "F" ~ "✗ F",
    .default = NULL,
    .locations = cells_body(columns = grade)
  )

student	grade
Alice	★ A
Bob	B
Carol	★★ A+
David	C
Eve	✗ F

Different grades receive different decorations based on conditional logic: A+ gets double stars, A gets a single star, and F gets an X mark. The .default = NULL setting is important: it ensures unmatched values (B and C in this case) pass through unchanged rather than being replaced with a default value. Without this, unmatched values would display as NULL.

5.2.2.1 Numeric comparisons in text

Even though text transformations operate on strings (after formatting has been applied), you can extract numeric values and perform comparisons. This requires converting the text back to numbers, which may involve removing formatting characters like currency symbols or thousands separators.

Let’s add trend indicators based on the underlying values:

dplyr::tibble(
  metric = c("Revenue", "Costs", "Profit", "Tax"),
  value = c(100000, 75000, 25000, 5000)
) |>
  gt() |>
  fmt_currency(columns = value, currency = "USD") |>
  text_case_when(
    as.numeric(gsub("[^0-9.]", "", x)) >= 50000 ~ paste0("📈 ", x),
    as.numeric(gsub("[^0-9.]", "", x)) < 10000 ~ paste0("📉 ", x),
    .default = NULL,
    .locations = cells_body(columns = value)
  )

metric	value
Revenue	📈 $100,000.00
Costs	📈 $75,000.00
Profit	$25,000.00
Tax	📉 $5,000.00

The gsub("[^0-9.]", "", x) pattern strips everything except digits and decimal points, allowing as.numeric() to convert the formatted currency string back to a number. Values at or above $50,000 get an upward trend indicator, while values below $10,000 get a downward indicator. Values in between pass through unchanged. This technique lets you add visual cues based on numeric thresholds even after formatting has been applied.

5.2.2.2 Multiple conditions

You can chain as many conditions as needed. Each formula is evaluated in order, and the first matching condition wins.

Let’s transform terse status codes into user-friendly labels with appropriate icons:

dplyr::tibble(
  task = c("Design", "Development", "Testing", "Deployment"),
  status = c("done", "in_progress", "not_started", "blocked")
) |>
  gt() |>
  text_case_when(
    x == "done" ~ "✅ Complete",
    x == "in_progress" ~ "🔄 In Progress",
    x == "not_started" ~ "⏳ Pending",
    x == "blocked" ~ "🚫 Blocked",
    .default = NULL,
    .locations = cells_body(columns = status)
  )

task	status
Design	✅ Complete
Development	🔄 In Progress
Testing	⏳ Pending
Deployment	🚫 Blocked

Each status value is transformed into a more descriptive, emoji-decorated representation. This pattern is useful for project management dashboards, where coded statuses need to be presented in a visually scannable way. The emoji indicators provide instant recognition, while the expanded text ensures clarity.

5.2.3 `text_case_match()`

For simpler one-to-one replacements without complex conditions, text_case_match() provides a cleaner syntax similar to dplyr::case_match(). While text_case_when() allows arbitrary conditions, text_case_match() is optimized for direct value-to-replacement mappings.

Here is the function’s signature:

text_case_match(
  data,
  .default = NULL,
  .replace = NULL,
  .locations = cells_body(),
  ...
)

Let’s expand single-character availability codes into readable labels:

dplyr::tibble(
  feature = c("API", "Dashboard", "Reports", "Settings"),
  available = c("Y", "N", "Y", "P")
) |>
  gt() |>
  text_case_match(
    "Y" ~ "✓ Yes",
    "N" ~ "✗ No",
    "P" ~ "◐ Partial",
    .default = NULL,
    .locations = cells_body(columns = available)
  )

feature	available
API	✓ Yes
Dashboard	✗ No
Reports	✓ Yes
Settings	◐ Partial

Single-character codes are expanded to readable values with visual indicators. The syntax is more concise than text_case_when() because you don’t need to write x == for each condition. For simple value matching like this, text_case_match() is the cleaner choice.

5.2.3.1 Partial replacements with `.replace`

By default, text_case_match() replaces the entire cell value. Setting .replace = "partial" changes this behavior to only replace the matched portion, leaving the rest of the cell text intact.

Let’s replace a filename prefix with an icon while keeping the actual filename:

dplyr::tibble(
  item = c("file_report.pdf", "file_data.csv", "file_image.png"),
  size = c("2.5 MB", "150 KB", "4.1 MB")
) |>
  gt() |>
  text_case_match(
    "file_" ~ "📄 ",
    .replace = "partial",
    .locations = cells_body(columns = item)
  )

item	size
📄 report.pdf	2.5 MB
📄 data.csv	150 KB
📄 image.png	4.1 MB

Instead of replacing the entire cell value, only the matched portion ("file_") is replaced with the document emoji. The rest of the filename ("report.pdf", etc.) is preserved. This technique is excellent for adding visual indicators to coded prefixes or suffixes without losing the identifying information.

5.2.3.2 Multiple matches in sequence

You can specify multiple match patterns, each with its own replacement. When using .replace = "partial", all matching patterns are replaced within each cell.

Let’s replace country codes with flag emojis while keeping the city codes:

dplyr::tibble(
  code = c("US-CA", "UK-LN", "DE-BE", "FR-PA"),
  sales = c(15000, 12000, 9500, 11000)
) |>
  gt() |>
  text_case_match(
    "US" ~ "🇺🇸",
    "UK" ~ "🇬🇧", 
    "DE" ~ "🇩🇪",
    "FR" ~ "🇫🇷",
    .replace = "partial",
    .locations = cells_body(columns = code)
  ) |>
  fmt_currency(columns = sales, currency = "USD")

code	sales
🇺🇸-CA	$15,000.00
🇬🇧-LN	$12,000.00
🇩🇪-BE	$9,500.00
🇫🇷-PA	$11,000.00

The two-letter country codes at the beginning of each cell are replaced with their corresponding flag emojis, while the hyphen and city codes that follow are preserved. This creates visually appealing location identifiers that are both scannable (via the flags) and informative (via the city codes).

5.2.4 `text_transform()`

For maximum flexibility, text_transform() applies arbitrary functions to cell text. This is the most powerful (and most flexible) of the text transformation tools. While the other text functions provide convenient shortcuts for common patterns, text_transform() lets you write custom R functions that can perform any text manipulation.

Here is the function’s signature:

text_transform(
  data,
  locations,
  fn
)

The fn argument accepts a function that takes a character vector of cell values and returns a transformed character vector of the same length.

5.2.4.1 Basic transformations

Simple transformations can leverage existing R functions for text manipulation.

Let’s fix the capitalization of names stored in lowercase:

dplyr::tibble(
  name = c("john smith", "mary jones", "bob wilson")
) |>
  gt() |>
  text_transform(
    locations = cells_body(columns = name),
    fn = function(x) tools::toTitleCase(x)
  )

name
John Smith
Mary Jones
Bob Wilson

The names are transformed to title case using R’s built-in tools::toTitleCase() function. This corrects the formatting of the source data for display without modifying the original data. You could use any text function here: toupper(), tolower(), stringr::str_to_sentence(), or your own custom function.

5.2.4.2 Complex transformations

For more sophisticated manipulations, you can write multi-step transformation functions that parse, modify, and reassemble text.

Let’s restructure hyphenated product codes into a more readable format:

dplyr::tibble(
  product = c("widget-standard", "gadget-premium", "sprocket-basic"),
  price = c(29.99, 49.99, 19.99)
) |>
  gt() |>
  text_transform(
    locations = cells_body(columns = product),
    fn = function(x) {
      parts <- strsplit(x, "-")
      sapply(parts, function(p) {
        paste0(
          tools::toTitleCase(p[1]), 
          " (", 
          toupper(p[2]), 
          ")"
        )
      })
    }
  ) |>
  fmt_currency(columns = price, currency = "USD")

product	price
Widget (STANDARD)	$29.99
Gadget (PREMIUM)	$49.99
Sprocket (BASIC)	$19.99

This transformation function does several things: it splits each product string on the hyphen, capitalizes the product name using title case, converts the tier to uppercase, and reassembles them with the tier in parentheses. The result transforms "widget-standard" into "Widget (STANDARD)". This kind of structural reformatting would be difficult with the other text functions.

5.2.4.3 Conditional formatting with external data

The transformation function can access variables from the enclosing environment, allowing you to make decisions based on data that isn’t in the cell being transformed.

Let’s add performance indicators to region names based on whether they met their sales targets:

sales_data <- dplyr::tibble(
  region = c("North", "South", "East", "West"),
  sales = c(150000, 85000, 120000, 95000),
  target = c(100000, 100000, 100000, 100000)
)

sales_data |>
  gt() |>
  fmt_currency(columns = c(sales, target), currency = "USD") |>
  text_transform(
    locations = cells_body(columns = region),
    fn = function(x) {
      # Access the original data to make decisions
      ifelse(
        sales_data$sales > sales_data$target,
        paste0("✅ ", x),
        paste0("⚠️ ", x)
      )
    }
  )

region	sales	target
✅ North	$150,000.00	$100,000.00
⚠️ South	$85,000.00	$100,000.00
✅ East	$120,000.00	$100,000.00
⚠️ West	$95,000.00	$100,000.00

The transformation function references sales_data$sales and sales_data$target from outside the function to determine which indicator to add to each region name. North and East exceeded their targets (getting check marks), while South and West fell short (getting warning indicators). This technique is powerful for adding cross-column conditional formatting.

5.2.4.4 Transforming multiple columns together

You can apply the same transformation to multiple columns by specifying them in the locations argument.

Let’s complete partial email addresses by adding the domain:

dplyr::tibble(
  first = c("John", "Mary", "Bob"),
  last = c("Smith", "Jones", "Wilson"),
  email = c("jsmith", "mjones", "bwilson")
) |>
  gt() |>
  text_transform(
    locations = cells_body(columns = email),
    fn = function(x) paste0(x, "@company.com")
  )

first	last	email
John	Smith	jsmith@company.com
Mary	Jones	mjones@company.com
Bob	Wilson	bwilson@company.com

The email usernames are transformed into complete email addresses by appending the domain. This is a simple but practical example of how text_transform() can complete partial data for display purposes. You could extend this to apply different domains based on department, format phone numbers, or any other text augmentation task.

5.3 Combining stages effectively

The power of gt’s rendering pipeline comes from combining formatting, substitution, and transformation stages. Here’s a comprehensive example:

dplyr::tibble(
  product = c("alpha_widget", "beta_gadget", "gamma_sprocket", "delta_tool"),
  status = c("active", "discontinued", "active", "pending"),
  stock = c(150, 0, NA, 25),
  price = c(29.99, 19.99, 0.001, 49.99)
) |>
  gt() |>
  # Stage 1: Formatting
  fmt_integer(columns = stock) |>
  fmt_currency(columns = price, currency = "USD") |>
  # Stage 2: Substitution
  sub_missing(columns = stock, missing_text = "N/A") |>
  sub_zero(columns = stock, zero_text = "Out of stock") |>
  sub_small_vals(columns = price, threshold = 1, small_pattern = "Call for price") |>
  # Stage 3: Text transformation
  text_transform(
    locations = cells_body(columns = product),
    fn = function(x) {
      x <- gsub("_", " ", x)
      tools::toTitleCase(x)
    }
  ) |>
  text_case_match(
    "active" ~ "🟢 Active",
    "discontinued" ~ "🔴 Discontinued",
    "pending" ~ "🟡 Pending",
    .locations = cells_body(columns = status)
  )

product	status	stock	price
Alpha Widget	🟢 Active	150	$29.99
Beta Gadget	🔴 Discontinued	Out of stock	$19.99
Gamma Sprocket	🟢 Active	N/A	Call for price
Delta Tool	🟡 Pending	25	$49.99

This example demonstrates all three stages working together:

formatting: numbers are formatted as integers and currency
substitution: missing values, zeros, and small values receive special treatment
transformation: product names are cleaned up and status values are decorated

Each stage builds on the previous one, creating a polished final presentation from raw data.

5.4 Summary

This chapter has explored the second and third stages of gt’s rendering pipeline: substitution and text transformation. These functions operate after formatting, giving you fine-grained control over how specific values appear in your finished tables.

The key concepts to remember:

substitution functions (sub_*()) replace specific values with alternative text. Use sub_missing() for NA values, sub_zero() for zeros, sub_small_vals() and sub_large_vals() for extreme values, and sub_values() for custom replacement logic.
text transformation functions (text_*()) modify the final string representation of cell values. The text_transform() function provides maximum flexibility with custom functions, while text_case_when() and text_case_match() offer convenient pattern-based transformations.
the three-stage pipeline (format → substitute → transform) processes values in a defined order, ensuring predictable results. Each stage sees the output of the previous stage.
location targeting allows you to apply transformations precisely where needed, using the same cells_*() helper functions used throughout gt.

Together with the formatting functions from Chapters 3 and 4, you now have complete control over how every value in your table is rendered. Missing data can display meaningful placeholders. Extreme values can receive special treatment. Text can be cleaned, decorated, or transformed programmatically.

The next chapter shifts focus from cell values to table structure, exploring the cols_*() functions that let you modify columns themselves: their labels, widths, alignment, ordering, visibility, and even merging multiple columns into one.

5.1 Substitution functions

5.1.1 sub_missing()

5.1.2 sub_zero()

5.1.3 sub_small_vals()

5.1.4 sub_large_vals()

5.1.5 sub_values()

5.1.5.1 Replacing specific values

5.1.5.2 Using a function for conditional replacement

5.1.5.3 Multiple value replacement

5.2 Text transformation functions

5.2.1 text_replace()

5.2.1.1 Using regular expressions

5.2.1.2 Applying to multiple locations

5.2.2 text_case_when()

5.2.2.1 Numeric comparisons in text

5.2.2.2 Multiple conditions

5.2.3 text_case_match()

5.2.3.1 Partial replacements with .replace

5.2.3.2 Multiple matches in sequence

5.2.4 text_transform()

5.2.4.1 Basic transformations

5.2.4.2 Complex transformations

5.2.4.3 Conditional formatting with external data

5.2.4.4 Transforming multiple columns together

5.3 Combining stages effectively

5.4 Summary

5.1.1 `sub_missing()`

5.1.2 `sub_zero()`

5.1.3 `sub_small_vals()`

5.1.4 `sub_large_vals()`

5.1.5 `sub_values()`

5.2.1 `text_replace()`

5.2.2 `text_case_when()`

5.2.3 `text_case_match()`

5.2.3.1 Partial replacements with `.replace`

5.2.4 `text_transform()`