# DiagrammeR Docs

Get an overview of DiagrammeR, learn the syntax, check out some examples.

Suppose you have a giant graph with many nodes and many edges. Moreover, the information contained within the nodes and edges is granular, highly specific, and perhaps obtained from data collection over varying periods of time. Chances are, if you've built up such a graph, you'll want to inspect parts of the graph (or get summaries of the entire graph) for reporting purposes or to inform further graph modification. This is where **DiagrammeR**'s inspection functions are helpful. They make it rather easy to get the information you need from various facets of the graph.

To get basic information on each of the graph's nodes or edges, the `node_info()` and `edge_info()` functions can be used. These functions quickly return data frames with useful information for each node or edge.

The `node_info()` and `edge_info()` functions provide information about the nodes and edges in the graph. The information is presented in the form of a data frame. For `node_info()`, the following data is returned:

`node`

`label`

`type`

`degree`

`indegree`

`outdegree`

`loops`

The `node` column contains the node ID value for each of the graph's nodes. The nodes' `label` and `type` values are optional yet useful. If these are not set for any of the nodes, the columns will still be present in the resultant data frame and the values they contain will be empty strings. The `degree`, `indegree`, and `outdegree` values are counts of how many edges are incident on a specific node. The `degree` is the total number of edges incident on the node, regardless of the direction of the arrow (in a directed graph). The `indegree` is the number of edges with arrows directed toward the node, whereas the `outdegree` is the number of edges with arrows directed away from the node. Thus, the sum of a node's `indegree` and `outdegree` values equals its `degree` value. The `loops` value is the number of edges that originate and terminate at the same node (each loop contributes `1` to both the `indegree` and the `outdegree`, and therefore `2` to the node's `degree`).
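To make the relationship between these counts concrete, here is a minimal base R sketch that derives them from a small edge list. The node IDs and edges are hypothetical and chosen only for illustration; this is not how `node_info()` is implemented, just one way to reproduce the arithmetic described above.

```r
# Hypothetical nodes and directed edges (not the graph used elsewhere on this page)
nodes <- c("A", "B", "C", "D")
from  <- c("A", "A", "B", "C")
to    <- c("B", "C", "C", "C")   # the last edge, C -> C, is a loop

# Count appearances as an edge target (indegree) and as an edge
# source (outdegree); factor() keeps nodes with a count of zero
indegree  <- as.vector(table(factor(to, levels = nodes)))
outdegree <- as.vector(table(factor(from, levels = nodes)))

# A loop is an edge whose source and target are the same node
loops <- as.vector(table(factor(from[from == to], levels = nodes)))

degree_table <- data.frame(
  node = nodes,
  degree = indegree + outdegree,
  indegree = indegree,
  outdegree = outdegree,
  loops = loops,
  stringsAsFactors = FALSE
)
print(degree_table)
```

Node `C` ends up with a `degree` of 4: two incoming edges from other nodes plus the loop, which counts once as incoming and once as outgoing. Node `D` has no edges at all, so every count is 0.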

From the `edge_info()` function, the resultant data frame has the following columns:

`from`

`to`

`rel`

`label`

```
###
# Get basic information on the graph's nodes and edges
###
library(DiagrammeR)
set.seed(26)
# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z", 2)))
# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")
# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)
# Use the `node_info()` function, returning a data
# frame with information on the graph's nodes
node_info(graph)
#> node label type degree indegree outdegree loops
#> 1 A A a_to_g 2 0 2 0
#> 2 W W q_to_x 1 0 1 0
#> 3 T T q_to_x 2 0 2 0
#> 4 L L h_to_p 1 0 1 0
#> 5 F F a_to_g 0 0 0 0
#>.. ... ... ... ... ... ... ...
# Use the `edge_info()` function, returning a data
# frame with information on the graph's edges
edge_info(graph)
#> from to rel label
#> 1 A Z letter_to_letter edge
#> 2 H U letter_to_letter edge
#> 3 W O letter_to_letter edge
#> 4 U K letter_to_letter edge
#> 5 I V letter_to_letter edge
#>.. ... ... ... ...
```

The `node_present()` and `edge_present()` functions are used to determine whether a node (based on its node ID) or an edge (based on two node IDs) is present in a graph object. Both functions return a logical value of either `TRUE` or `FALSE`.

```
###
# Find out if a node or edge is present in the graph
###
library(DiagrammeR)
set.seed(26)
# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z", 2)))
# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")
# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)
# Verify that node with ID `a` is not in graph with
# the `node_present()` function (it won't be because
# the `LETTERS` vector is made up of capital letters)
node_present(graph, "a")
#> FALSE
# Is node with ID `A` in the graph?
node_present(graph, "A")
#> TRUE
# Are all node ID values from the LETTERS vector in
# the graph?
all(sapply(LETTERS, function(x) node_present(graph, x)))
#> TRUE
# Moving to the inspection of edges: is there any edge
# from node ID `A` to node ID `B`? Use the
# `edge_present()` function to find out
edge_present(graph, from = "A", to = "B")
#> FALSE
# Verify that there is an edge from node ID `K` to node
# ID `V`
edge_present(graph, from = "K", to = "V")
#> TRUE
```
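For intuition, the same two checks can be sketched in base R against plain vectors. The helper names `has_node()` and `has_edge()` and the data are hypothetical and are not part of **DiagrammeR**; the sketch only illustrates that node presence is a membership test on IDs while edge presence is a test on an ordered pair.

```r
# Hypothetical node IDs and directed edges, for illustration only
node_ids  <- c("A", "B", "C")
edge_from <- c("A", "B")
edge_to   <- c("B", "C")

# A node is present if its ID occurs among the node IDs
# (the comparison is case-sensitive)
has_node <- function(id) id %in% node_ids

# A directed edge is present if some edge pairs `from` with `to`
has_edge <- function(from, to) any(edge_from == from & edge_to == to)

has_node("a")
#> [1] FALSE
has_node("A")
#> [1] TRUE
has_edge("A", "B")
#> [1] TRUE
has_edge("B", "A")
#> [1] FALSE
```

The last call shows why direction matters: an edge from `A` to `B` does not imply one from `B` to `A`.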

The purpose of the `get_nodes()` and `get_edges()` functions is to return all of the nodes or all of the edges (i.e., pairs of nodes, ordered by direction) available in a graph, or in a data frame for nodes or edges. For `get_nodes()`, one can simply supply a graph object, a node data frame, or an edge data frame, and a vector of node IDs will be returned. The `get_edges()` function has an additional argument called `return_type`, with which you can specify one of three types of return object: a list with `return_type = "list"`, a data frame with `return_type = "df"`, or a character vector with `return_type = "vector"`. Whereas `get_nodes()` works with graph objects and with data frames for nodes and edges, `get_edges()` works only with graph objects and edge data frames.

```
###
# Get a vector of all nodes in a graph, or in NDFs
# or EDFs
###
library("DiagrammeR")
set.seed(26)
# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z", 2)))
# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")
# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)
# Use the `get_nodes()` function to return node ID
# values
get_nodes(graph)
#> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
#> [13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
#> [25] "Y" "Z"
# Can extract a vector of node ID values from an NDF
table(get_nodes(nodes) %in% get_nodes(graph))
#>
#> TRUE
#> 26
# Can also extract a vector of node ID values from
# an EDF
table(get_nodes(graph) %in% get_nodes(edges))
#>
#> FALSE TRUE
#> 3 23
# Can get the 'outgoing' and 'incoming' node ID values
# in a list object
get_edges(graph, return_type = "list") # the default
#> [[1]]
#> [1] "A" "H" "W" "U" "I" "M" "U" "T" "I" "R" "O"
#> [12] "G" "O" "A" "V" "I" "M" "K" "R" "T" "Y" "R"
#> [23] "M" "L" "H" "V"
#> [[2]]
#> [1] "Z" "U" "O" "K" "V" "M" "N" "C" "D" "Z" "B"
#> [12] "G" "U" "Y" "H" "V" "R" "V" "Z" "S" "Q" "I"
#> [23] "P" "S" "E" "P"
# Similarly, you can specify that a data frame is given
get_edges(graph, return_type = "df")
#> from to
#> 1 A Z
#> 2 H U
#> 3 W O
#> 4 U K
#> 5 I V
#>.. ... ..
# A character vector of edges can also be obtained
get_edges(graph, return_type = "vector")
#> [1] "A -> Z" "H -> U" "W -> O" "U -> K" "I -> V"
#> [6] "M -> M" "U -> N" "T -> C" "I -> D" "R -> Z"
#> [11] "O -> B" "G -> G" "O -> U" "A -> Y" "V -> H"
#> [16] "I -> V" "M -> R" "K -> V" "R -> Z" "T -> S"
#> [21] "Y -> Q" "R -> I" "M -> P" "L -> S" "H -> E"
#> [26] "V -> P"
# As with `get_nodes()`, the `get_edges()` function
# works in an analogous manner with EDFs
all(get_edges(edges, return_type = "list")[[1]] ==
      get_edges(graph, return_type = "list")[[1]])
#> TRUE
all(get_edges(edges, return_type = "df") ==
      get_edges(graph, return_type = "df"))
#> TRUE
all(get_edges(edges, return_type = "vector") ==
      get_edges(graph, return_type = "vector"))
#> TRUE
```
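The three return types are simply different shapes of the same `from`/`to` pairs. As a rough base R sketch (not the package's implementation), the first three edges shown above can be arranged into each shape by hand; the object names `as_list`, `as_df`, and `as_vector` are made up for this illustration.

```r
# The first three edges from the output above, held in a plain data frame
edges <- data.frame(
  from = c("A", "H", "W"),
  to   = c("Z", "U", "O"),
  stringsAsFactors = FALSE
)

# List form: one vector of outgoing node IDs, one of incoming node IDs
as_list <- list(edges$from, edges$to)

# Data frame form: one row per edge
as_df <- edges[, c("from", "to")]

# Character vector form: one "from -> to" string per edge
as_vector <- paste(edges$from, "->", edges$to)
as_vector
#> [1] "A -> Z" "H -> U" "W -> O"
```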

The direct predecessors and direct successors of a particular node can be obtained through two easy-to-use functions: `get_predecessors()` and `get_successors()`. To be clear on exactly what these functions return, a brief foray into graph theory will be useful. Defining an edge between nodes as an arrow with components (`x`, `y`), such an arrow is directed from `x` to `y`. The `y` component is termed the *head* and `x` is termed the *tail* of the arrow. Orientation is important, as `y` is considered to be a direct successor of `x` and `x` is considered to be a direct predecessor of `y`. Supposing that `x` and `y` adjoin two different nodes in the graph, this definition is extended to mean direct predecessor nodes and direct successor nodes. There is also the case where arrow components `x` and `y` are incident on the same node, which is then a node with a loop. In such a case, that node is both the direct predecessor and the direct successor of itself (and the loop contributes 2 to its degree). The aforementioned functions return the node ID value(s) for the *direct* predecessors and *direct* successors of a given node. This is important to stress since, if a *path* exists between two distinct nodes, the node at the end of the path is said to be a successor of the node at the beginning of the path and *reachable* from that node (which is, in turn, a predecessor of the node at the end of the path).

The `get_predecessors()` and `get_successors()` functions take both a graph object and a specified node (provided as the node ID value) in that graph, and they determine which nodes are its direct predecessors or successors, respectively. This is a more direct and convenient means of determining direct predecessor or successor node ID values than performing a traversal, or extracting the graph's EDF and using base **R** functions to elucidate such node ID values.

```
###
# Get all the direct predecessors
# or all of the direct successors
# of a given node
###
library("DiagrammeR")
# Set a seed
set.seed(26)
# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z", 2)))
# Create an EDF; `replace` must be named, since the
# second positional argument of `sample()` is `size`
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")
# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)
# If there are no predecessors,
# `NA` is returned
get_predecessors(graph, node = "A")
#> [1] NA
get_successors(graph, node = "A")
#> [1] "Z" "Y"
get_successors(graph, node = "Z")
#> [1] NA
get_predecessors(graph, node = "Z")
#> [1] "A" "R" "R"
# Find isolated nodes in a graph
# (they have neither successors
# nor predecessors)
intersect(
  names(
    which(
      is.na(
        sapply(
          get_nodes(graph),
          function(x) get_successors(
            graph, x))))),
  names(
    which(
      is.na(
        sapply(
          get_nodes(graph),
          function(x) get_predecessors(
            graph, x)))))
)
#> [1] "F" "J" "X"
# The isolated nodes can also be
# found by subsetting the resulting
# data frame yielded by `node_info()`
node_info(graph)[which(node_info(graph)["degree"] == 0), ][, 1]
#> [1] "F" "J" "X"
```
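The base **R** route mentioned earlier, working directly on an edge data frame, might look like the following sketch. The helper names `direct_successors()` and `direct_predecessors()` and the small edge list are hypothetical. Note one difference in behavior: these helpers return an empty character vector when nothing is found, whereas the package functions shown above return `NA`.

```r
# Hypothetical edge data frame; M -> M is a loop
edges <- data.frame(
  from = c("A", "A", "R", "M"),
  to   = c("Z", "Y", "Z", "M"),
  stringsAsFactors = FALSE
)
node_ids <- c("A", "F", "M", "R", "Y", "Z")

# Direct successors: heads of the arrows whose tail is `node`
direct_successors <- function(node) edges$to[edges$from == node]

# Direct predecessors: tails of the arrows whose head is `node`
direct_predecessors <- function(node) edges$from[edges$to == node]

direct_successors("A")
#> [1] "Z" "Y"
direct_predecessors("Z")
#> [1] "A" "R"

# A node with a loop is its own direct successor and predecessor
direct_successors("M")
#> [1] "M"

# Isolated nodes never appear in either column of the edge list
isolated <- setdiff(node_ids, c(edges$from, edges$to))
isolated
#> [1] "F"
```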

Most graph objects should have an internal node data frame (*NDF*) and an internal edge data frame (*EDF*). The *NDF* represents the nodes and their attributes, and the *EDF* represents the edges between nodes and the edge attributes. These can be directly accessed from the graph object using `[graph_name]$nodes_df` or `[graph_name]$edges_df`. A better way to do this is to use either the `get_node_df()` or the `get_edge_df()` function.

Both functions only require the graph object name as the value for the `graph` argument. Here are a few examples that show the use of `get_node_df()` and `get_edge_df()`.

```
###
# Extract a graph's NDF and
# EDF and do worthwhile things
###
library(DiagrammeR)
library(magrittr)
# Show the NDF from a randomly
# created graph
create_random_graph(
  5, 10,
  directed = TRUE,
  set_seed = 20) %>%
  get_node_df
#> nodes type label value
#> 1 1 1 9
#> 2 2 2 8
#> 3 3 3 3
#> 4 4 4 5.5
#> 5 5 5 10
# Take this a step further and
# get the mean value from the
# `value` node attribute
create_random_graph(
  5, 10,
  directed = TRUE, set_seed = 20) %>%
  get_node_df %>%
  .$value %>%
  as.numeric %>%
  mean
#> [1] 7.1
# An empty graph doesn't have an
# NDF, so calling `get_node_df()`
# returns `NA`
create_graph() %>% get_node_df
#> [1] NA
# A graph with nodes but no edges
# likewise doesn't have an EDF
# so calling `get_edge_df()` will
# return `NA`
create_random_graph(5, 0) %>%
get_edge_df
#> [1] NA
# Getting the EDF from a graph
# is hardly different from getting
# an NDF. Get the 'head' of the
# graph's EDF
create_random_graph(
  5, 10,
  directed = TRUE, set_seed = 20) %>%
  get_edge_df %>%
  .[1:5, ]
#> from to rel
#> 1 5 1
#> 2 1 3
#> 3 2 4
#> 4 4 1
#> 5 3 2
```

These functions will likely see less use than others in the package, but within a **magrittr** pipeline their names convey exactly what is being done. They can also be helpful if you need to build additional functions that extend those available in **DiagrammeR**.

Understanding the size of the graph is often important for exploratory data analysis (EDA) tasks. The graph size is partially attributable to the total number of nodes and the total number of edges. The functions `node_count()` and `edge_count()` provide simple counts of all nodes and edges, respectively, in a graph. Furthermore, for property graphs where node `type` attributes and edge `rel` attributes are available, these functions can provide counts partitioned by the nodes or edges with different `type` or `rel` labels.

To get a count of all nodes, or of certain types of nodes, in the graph, you can use the `node_count()` function. The `type` argument can be supplied with either `TRUE` or `FALSE`, or with a character vector containing node `type` values that may be present in the graph. Providing `TRUE` returns a named vector of node counts by type. Any nodes without a `type` value are placed into a separate count category. Using `type = FALSE` with `node_count()` simply returns a single-value vector with the total count of nodes in the graph. Providing a character vector of node `type` values returns a named numeric vector of counts for only those specified types.

```
###
# Get a count of all nodes in a graph
###
library("DiagrammeR")
set.seed(26)
# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z", 2)))
# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")
# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)
# Get counts of nodes grouped by
# the `type` attribute
node_count(graph, type = TRUE)
#> a_to_g h_to_p q_to_x y_and_z
#> 7 9 8 2
# Get a total count of nodes with
# no grouping
node_count(graph, type = FALSE)
#> [1] 26
```
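The grouped and ungrouped counts correspond to familiar base R operations on the `type` values. The sketch below is only an analogy, assuming the same `type` vector as in the example above; it does not call `node_count()` and makes no claim about how that function is implemented.

```r
# The `type` values used for the 26 nodes in the example above
node_types <- c(rep("a_to_g", 7),
                rep("h_to_p", 9),
                rep("q_to_x", 8),
                rep("y_and_z", 2))

# Counts partitioned by type (analogous to `type = TRUE`)
counts_by_type <- table(node_types)

# Total count with no grouping (analogous to `type = FALSE`)
total_count <- length(node_types)

# Counts for only certain types (analogous to passing a
# character vector of type values)
selected_counts <- counts_by_type[c("a_to_g", "y_and_z")]

print(counts_by_type)
print(total_count)
print(selected_counts)
```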

Rather than accessing the graph object directly to determine whether the graph is empty or whether it is directed, you can use the `is_graph_empty()` or `is_graph_directed()` function to return a logical value. This slightly improves code readability over using statements such as `is.null([graph]$nodes_df)` or `[graph]$directed` to get the same answer.

The `is_graph_empty()` and `is_graph_directed()` functions simply return either `TRUE` or `FALSE` for whether the graph is empty or whether the graph is a directed graph. These are likely to be most useful in verification statements for scripts that add and remove nodes from the graph, or for scripts that may toggle the graph between directed and undirected states.

```
###
# Is the graph empty?
# Is it directed?
###
library("DiagrammeR")
library("magrittr")
# Create an empty graph
graph <- create_graph()
# Use the 'is_graph_empty' function
# to return a logical value
is_graph_empty(graph)
#> TRUE
# Add a node to the graph
graph %<>% add_node
# Now the function will return `FALSE`
# because there is a node in the graph
is_graph_empty(graph)
#> FALSE
# When created, graphs are set as
# `directed` by default; to verify
# that's the case here:
is_graph_directed(graph)
#> TRUE
```

There are several functions that create selections of nodes (e.g., `select_nodes()`, `select_nodes_by_id()`, etc.) and several more that create selections of edges (e.g., `select_edges()`, `select_last_edge()`, etc.). To inspect the current selection of nodes or edges, you can use the `get_selection()` function on a graph object.

The graph object itself stores any selections of nodes or edges. Therefore, the only argument to the `get_selection()` function is `graph`. Formally, if there is a selection of any type, it is stored as a `list` object within `[graph]$selection`. Should the selection be a node selection, a vector of nodes will be available in `[graph]$selection$nodes`. If the selection is an edge selection, there will be two accessible vectors: `[graph]$selection$edges$from` and `[graph]$selection$edges$to`. The `get_selection()` function returns a list originating at `[graph]$selection`. Thus, if vectors are required, one still needs to access the appropriate list members (i.e., `$nodes` for a selection of nodes, and `$edges$from` and `$edges$to` for a selection of edges).

```
###
# Get the current selection
###
library("DiagrammeR")
library("magrittr")
# If there is no selection in the
# graph, `get_selection()` returns
# `NA`
create_graph() %>% get_selection
#> [1] NA
# Create a graph, add 5 nodes,
# select all nodes, then get the
# current selection as a list
create_graph() %>%
  add_n_nodes(5) %>%
  select_nodes %>%
  get_selection
#> $nodes
#> [1] "1" "2" "3" "4" "5"
# Do the same as above except
# return the selection of nodes
# as a vector rather than a list
create_graph() %>%
  add_n_nodes(5) %>%
  select_nodes %>%
  get_selection %>%
  .$nodes
#> [1] "1" "2" "3" "4" "5"
# Create a graph, add a node,
# select that node, add 5 new
# nodes connected from node `1`,
# select the resulting edges, then
# get the current selection
create_graph() %>%
  add_n_nodes(1) %>%
  select_nodes %>%
  add_n_nodes_from_selection(5) %>%
  select_edges_by_node_id(1:6) %>%
  get_selection
#> $edges
#> $edges$from
#> [1] "1" "1" "1" "1" "1"
#>
#> $edges$to
#> [1] "2" "3" "4" "5" "6"
# With the magic of `magrittr`,
# print a character vector that
# schematically represents the
# selection of edges
create_graph() %>%
  add_n_nodes(1) %>%
  select_nodes %>%
  add_n_nodes_from_selection(5) %>%
  select_edges_by_node_id(1:6) %>%
  get_selection %>%
  {
    from <- .$edges %>% .$from
    to <- .$edges %>% .$to
    combined <- paste(from, "->", to)
  } %>% print
#> [1] "1 -> 2" "1 -> 3" "1 -> 4" "1 -> 5"
#> [5] "1 -> 6"
```
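Because the selection is an ordinary nested list, its shape can be mimicked in base R to see which members exist for each kind of selection. The `selection` object below is a hand-built stand-in for `[graph]$selection` holding an edge selection; it is an assumption made for illustration and was not produced by `get_selection()`.

```r
# Hand-built stand-in for an edge selection (hypothetical values)
selection <- list(
  edges = list(
    from = c("1", "1", "1"),
    to   = c("2", "3", "4")
  )
)

# Only the member matching the selection type is present
is_node_selection <- !is.null(selection$nodes)
is_edge_selection <- !is.null(selection$edges)

# The vectors still have to be pulled out of the list members
edge_labels <- paste(selection$edges$from, "->", selection$edges$to)

is_node_selection
#> [1] FALSE
is_edge_selection
#> [1] TRUE
edge_labels
#> [1] "1 -> 2" "1 -> 3" "1 -> 4"
```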

For a directed graph, a list of all possible paths between two nodes, or to or from a given node, can be obtained with the `get_paths()` function. There are options to filter the list of paths returned to only the shortest paths, the longest paths, or those paths within a given range of distances.

The `get_paths()` function gets information on possible traversal paths from a graph object supplied in `graph`. Although the `from` and `to` arguments are formally optional arguments that have default values of `NULL`, at least one of them must be supplied with a node ID value. In this way, all paths either from a node or to a node will be returned as a list object. Providing a node ID value for both `from` and `to` will return a list of all possible paths between the two different nodes.

```
###
# Get a selection of possible
# paths through a graph
###
library(DiagrammeR)
library(magrittr)
# Create a simple graph
graph <-
  create_graph(graph_attrs =
                 "output = visNetwork") %>%
  add_node_df(create_nodes(1:8)) %>%
  add_edge(1, 2) %>% add_edge(1, 3) %>%
  add_edge(3, 4) %>% add_edge(3, 5) %>%
  add_edge(4, 6) %>% add_edge(2, 7) %>%
  add_edge(7, 5) %>% add_edge(4, 8)
# View the graph
render_graph(graph)
# Get a list of all paths outward
# from node `1`
get_paths(graph, from = 1)
#> [[1]]
#> [1] "1" "3" "5"
#>
#> [[2]]
#> [1] "1" "2" "7" "5"
#>
#> [[3]]
#> [1] "1" "3" "4" "6"
#>
#> [[4]]
#> [1] "1" "3" "4" "8"
# Get a list of all paths leading
# to node `6`
get_paths(graph, to = 6)
#> [[1]]
#> [1] "4" "6"
#>
#> [[2]]
#> [1] "3" "4" "6"
#>
#> [[3]]
#> [1] "1" "3" "4" "6"
# Get a list of all paths
# from `1` to `5`
get_paths(graph,
from = 1, to = 5)
#> [[1]]
#> [1] "1" "3" "5"
#>
#> [[2]]
#> [1] "1" "2" "7" "5"
# Get a list of all paths from
# `1` up to a distance of 2
# node traversals
get_paths(graph,
from = 1, distance = 2)
#> [[1]]
#> [1] "1" "3" "5"
#>
#> [[2]]
#> [1] "1" "2" "7"
#>
#> [[3]]
#> [1] "1" "3" "4"
# Get a list of the shortest
# paths from `1` to `5`
get_paths(graph,
from = 1, to = 5,
shortest_path = TRUE)
#> [[1]]
#> [1] "1" "3" "5"
# Get a list of the longest
# paths from `1` to `5`
get_paths(graph,
from = 1, to = 5,
longest_path = TRUE)
#> [[1]]
#> [1] "1" "2" "7" "5"
# Use the overwhelming power of
# magrittr to color nodes in the
# longest path from `1` -> `5`
# green and all other nodes brown
graph %>%
  select_nodes_by_id(
    get_paths(., 1, 5,
              longest_path = TRUE)[[1]]) %>%
  set_node_attr_with_selection("color", "green") %>%
  invert_selection %>%
  set_node_attr_with_selection("color", "brown") %>%
  render_graph
```
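To see what enumerating paths involves, the following base R sketch performs a simple depth-first search over the same eight edges used above. The function `find_paths()` is hypothetical and is not how `get_paths()` is implemented; it also assumes the graph has no cycles, since a cycle would make the recursion run indefinitely.

```r
# The edges of the example graph above, as plain vectors
edges <- data.frame(
  from = c(1, 1, 3, 3, 4, 2, 7, 4),
  to   = c(2, 3, 4, 5, 6, 7, 5, 8)
)

# Depth-first search: extend the current path along every outgoing
# edge and keep only the paths that arrive at `target`
find_paths <- function(node, target, path = node) {
  if (node == target) return(list(path))
  found <- list()
  for (nxt in edges$to[edges$from == node]) {
    found <- c(found, find_paths(nxt, target, c(path, nxt)))
  }
  found
}

paths <- find_paths(1, 5)

# Filter by path length to get the shortest and longest paths
path_lengths <- lengths(paths)
shortest <- paths[path_lengths == min(path_lengths)]
longest  <- paths[path_lengths == max(path_lengths)]

shortest[[1]]
#> [1] 1 3 5
longest[[1]]
#> [1] 1 2 7 5
```

The two paths found agree with the `from = 1, to = 5` results shown above, although here the node IDs are numeric rather than character values.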