Inspection

Suppose you have a giant graph with many nodes and many edges. Moreover, the information contained within the nodes and edges is granular, highly specific, and perhaps obtained from data collection over varying periods of time. Chances are, if you've built up such a graph, you'll want to inspect parts of the graph (or get summaries of the entire graph) for reporting purposes or to inform further graph modification. This is where DiagrammeR's inspection functions are helpful. They make it rather easy to get the information you need from various facets of the graph.

Getting Summary Info on Nodes or Edges

To get basic information on each of the graph's nodes or edges, use the node_info() and edge_info() functions. Both quickly return a data frame with useful information for each node or edge. For node_info(), the data frame has the following columns:

  • node
  • label
  • type
  • degree
  • indegree
  • outdegree
  • loops

The node column contains the ID value of each of the graph's nodes. The label and type values are optional yet useful. If they are not set for any of the nodes, the columns will still be present in the resulting data frame and will contain empty strings. The degree, indegree, and outdegree columns count how many edges are incident on each node. The degree is the total number of edges incident on the node regardless of arrow direction (in a directed graph). The indegree is the number of edges whose arrows point toward the node, and the outdegree is the number whose arrows point away from it. Thus, a node's indegree and outdegree always sum to its degree. The loops column counts edges that originate and terminate at the same node; each loop contributes 1 to the indegree, 1 to the outdegree, and therefore 2 to the node's degree.
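
The degree bookkeeping described above can be checked by hand. The following is a minimal base-R sketch, not DiagrammeR's own implementation; the node IDs and edges are made up for illustration. It counts tails and heads in a from/to edge table and shows that a loop adds 1 to the indegree, 1 to the outdegree, and 2 to the degree.

```r
# Hypothetical node IDs and edges (C -> C is a loop)
node_ids <- c("A", "B", "C", "D")
edge_df <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Count appearances as a tail (outdegree) or a head (indegree);
# using factor levels gives isolated nodes such as D a count of 0
outdegree <- as.vector(table(factor(edge_df$from, levels = node_ids)))
indegree  <- as.vector(table(factor(edge_df$to, levels = node_ids)))

# Loops are edges whose tail and head are the same node
is_loop <- edge_df$from == edge_df$to
loops <- as.vector(table(factor(edge_df$from[is_loop], levels = node_ids)))

info <- data.frame(
  node      = node_ids,
  degree    = indegree + outdegree,  # a loop is counted twice here
  indegree  = indegree,
  outdegree = outdegree,
  loops     = loops,
  stringsAsFactors = FALSE
)
print(info)
```

Node C ends up with a degree of 4: one each from A -> C and B -> C, plus 2 from its loop. Summing the degree column gives 8, twice the number of edges, since every edge has two ends.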

The data frame returned by edge_info() has the following columns:

  • from
  • to
  • rel
  • label

###
# Get basic information on the graph's nodes and edges
###

library(DiagrammeR)

set.seed(26)

# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z",2)))

# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")

# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)

# Use the `node_info()` function, returning a data
# frame with information on the graph's nodes
node_info(graph)
#>    node label    type degree indegree outdegree loops
#> 1     A     A  a_to_g      2        0         2     0
#> 2     W     W  q_to_x      1        0         1     0
#> 3     T     T  q_to_x      2        0         2     0
#> 4     L     L  h_to_p      1        0         1     0
#> 5     F     F  a_to_g      0        0         0     0
#>..   ...   ...     ...    ...      ...       ...   ...

# Use the `edge_info()` function, returning a data
# frame with information on the graph's edges
edge_info(graph)
#>    from  to              rel label
#> 1     A   Z letter_to_letter  edge
#> 2     H   U letter_to_letter  edge
#> 3     W   O letter_to_letter  edge
#> 4     U   K letter_to_letter  edge
#> 5     I   V letter_to_letter  edge
#>..   ... ...              ...   ...

Determining Whether a Node or Edge is Present

The node_present() and edge_present() functions are used to determine whether a node (based on its node ID) or an edge (based on two node IDs) is present in a graph object. Both functions return a logical value of either TRUE or FALSE.
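
Conceptually, both checks are simple membership tests. A minimal base-R sketch follows; node_is_present() and edge_is_present() are hypothetical helpers written for illustration (they are not DiagrammeR functions), and the node and edge values are made up.

```r
# Hypothetical node IDs and edges, not taken from a DiagrammeR graph
node_ids <- c("A", "B", "C")
edge_df <- data.frame(
  from = c("A", "B"),
  to   = c("B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: node IDs are compared exactly, so "a" != "A"
node_is_present <- function(id) id %in% node_ids

# Hypothetical helper: an edge is present if some row matches both
# endpoints in the given order
edge_is_present <- function(from, to) {
  any(edge_df$from == from & edge_df$to == to)
}

node_is_present("A")       # TRUE
node_is_present("a")       # FALSE
edge_is_present("A", "B")  # TRUE
edge_is_present("B", "A")  # FALSE, since direction matters
```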

###
# Find out if a node or edge is present in the graph
###

library(DiagrammeR)

set.seed(26)

# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z",2)))

# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")

# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)

# Verify that node with ID `a` is not in graph with
# the `node_present()` function (it won't be because
# the `LETTERS` vector is made up of capital letters)
node_present(graph, "a")
#> [1] FALSE

# Is node with ID `A` in the graph?
node_present(graph, "A")
#> [1] TRUE

# Are all node ID values from the LETTERS vector in
# the graph?
all(sapply(LETTERS, function(x) node_present(graph, x)))
#> [1] TRUE

# Moving to the inspection of edges: is there any edge
# from node ID `A` to node ID `B`? Use the
# `edge_present()` function to find out
edge_present(graph, from = "A", to = "B")
#> [1] FALSE

# Verify that there is an edge from node ID `K` to node
# ID `V`
edge_present(graph, from = "K", to = "V")
#> [1] TRUE

Getting a Collection of Nodes or Edges

The get_nodes() and get_edges() functions return, respectively, all of the nodes or all of the edges (i.e., pairs of nodes, ordered by direction) available in a graph object or in a node or edge data frame. For get_nodes(), you can supply a graph object, a node data frame (NDF), or an edge data frame (EDF), and a vector of node IDs will be returned. The get_edges() function has an additional argument called return_type that specifies one of three return objects: a list with return_type = "list", a data frame with return_type = "df", or a character vector with return_type = "vector". Whereas get_nodes() works with graph objects, NDFs, and EDFs, get_edges() works only with graph objects and EDFs.
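
To make the three return_type options concrete, here is a base-R sketch that builds an equivalent list, data frame, and character vector from a small, made-up edge table. It mirrors the shapes shown in the examples that follow but is not DiagrammeR's implementation.

```r
# A small, hypothetical edge table
edge_df <- data.frame(
  from = c("A", "H", "W"),
  to   = c("Z", "U", "O"),
  stringsAsFactors = FALSE
)

# "list": tail (outgoing) node IDs first, head (incoming) node IDs second
as_list <- list(edge_df$from, edge_df$to)

# "df": a two-column data frame of from/to pairs
as_df <- edge_df[, c("from", "to")]

# "vector": one "from -> to" string per edge
as_vector <- paste(edge_df$from, "->", edge_df$to)
print(as_vector)
#> [1] "A -> Z" "H -> U" "W -> O"
```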

###
# Get a vector of all nodes in a graph, or in NDFs
# or EDFs
###

library("DiagrammeR")

set.seed(26)

# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z",2)))

# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")

# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)

# Use the `get_nodes()` function to return node ID
# values
get_nodes(graph)
#> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
#> [13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
#> [25] "Y" "Z"

# Can extract a vector of node ID values from an NDF
table(get_nodes(nodes) %in% get_nodes(graph))
#>
#> TRUE
#>   26

# Can also extract a vector of node ID values from
# an EDF
table(get_nodes(graph) %in% get_nodes(edges))
#>
#> FALSE  TRUE
#>     3    23

# Can get the 'outgoing' and 'incoming' node ID values
# in a list object
get_edges(graph, return_type = "list") # the default
#> [[1]]
#>  [1] "A" "H" "W" "U" "I" "M" "U" "T" "I" "R" "O"
#> [12] "G" "O" "A" "V" "I" "M" "K" "R" "T" "Y" "R"
#> [23] "M" "L" "H" "V"
#>
#> [[2]]
#>  [1] "Z" "U" "O" "K" "V" "M" "N" "C" "D" "Z" "B"
#> [12] "G" "U" "Y" "H" "V" "R" "V" "Z" "S" "Q" "I"
#> [23] "P" "S" "E" "P"

# Similarly, you can request a data frame of edges
get_edges(graph, return_type = "df")
#>    from to
#> 1     A  Z
#> 2     H  U
#> 3     W  O
#> 4     U  K
#> 5     I  V
#>..   ... ..

# A character vector of "from -> to" strings can also be obtained
get_edges(graph, return_type = "vector")
#>  [1] "A -> Z" "H -> U" "W -> O" "U -> K" "I -> V"
#>  [6] "M -> M" "U -> N" "T -> C" "I -> D" "R -> Z"
#> [11] "O -> B" "G -> G" "O -> U" "A -> Y" "V -> H"
#> [16] "I -> V" "M -> R" "K -> V" "R -> Z" "T -> S"
#> [21] "Y -> Q" "R -> I" "M -> P" "L -> S" "H -> E"
#> [26] "V -> P"

# As with `get_nodes()`, the `get_edges()` function
# works in an analogous manner with EDFs

all(get_edges(edges, return_type = "list")[[1]] ==
      get_edges(graph, return_type = "list")[[1]])
#> [1] TRUE

all(get_edges(edges, return_type = "df") ==
      get_edges(graph, return_type = "df"))
#> [1] TRUE

all(get_edges(edges, return_type = "vector") ==
      get_edges(graph, return_type = "vector"))
#> [1] TRUE

Getting the Direct Predecessors or Successors of a Node

The direct predecessors and direct successors of a particular node can be obtained through two easy-to-use functions: get_predecessors() and get_successors(). To be clear about exactly what these functions return, a brief foray into graph theory will be useful. Consider an edge as an arrow with components (x, y), directed from x to y. The y component is termed the head and x the tail of the arrow. Orientation matters: y is a direct successor of x, and x is a direct predecessor of y. When x and y are two different nodes in the graph, this definition extends to direct predecessor nodes and direct successor nodes. When x and y are the same node, the edge is a loop, and that node is both a direct predecessor and a direct successor of itself (the loop contributes 2 to its degree).

The aforementioned functions return the node ID value(s) of the direct predecessors and direct successors of a given node. The word "direct" is important to stress: more generally, if a path exists between two distinct nodes, the node at the end of the path is said to be a successor of (and reachable from) the node at the beginning of the path, which in turn is a predecessor of the node at the end. Nodes connected only through such longer paths are not included.

The get_predecessors() and get_successors() functions take a graph object and a node in that graph (provided as its node ID value), and return that node's direct predecessors or direct successors, respectively. This is a more direct and convenient way to obtain those node ID values than performing a traversal, or extracting the graph's EDF and using base R functions to find them.
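
For comparison, here is a minimal base-R sketch of the EDF-based alternative mentioned above. The edges are made up, and direct_successors() and direct_predecessors() are hypothetical helpers; returning NA when nothing is found simply mimics the behavior shown in the example that follows.

```r
# Hypothetical edges (M -> M is a loop; R -> Z appears twice)
edge_df <- data.frame(
  from = c("A", "A", "R", "R", "M"),
  to   = c("Z", "Y", "Z", "Z", "M"),
  stringsAsFactors = FALSE
)

# Direct successors: heads of all edges whose tail is `node`
direct_successors <- function(edges, node) {
  out <- edges$to[edges$from == node]
  if (length(out) == 0) NA else out
}

# Direct predecessors: tails of all edges whose head is `node`
direct_predecessors <- function(edges, node) {
  out <- edges$from[edges$to == node]
  if (length(out) == 0) NA else out
}

direct_successors(edge_df, "A")    # "Z" "Y"
direct_predecessors(edge_df, "Z")  # "A" "R" "R" (parallel edges repeat)
direct_predecessors(edge_df, "A")  # NA
direct_successors(edge_df, "M")    # "M": the loop makes M its own successor
```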

###
# Get all the direct predecessors
# or all of the direct successors
# of a given node
###

library("DiagrammeR")

# Set a seed
set.seed(26)

# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z",2)))

# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS, replace = TRUE),
    to = sample(LETTERS, replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")

# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)

# If there are no predecessors,
# `NA` is returned
get_predecessors(graph, node = "A")
#> [1] NA

get_successors(graph, node = "A")
#> [1] "Z" "Y"

get_successors(graph, node = "Z")
#> [1] NA

get_predecessors(graph, node = "Z")
#> [1] "A" "R" "R"

# Find isolated nodes in a graph
# (they have neither successors
# nor predecessors)
intersect(
  names(
    which(
      is.na(
        sapply(
          get_nodes(graph),
            function(x) get_successors(
              graph, x))))),
  names(
    which(
      is.na(
        sapply(
          get_nodes(graph),
            function(x) get_predecessors(
              graph, x)))))
)
#> [1] "F" "J" "X"

# The isolated nodes can also be
# found by subsetting the resulting
# data frame yielded by `node_info()`
node_info(graph)[which(node_info(graph)["degree"] == 0), ][, 1]
#> [1] "F" "J" "X"

Extracting NDFs or EDFs from a Graph

Most graph objects will have an internal node data frame (NDF) and an internal edge data frame (EDF). The NDF represents the nodes and their attributes, and the EDF represents the edges between nodes and the edge attributes. These can be accessed directly from the graph object using [graph_name]$nodes_df or [graph_name]$edges_df, but a better way is to use the get_node_df() or get_edge_df() function.

Both functions require only the graph object, supplied to the graph argument. Here are a few examples that show the use of get_node_df() and get_edge_df().

###
# Extract a graph's NDF and
# EDF and do worthwhile things
###

library(DiagrammeR)
library(magrittr)

# Show the NDF from a randomly
# created graph
create_random_graph(
  5, 10,
  directed = TRUE,
  set_seed = 20) %>%
  get_node_df
#>   nodes type label value
#> 1     1          1     9
#> 2     2          2     8
#> 3     3          3     3
#> 4     4          4   5.5
#> 5     5          5    10

# Take this a step further and
# get the mean value from the
# `value` node attribute
create_random_graph(
  5, 10,
  directed = TRUE, set_seed = 20) %>%
  get_node_df %>%
  .$value %>%
  as.numeric %>%
  mean
#> [1] 7.1

# An empty graph doesn't have an
# NDF, so calling `get_node_df()`
# returns `NA`
create_graph() %>% get_node_df
#> [1] NA

# A graph with nodes but no edges
# likewise doesn't have an EDF
# so calling `get_edge_df()` will
# return `NA`
create_random_graph(5, 0) %>%
  get_edge_df
#> [1] NA

# Getting the EDF from a graph
# is hardly different from getting
# an NDF. Get the 'head' of the
# graph's EDF
create_random_graph(
  5, 10,
  directed = TRUE, set_seed = 20) %>%
  get_edge_df %>%
  .[1:5,]
#>   from to rel
#> 1    5  1
#> 2    1  3
#> 3    2  4
#> 4    4  1
#> 5    3  2

These functions will likely see less use than others in the package, but within a magrittr pipeline their names convey exactly what is being done. They may also prove useful if you need to build additional functions that extend those available in DiagrammeR.

Getting Counts of Nodes or Edges

Understanding the size of a graph is often important for exploratory data analysis (EDA) tasks. A graph's size is described in part by its total number of nodes and its total number of edges. The node_count() and edge_count() functions provide simple counts of all nodes and all edges, respectively, in a graph. Furthermore, for property graphs where node type attributes and edge rel attributes are available, these functions can provide counts partitioned by the different type or rel values.

To get a count of all nodes, or of nodes of certain types, use the node_count() function. Its type argument accepts either TRUE or FALSE, or a character vector of node type values that may be present in the graph. Providing TRUE returns a named vector of node counts by type; any nodes without a type value are placed into a separate count category. Using type = FALSE returns a single-value vector with the total count of nodes in the graph. Providing a character vector of node type values returns a named numeric vector of counts for only those types.
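
The three ways of calling node_count() described above correspond to familiar base-R counting operations. This is a sketch over a made-up vector of node type values, where an empty string stands for an unset type; the "<unset>" category name is an arbitrary choice for this example and not necessarily what DiagrammeR uses.

```r
# Hypothetical node types; "" means the type was never set
types <- c(rep("a_to_g", 7), rep("h_to_p", 9), "", "")

# Like type = TRUE: a named vector of counts per type, with
# unset types gathered into their own category
labelled <- ifelse(types == "", "<unset>", types)
tab <- table(labelled)
by_type <- setNames(as.vector(tab), names(tab))

# Like type = FALSE: a single total count of nodes
total <- length(types)

# Like a character vector of types: counts for only those types
only_some <- by_type[c("a_to_g", "h_to_p")]
print(only_some)
```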

###
# Get a count of all nodes in a graph
###

library("DiagrammeR")

set.seed(26)

# Create an NDF
nodes <-
  create_nodes(
    nodes = LETTERS,
    label = TRUE,
    type = c(rep("a_to_g", 7),
             rep("h_to_p", 9),
             rep("q_to_x", 8),
             rep("y_and_z",2)))

# Create an EDF
edges <-
  create_edges(
    from = sample(LETTERS,
                  replace = TRUE),
    to = sample(LETTERS,
                replace = TRUE),
    label = "edge",
    rel = "letter_to_letter")

# Create a graph object
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges)

# Get counts of nodes grouped by
# the `type` attribute
node_count(graph, type = TRUE)
#> a_to_g  h_to_p  q_to_x y_and_z
#>      7       9       8       2

# Get a total count of nodes with
# no grouping
node_count(graph, type = FALSE)
#> [1] 26

Is the Graph Empty? Directed?

Rather than accessing the graph object to determine whether the graph is empty or whether it is a directed graph, you can use the is_graph_empty() or is_graph_directed() function to return a logical value. This slightly improves code readability over using statements such as is.null([graph]$nodes_df) or [graph]$directed to get the same answer.

The is_graph_empty() and is_graph_directed() functions simply return TRUE or FALSE, indicating whether the graph is empty or whether it is a directed graph. They are likely to be most useful in verification statements within scripts that add and remove nodes from the graph, or within scripts that toggle the graph between directed and undirected states.

###
# Is the graph empty?
# Is it directed?
###

library("DiagrammeR")
library("magrittr")

# Create an empty graph
graph <- create_graph()

# Use the 'is_graph_empty' function
# to return a logical value
is_graph_empty(graph)
#> [1] TRUE

# Add a node to the graph
graph %<>% add_node

# Now the function will return `FALSE`
# because there is a node in the graph
is_graph_empty(graph)
#> [1] FALSE

# When created, graphs are set as
# `directed` by default; to verify
# that's the case here:
is_graph_directed(graph)
#> [1] TRUE

Getting the Graph's Global Attributes


Getting the Current Selection of Nodes/Edges

Several functions create selections of nodes (e.g., select_nodes(), select_nodes_by_id(), etc.) and several more create selections of edges (e.g., select_edges(), select_last_edge(), etc.). To inspect the current selection of nodes or edges, use the get_selection() function on a graph object.

The graph object itself stores any selection of nodes or edges, so the only argument to the get_selection() function is graph. Formally, a selection of either type is stored as a list object within [graph]$selection. For a node selection, a vector of nodes is available in [graph]$selection$nodes. For an edge selection, there are two accessible vectors: [graph]$selection$edges$from and [graph]$selection$edges$to. The get_selection() function returns the list at [graph]$selection, so if vectors are required, you still need to access the appropriate list members ($nodes for a selection of nodes, and $edges$from and $edges$to for a selection of edges).
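
The nesting described above can be illustrated with hand-built lists. This is only a sketch: the node IDs are hypothetical, and real selections are created by DiagrammeR's select_*() functions rather than typed in like this.

```r
# Hypothetical selections built by hand to mirror the layout
# described above; in practice they come from select_*() functions
node_selection <- list(nodes = c("1", "2", "3"))
edge_selection <- list(edges = list(from = c("1", "1"),
                                    to   = c("2", "3")))

# A node selection exposes its node IDs through $nodes
selected_nodes <- node_selection$nodes

# An edge selection needs both $edges$from and $edges$to;
# pairing them gives one "from -> to" string per selected edge
selected_edges <- paste(edge_selection$edges$from, "->",
                        edge_selection$edges$to)
print(selected_edges)
#> [1] "1 -> 2" "1 -> 3"
```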

###
# Get the current selection
###

library("DiagrammeR")
library("magrittr")

# If there is no selection in the
# graph, `get_selection()` returns
# `NA`
create_graph() %>% get_selection
#> [1] NA

# Create a graph, add 5 nodes,
# select all nodes, then get the
# current selection as a list
create_graph() %>%
  add_n_nodes(5) %>%
  select_nodes %>%
  get_selection
#> $nodes
#> [1] "1" "2" "3" "4" "5"

# Do the same as above except
# return the selection of nodes
# as a vector rather than a list
create_graph() %>%
  add_n_nodes(5) %>%
  select_nodes %>%
  get_selection %>%
  .$nodes
#> [1] "1" "2" "3" "4" "5"

# Create a graph, add a node,
# select that node, add 5 new
# nodes connected from node `1`,
# then select edges by node ID
create_graph() %>%
  add_n_nodes(1) %>%
  select_nodes %>%
  add_n_nodes_from_selection(5) %>%
  select_edges_by_node_id(1:6) %>%
  get_selection
#> $edges
#> $edges$from
#> [1] "1" "1" "1" "1" "1"
#>
#> $edges$to
#> [1] "2" "3" "4" "5" "6"

# With the magic of `magrittr`,
# print a character vector that
# schematically represents the
# selection of edges
create_graph() %>%
  add_n_nodes(1) %>%
  select_nodes %>%
  add_n_nodes_from_selection(5) %>%
  select_edges_by_node_id(1:6) %>%
  get_selection %>%
  {paste(.$edges$from, "->", .$edges$to)} %>%
  print
#> [1] "1 -> 2" "1 -> 3" "1 -> 4" "1 -> 5"
#> [5] "1 -> 6"

Obtaining Path Information from a Graph

For a directed graph, the get_paths() function returns a list of all possible paths between two nodes, or all paths leading to or from a given node. There are options to filter the returned paths to only the shortest paths, only the longest paths, or only those paths within a given range of distances.

The get_paths() function gets information on possible traversal paths from the graph object supplied in graph. Although the from and to arguments are formally optional, with default values of NULL, at least one of them must be supplied with a node ID value. Supplying only one of them returns a list of all paths either from that node or to that node. Supplying node ID values for both from and to returns a list of all possible paths between the two nodes.
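
The idea behind path enumeration can be sketched in base R with a recursive depth-first search. This is a swapped-in illustration, not the algorithm get_paths() uses; all_paths() is a hypothetical helper, the edge list is copied from the example graph built just below, and the sketch assumes the graph has no cycles (a cycle would make the recursion run forever).

```r
# Edges of the example graph used below
edge_df <- data.frame(
  from = c(1, 1, 3, 3, 4, 2, 7, 4),
  to   = c(2, 3, 4, 5, 6, 7, 5, 8)
)

# Hypothetical helper: all paths starting at `from`. Without `to`,
# each path runs until a node with no outgoing edges; with `to`,
# only paths ending at that node are kept. Assumes no cycles.
all_paths <- function(edges, from, to = NULL) {
  if (!is.null(to) && from == to) return(list(from))
  nxt <- edges$to[edges$from == from]
  if (length(nxt) == 0) {
    # A dead end is a complete path only when no target was given
    return(if (is.null(to)) list(from) else list())
  }
  paths <- list()
  for (n in nxt) {
    for (p in all_paths(edges, n, to)) {
      paths[[length(paths) + 1]] <- c(from, p)
    }
  }
  paths
}

# All paths from node 1 to node 5, then the shortest and longest
p15 <- all_paths(edge_df, from = 1, to = 5)
len <- lengths(p15)
shortest <- p15[len == min(len)]  # list(c(1, 3, 5))
longest  <- p15[len == max(len)]  # list(c(1, 2, 7, 5))
```

Calling all_paths(edge_df, from = 1) yields the same four paths that get_paths(graph, from = 1) reports below, though in a different order.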

###
# Get a selection of possible
# paths through a graph
###

library(DiagrammeR)
library(magrittr)

# Create a simple graph
graph <-
  create_graph(graph_attrs =
                 "output = visNetwork") %>%
  add_node_df(create_nodes(1:8)) %>%
  add_edge(1, 2) %>% add_edge(1, 3) %>%
  add_edge(3, 4) %>% add_edge(3, 5) %>%
  add_edge(4, 6) %>% add_edge(2, 7) %>%
  add_edge(7, 5) %>% add_edge(4, 8)

# View the graph
render_graph(graph)



# Get a list of all paths outward
# from node `1`
get_paths(graph, from = 1)
#> [[1]]
#> [1] "1" "3" "5"
#>
#> [[2]]
#> [1] "1" "2" "7" "5"
#>
#> [[3]]
#> [1] "1" "3" "4" "6"
#>
#> [[4]]
#> [1] "1" "3" "4" "8"

# Get a list of all paths leading
# to node `6`
get_paths(graph, to = 6)
#> [[1]]
#> [1] "4" "6"
#>
#> [[2]]
#> [1] "3" "4" "6"
#>
#> [[3]]
#> [1] "1" "3" "4" "6"

# Get a list of all paths
# from `1` to `5`
get_paths(graph,
          from = 1, to = 5)
#> [[1]]
#> [1] "1" "3" "5"
#>
#> [[2]]
#> [1] "1" "2" "7" "5"

# Get a list of all paths from
# `1` up to a distance of 2
# node traversals
get_paths(graph,
          from = 1, distance = 2)
#> [[1]]
#> [1] "1" "3" "5"
#>
#> [[2]]
#> [1] "1" "2" "7"
#>
#> [[3]]
#> [1] "1" "3" "4"

# Get a list of the shortest
# paths from `1` to `5`
get_paths(graph,
          from = 1, to = 5,
          shortest_path = TRUE)
#> [[1]]
#> [1] "1" "3" "5"

# Get a list of the longest
# paths from `1` to `5`
get_paths(graph,
          from = 1, to = 5,
          longest_path = TRUE)
#> [[1]]
#> [1] "1" "2" "7" "5"

# Use the overwhelming power of
# magrittr to color nodes in the
# longest path from `1` -> `5`
# green and all other nodes brown
graph %>%
  select_nodes_by_id(
    get_paths(., 1, 5,
              longest_path = TRUE)[[1]]) %>%
  set_node_attr_with_selection("color", "green") %>%
  invert_selection %>%
  set_node_attr_with_selection("color", "brown") %>%
  render_graph