Traversals

Imagine diving into a graph and moving across its nodes, stepping onto an edge, or perhaps bypassing the edges and simply alighting on other nodes that have specific attributes. Traversals are an important part of a graph query. With them, you can build pipelines that move selectively across the graph (based on conditions you specify for each traversal) and glean information from nodes and edges along the way. Importantly, a traversal begins with a selection of nodes or edges, and the act of traversing modifies that selection. One may select a single node, for instance, perform one or more traversals away from that initial node, and end up with a selection of several different nodes (or even edges). Because there are many important use cases, an in-depth primer on DiagrammeR's traversal functions is provided here alongside numerous practical examples.

Traversals Across Nodes

To traverse across connected nodes without regard to the properties of the edges between the nodes, three functions are available: trav_out(), trav_in(), and trav_both(). These types of traversals always require an initial selection of one or more nodes, and, after traversing, a selection of one or more nodes is returned.

Directionality of the traversal is the key differentiator between these three functions. The trav_out() function traverses to connected nodes that are outbound in relation to the origin nodes (in a directed graph). With the trav_in() function, the movement is reversed: the traversal is to connected nodes that are inbound in relation to the origin nodes. For example, take the edge 1->2 with node 1 as the origin node; the trav_out() function would change the node selection from node 1 to node 2 because these nodes are adjacent and the edge leads from the origin node to an outbound node. If node 1 has outbound edges to several nodes (e.g., 1->{2,3,4}) then all of the nodes at the ends of those outbound edges will be part of the new selection. Take another example where the selected node is a central node with both inbound and outbound edges to adjacent nodes: {2,3,4}->1->{5,6,7}. Should the function trav_in() be used, nodes 2, 3, and 4 will become the selected nodes; using trav_out() will result in nodes 5, 6, and 7 becoming the selected nodes. Here are several examples of traversals across nodes.

###
# Perform two types of traversals from
# a single node using `trav_out()` and
# `trav_in()`
###

library(DiagrammeR)
library(magrittr)

# Create a simple graph with two nodes, an edge
# between them (`1` -> `2`); starting from node
# `1` (as a selection), traverse to node `2`
# and then obtain the current selection
create_graph() %>%
  add_node %>%
  add_node %>%
  add_edge(1, 2) %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection
#> [1] "2"

# If no traversal can occur, the selection is not
# altered. To demonstrate, use a similar pipeline
# but reverse the edge direction
create_graph() %>%
  add_node %>%
  add_node %>%
  add_edge(2, 1) %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection
#> [1] "1"

# A traversal can occur if `trav_in()` is used
# instead of `trav_out()`
create_graph() %>%
  add_node %>%
  add_node %>%
  add_edge(2, 1) %>%
  select_nodes_by_id(1) %>%
  trav_in %>%
  get_selection
#> [1] "2"

# Multiple traversals can be made in a single
# magrittr pipeline
create_graph() %>%
  add_n_nodes(5) %>%
  add_edge(1, 2) %>%
  add_edge(2, 3) %>%
  add_edge(3, 4) %>%
  add_edge(4, 5) %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  trav_out %>%
  trav_out %>%
  trav_out %>%
  get_selection
#> [1] "5"

# A selection of multiple nodes can occur as
# a result of a traversal
create_graph() %>%
  add_node %>%
  select_nodes_by_id(1) %>%
  add_n_nodes_ws(10, "from") %>%
  add_n_nodes_ws(10, "to") %>%
  trav_out %>%
  get_selection
#>  [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"
#>  [8] "9"  "10" "11"

create_graph() %>%
  add_node %>%
  select_nodes_by_id(1) %>%
  add_n_nodes_ws(10, "from") %>%
  add_n_nodes_ws(10, "to") %>%
  trav_in %>%
  get_selection
#>  [1] "12" "13" "14" "15" "16" "17"
#>  [7] "18" "19" "20" "21"

The trav_both() function results in traversals to adjacent nodes regardless of the edge directions between those nodes. In a sense, the direction of movement to adjacent nodes is both in and out. For the example of {2,3,4}->1->{5,6,7}, where node 1 is the only node in the selection, all of nodes 2 through 7 will be part of the new selection after calling trav_both().

###
# Perform traversals from a single
# node using `trav_both()`
###

library(DiagrammeR)
library(magrittr)

# Create the graph described in the paragraph
# above ({`2...4`} -> `1` -> {`5...7`}),
# start from node `1` (as a selection),
# traverse to all other adjacent nodes and
# then obtain the current selection
create_graph() %>%
  add_node %>%
  select_nodes_by_id(1) %>%
  add_n_nodes_ws(3, "to") %>%
  add_n_nodes_ws(3, "from") %>%
  trav_both %>%
  get_selection
#> [1] "5" "6" "7" "2" "3" "4"
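
As a rough mental model (and not DiagrammeR's actual implementation), the three directional traversals can be thought of as lookups in an edge table. The base R sketch below applies this idea to the {2,3,4}->1->{5,6,7} graph. The helper name `neighbours()` and the layout of the `edges` data frame are assumptions made purely for illustration.

```r
# Edge table for the graph {2,3,4} -> 1 -> {5,6,7}
edges <- data.frame(
  from = c(2, 3, 4, 1, 1, 1),
  to   = c(1, 1, 1, 5, 6, 7))

# Hypothetical helper (not part of DiagrammeR): return
# the nodes adjacent to `selection` in a given direction
neighbours <- function(edges, selection, direction) {
  # Nodes at the heads of edges leaving the selection
  out_nodes <- edges$to[edges$from %in% selection]
  # Nodes at the tails of edges entering the selection
  in_nodes <- edges$from[edges$to %in% selection]
  switch(direction,
         out  = unique(out_nodes),
         "in" = unique(in_nodes),
         both = unique(c(out_nodes, in_nodes)))
}

neighbours(edges, 1, "out")
#> [1] 5 6 7

neighbours(edges, 1, "in")
#> [1] 2 3 4

neighbours(edges, 1, "both")
#> [1] 5 6 7 2 3 4
```

The sketch ignores one detail of the real functions: when no adjacent nodes are found, DiagrammeR keeps the original selection rather than returning an empty one.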

So far, these functions have been described as modifying selections of nodes based solely on node adjacency and the direction of the edges between the adjacent nodes. Indeed, when no further values are supplied to the function, traversals occur without regard to the attributes of the nodes traversed to. However, the arguments node_attr and match are available for restricting traversals to those nodes that satisfy logical statements on numeric attributes or matches on character attributes. For a property graph, where values are available for all nodes' type attribute and all edges' rel attribute, a traversal with trav_out() could, for example, be restricted to those outbound, adjacent nodes that have a specific type label. This is done by setting node_attr = "type" and providing the value of that type for the match argument.

###
# Perform traversals with conditions
# based on node `type` values
###

library(DiagrammeR)
library(magrittr)

# Create a common graph with nodes having
# various `type` values; set to render
# always using `visNetwork` when calling
# `render_graph()`
graph <-
  create_graph() %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork") %>%
  add_node("type_a", FALSE) %>%
  add_n_nodes(4, "type_b") %>%
  add_edge(1, 2) %>%
  add_edge(1, 3) %>%
  add_edge(4, 1) %>%
  add_edge(5, 1) %>%
  add_n_nodes(4, "type_c") %>%
  add_edge(1, 6) %>%
  add_edge(1, 7) %>%
  add_edge(8, 1) %>%
  add_edge(9, 1)

# View the created graph
render_graph(graph)



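# Starting from node `1`, traverse to all
# outbound, adjacent nodes regardless of
# their `type`
graph %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection

# Traverse only to outbound nodes with
# a `type` of `type_b`
graph %>%
  select_nodes_by_id(1) %>%
  trav_out("type", "type_b") %>%
  get_selection

# Traverse only to outbound nodes with
# a `type` of `type_c`
graph %>%
  select_nodes_by_id(1) %>%
  trav_out("type", "type_c") %>%
  get_selection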

# Once the nodes have been selected via
# a traversal, a useful thing to do would
# be to attach new nodes to that selection
updated_graph <-
  graph %>%
  select_nodes_by_id(1) %>%
  trav_out("type", "type_c") %>%
  add_n_nodes_ws(1, "from", "type_d")

# View the updated graph
render_graph(updated_graph)

We are not limited to starting a traversal from a single node ID value; we can begin from a selection of nodes based on a regular expression and traverse to nodes matching a type string value (or to nodes matching on other attributes that have character values). The following example uses a small graph of food entities with arbitrary edges between them.

###
# Perform traversals with conditions
# based on node `label` regex matches
###

library(DiagrammeR)
library(magrittr)

# Create a graph with fruit, vegetables,
# and nuts
nodes <-
  create_nodes(
    nodes = 1:9,
    type = c("fruit", "fruit", "fruit",
             "veg", "veg", "veg",
             "nut", "nut", "nut"),
    label = c("pineapple", "apple",
              "apricot", "cucumber",
              "celery", "endive",
              "hazelnut", "almond",
              "chestnut"))

edges <-
  create_edges(
    from = c(9, 3, 6, 2, 6, 2, 8, 2, 5, 5),
    to = c(1, 1, 4, 3, 7, 8, 1, 5, 3, 6))

graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges,
    graph_attrs = "output = visNetwork")

# View the graph
render_graph(graph)



# View the internal NDF for sake of
# reference
get_node_df(graph)
#>   nodes  type     label
#> 1     1 fruit pineapple
#> 2     2 fruit     apple
#> 3     3 fruit   apricot
#> 4     4   veg  cucumber
#> 5     5   veg    celery
#> 6     6   veg    endive
#> 7     7   nut  hazelnut
#> 8     8   nut    almond
#> 9     9   nut  chestnut

# Select all nodes with a label beginning
# with `a` and traverse outward to all nodes
graph %>%
  select_nodes(
    node_attr = "label",
    search = "^a") %>%
  trav_out %>%
  get_selection
#> [1] "3" "8" "5" "1"

# This traversal results in a rather large
# selection of nodes: `3` (`apricot`), `8`
# (`almond`), `5` (`celery`), and `1`
# (`pineapple`)

# Now, select all nodes with a label beginning
# with `c` (in this case, `cucumber`, `celery`,
# and `chestnut`) and then traverse outward to
# any node of the `fruit` type
graph %>%
  select_nodes(
    node_attr = "label",
    search = "^c") %>%
  trav_out(
    node_attr = "type",
    match = "fruit") %>%
  get_selection
#> [1] "3" "1"

# The traversal has resulted in a selection of
# nodes `3` (`apricot`) and `1` (`pineapple`)

Traversals can also be constrained to those nodes satisfying logical statements based on numerical data. So long as the attribute provided for node_attr contains numerical data, the comparison operators <, >, ==, and != can be placed before a value in the match argument. This type of traversal has a great many use cases, but here is a generic example using the trav_both() traversal function:

###
# Perform traversals with conditions
# based on numeric node `value` data
###

library(DiagrammeR)
library(magrittr)

# Create a random graph (but set a seed!)
# of 5 nodes, and 10 edges; it'll create
# numerical data values for each node at
# no extra charge
random_graph <-
  create_random_graph(
    5, 10, TRUE, set_seed = 20) %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork")

# View the graph's internal NDF
random_graph %>% get_node_df
#>   nodes type label value
#> 1     1          1     9
#> 2     2          2     8
#> 3     3          3     3
#> 4     4          4   5.5
#> 5     5          5    10

# View the graph's internal EDF
random_graph %>% get_edge_df
#>    from to rel
#> 1     5  1
#> 2     1  3
#> 3     2  4
#> 4     4  1
#> 5     3  2
#> 6     5  2
#> 7     3  5
#> 8     3  4
#> 9     2  1
#> 10    5  4

# View a rendering of the graph and note
# that apparently larger nodes are
# indicative of larger values in nodes'
# `value` attribute
render_graph(random_graph)



# Now select node `3`, perform a traversal
# to other adjacent nodes, and then look
# at which nodes are traversed to
random_graph %>%
  select_nodes_by_id(3) %>%
  trav_both %>%
  get_selection
#> [1] "2" "5" "4" "1"

# Perform a similar traversal but, this
# time, only traverse to those nodes
# with `value` less than 8.5
random_graph %>%
  select_nodes_by_id(3) %>%
  trav_both("value", "<8.5") %>%
  get_selection
#> [1] "2" "4"

# That was a subset of the possible
# traversals with `trav_both()`; using
# a condition of greater than 8.5 will
# yield the other nodes
random_graph %>%
  select_nodes_by_id(3) %>%
  trav_both("value", ">8.5") %>%
  get_selection
#> [1] "5" "1"

# An exact match on a numeric value
# is possible through use of `==`
# before the value; in this case, use
# a value of 10
random_graph %>%
  select_nodes_by_id(3) %>%
  trav_both("value", "==10") %>%
  get_selection
#> [1] "5"

# For a traversal to all values except
# a specified value, use `!=` before
# such value
random_graph %>%
  select_nodes_by_id(3) %>%
  trav_both("value", "!=10") %>%
  get_selection
#> [1] "2" "4" "1"
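
To make the comparison syntax more concrete, here is a minimal base R sketch of how a condition string such as "<8.5" could be split into an operator and a number and then applied to a vector of values. This is an illustrative assumption rather than the package's own parsing code, and the helper `matches_condition()` is hypothetical. The values used are those of nodes 1, 2, 4, and 5 (the nodes adjacent to node 3) as listed in the NDF above.

```r
# Hypothetical helper (not part of DiagrammeR): test
# numeric `values` against a condition string such as
# "<8.5" or "!=10"
matches_condition <- function(values, condition) {
  # Split the string into an operator and a number
  op <- sub("^(==|!=|<|>).*$", "\\1", condition)
  threshold <- as.numeric(sub("^(==|!=|<|>)", "", condition))
  # Look up the comparison function by name and apply it
  match.fun(op)(values, threshold)
}

# `value` attributes of the nodes adjacent to node `3`
neighbour_values <- c(`1` = 9, `2` = 8, `4` = 5.5, `5` = 10)

names(neighbour_values)[matches_condition(neighbour_values, "<8.5")]
#> [1] "2" "4"

names(neighbour_values)[matches_condition(neighbour_values, ">8.5")]
#> [1] "1" "5"

names(neighbour_values)[matches_condition(neighbour_values, "==10")]
#> [1] "5"

names(neighbour_values)[matches_condition(neighbour_values, "!=10")]
#> [1] "1" "2" "4"
```

The same nodes are picked out as in the trav_both() calls above, although the order differs because this sketch keeps the order of the input vector.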

Traversing from node to node with trav_out(), trav_in(), or trav_both() can result in very specific targeting of nodes. As seen, once the traversal has occurred, the new selection can be used to obtain data from those nodes or to modify the graph (for example, by attaching new nodes to the selection). Especially with the use of a magrittr pipeline, the selection of nodes, the traversals, and the resulting actions become quite readable (as is the case with most R statements using magrittr).

Traversals from Nodes to Edges

Moving across nodes using traversal functions is quite a powerful thing to do. However, especially with information-rich graphs, some useful data can exist in the graph's edges. For this reason, we can traverse from nodes onto adjacent edges. As with the node-to-node traversal functions, the direction of the edge is important and a key distinction between the functions trav_out_edge() and trav_in_edge(). These types of traversals always begin at nodes (and thus require an initial selection of one or more nodes) and typically end with a selection of one or more edges. If no traversal can be made, then the initial selection of nodes is retained.

Starting with the trav_out_edge() function, suppose there is a selection of a single node 1 in the very simple graph of 1->2. Calling the trav_out_edge() function in its simplest form (without values supplied except for the graph itself) will result in an edge selection, and that edge will be the 1->2 edge (which initiates at node 1 and terminates at node 2). Thus, the traversal is from one or more nodes onto adjacent, outward edges. On the same graph, with the same selection, calling the trav_in_edge() function will not result in a traversal (the initial node selection of node 1 will be retained, as though nothing happened). This is because the trav_in_edge() function performs the converse traversal, where the traversal is from one or more nodes onto adjacent, inward edges. Put another way, trav_in_edge() will change the selection to edges that point toward the initially selected node(s), if any.

As with the node-to-node traversal functions, these traversals are much more powerful when used with matching conditions, as conditions increase selectivity. That only certain edges may be traversed to (and selected) is important, especially in those cases where the traversal continues onto nodes (but more on that in the next section). Examples will aid in the understanding of these functions.

###
# Perform two types of traversals from
# a single node using `trav_out_edge()`
# and `trav_in_edge()`
###

library(DiagrammeR)
library(magrittr)

# Create a simple graph with two nodes, an
# edge between them (`1` -> `2`); starting
# from node `1` (as a selection), traverse
# to the edge and then obtain the current
# selection
create_graph() %>%
  add_node %>%
  add_node %>%
  add_edge(1, 2) %>%
  select_nodes_by_id(1) %>%
  trav_out_edge %>%
  get_selection
#> [1] "1 -> 2"

# If no traversal can occur the selection is
# not altered. To demonstrate, use a similar
# pipeline but reverse the edge direction
create_graph() %>%
  add_node %>%
  add_node %>%
  add_edge(2, 1) %>%
  select_nodes_by_id(1) %>%
  trav_out_edge %>%
  get_selection
#> [1] "1"

# A traversal can occur if `trav_in_edge()`
# is used instead of `trav_out_edge()`
create_graph() %>%
  add_node %>%
  add_node %>%
  add_edge(2, 1) %>%
  select_nodes_by_id(1) %>%
  trav_in_edge %>%
  get_selection
#> [1] "2 -> 1"

# A selection of multiple edges can occur
# as a result of a traversal
create_graph() %>%
  add_node %>%
  select_nodes_by_id(1) %>%
  add_n_nodes_ws(10, "from") %>%
  add_n_nodes_ws(10, "to") %>%
  trav_out_edge %>%
  get_selection
#> [1] "1 -> 2"  "1 -> 3"  "1 -> 4"  "1 -> 5"
#> [5] "1 -> 6"  "1 -> 7"  "1 -> 8"  "1 -> 9"
#> [9] "1 -> 10" "1 -> 11"

create_graph() %>%
  add_node %>%
  select_nodes_by_id(1) %>%
  add_n_nodes_ws(10, "from") %>%
  add_n_nodes_ws(10, "to") %>%
  trav_in_edge %>%
  get_selection
#> [1] "12 -> 1" "13 -> 1" "14 -> 1" "15 -> 1"
#> [5] "16 -> 1" "17 -> 1" "18 -> 1" "19 -> 1"
#> [9] "20 -> 1" "21 -> 1"
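
The rule that a failed traversal leaves the node selection untouched can also be modelled with a small base R sketch. The helper `out_edges_or_keep()` below is hypothetical; it only mimics the behaviour described in this section on a plain edge table and is not how DiagrammeR stores or updates selections.

```r
# Hypothetical helper (not part of DiagrammeR): move from
# a node selection onto outbound edges, keeping the
# original node selection when no such edges exist
out_edges_or_keep <- function(edges, node_selection) {
  hits <- edges[edges$from %in% node_selection, ]
  if (nrow(hits) == 0) {
    # No traversal possible: the node selection is retained
    return(list(type = "nodes",
                ids = as.character(node_selection)))
  }
  list(type = "edges",
       ids = paste(hits$from, "->", hits$to))
}

# Graph `1` -> `2`: the traversal from node `1` succeeds
out_edges_or_keep(data.frame(from = 1, to = 2), 1)$ids
#> [1] "1 -> 2"

# Graph `2` -> `1`: there is no outbound edge, so node
# `1` stays selected
out_edges_or_keep(data.frame(from = 2, to = 1), 1)$ids
#> [1] "1"
```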

To introduce conditions on the traversal, values can be supplied to the edge_attr and match arguments. As with the node-to-node traversal functions, these optional values filter the node-to-edge traversals. If a graph is fashioned as a property graph that has values set for the nodes' type and the edges' rel attributes, traversals with trav_out_edge() and trav_in_edge() can be restricted to the selection of edges that have a specific rel label. This is done by setting edge_attr = "rel" and providing the value of that relationship for the match argument.

###
# Perform node-to-edge traversals
# from multiple nodes and with the
# use of matching conditions
###

library(DiagrammeR)
library(magrittr)

# First, set a seed so the example
# is reproducible

set.seed(20)

# Create a graph with fruit,
# vegetables, nuts, and... people!
nodes <-
  create_nodes(
    nodes = 1:14,
    type = c("person", "person",
             "person", "person",
             "person", "fruit",
             "fruit", "fruit",
             "veg", "veg", "veg",
             "nut", "nut", "nut"),
    label = c("Annie", "Donna",
              "Justine", "Ed",
              "Graham", "pineapple",
              "apple", "apricot",
              "cucumber", "celery",
              "endive", "hazelnut",
              "almond", "chestnut"))

edges <-
  create_edges(
    from = sort(
      as.vector(replicate(5, 1:5))),
    to = as.vector(
      replicate(5, sample(6:14, 5))),
    rel = as.vector(
      replicate(
        5, sample(
          c("likes", "dislikes",
            "allergic_to"), 5,
          TRUE,
          c(0.5, 0.25, 0.25)))))

graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges,
    graph_attrs = "output = visNetwork")

# Behold the food preferences graph!
graph %>% render_graph



# Select all food-based nodes and
# determine the total number of "likes",
# "dislikes", and "allergic_to"
# relationships
graph %>%
  select_nodes(node_attr = "type",
               search = "fruit") %>%
  select_nodes(node_attr = "type",
               search = "veg") %>%
  select_nodes(node_attr = "type",
               search = "nut") %>%
  trav_in_edge %>%
  cache_edge_count_ws %>%
  get_cache
#> [1] 25

# From all food-based nodes, determine
# how many `likes` relationships there are
# (note the use of `invert_selection()`)
graph %>%
  select_nodes(node_attr = "type",
               search = "person") %>%
  invert_selection %>%
  trav_in_edge(edge_attr = "rel",
               match = "likes") %>%
  cache_edge_count_ws %>%
  get_cache
#> [1] 13

# So, 13 of the 25 total relationships
# are `likes`. Determine which people
# have an allergy to the foods
graph %>%
  select_nodes(
    node_attr = "type", search = "person") %>%
  trav_out_edge(
    edge_attr = "rel", match = "allergic_to") %>%
  get_selection
#> [1] "1 -> 10" "2 -> 11" "3 -> 10" "4 -> 10"
#> [5] "5 -> 6"  "5 -> 9"  "5 -> 7"

# It turns out that all of the people
# are allergic to at least one food (note
# that the `from` nodes of the selected
# edges include every person node from `1`
# to `5`). The food with node ID `10`
# (`celery`) is the cause of the most
# allergic reactions

Now for a different example with numeric data within an edge attribute named data_value (this attribute name was arbitrarily chosen). Let's try obtaining counts of edges that satisfy some numerical comparisons.

###
# Perform more node-to-edge traversals
# from multiple nodes and with use
# of matching conditions
###

library(DiagrammeR)
library(magrittr)

# Create a random graph
# (10 nodes, 20 edges)
graph <-
  create_random_graph(
    10, 20,
    directed = TRUE,
    set_seed = 20) %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork")

# Set a seed for the various uses
# of the `sample()` function
set.seed(20)

# Use a `for` loop to randomly set
# various `type` values to nodes
for (i in 1:node_count(graph)) {
  graph %<>%
    set_node_attrs(
      nodes = i,
      node_attr = "type",
      values = sample(
        c("A", "B", "C"), 1))
}

# Use another `for` loop to randomly
# set various numerical values to
# the graph's edges
for (i in 1:edge_count(graph)) {
  graph %<>%
    set_edge_attrs(
      from = get_edges(., return_type = "df")[i, 1],
      to = get_edges(., return_type = "df")[i, 2],
      edge_attr = "data_value",
      values = sample(
        seq(0, 8, 0.5), 1))
}

# Look at the graph
graph %>% render_graph



# Select all the edges that are inbound
# edges to nodes and that have a
# `data_value` less than 4; then,
# determine the count of those edges
graph %>%
  select_nodes() %>%
  trav_in_edge(
    edge_attr = "data_value",
    match = "<4.0") %>%
  cache_edge_count_ws %>%
  get_cache
#> [1] 12

# Select all the edges that are outbound
# edges from nodes and that have a
# `data_value` greater than 4; then,
# get the values into a tabulation
graph %>%
  select_nodes() %>%
  trav_out_edge(
    edge_attr = "data_value",
    match = ">4.0") %>%
  cache_edge_attrs_ws("data_value") %>%
  get_cache %>%
  as.numeric %>%
  table
#> .
#> 2.5 5.5 6 6.5 7.5 8
#>   1   1 3   1   1 1

Traversals from Edges to Nodes

You've seen node-to-node traversals and you've seen node-to-edge traversals. Once you see edge-to-node traversals, you will have seen all the possible traversals. These traversals, performed with trav_in_node() and trav_out_node(), are the counterparts of the node-to-edge traversals. The nomenclature may be confusing since the perspective of in or out is still in relation to the nodes: trav_in_node() moves from an edge to the node that the edge points into, and trav_out_node() moves from an edge to the node that the edge points out of. Thus, to move across the graph 1->2->3 from node 1, alighting on all edges and nodes before ultimately arriving at node 3, the traversal sequence is: trav_out_edge(), trav_in_node(), trav_out_edge(), and trav_in_node().
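
The four-step sequence just described can be written out as a pipeline. The sketch below uses only functions that appear elsewhere in this section and assumes they behave as described here; no output is shown because it has not been run against a specific version of the package.

```r
library(DiagrammeR)
library(magrittr)

# Build the path graph `1` -> `2` -> `3`, select node
# `1`, then alternate between edge and node traversals;
# the final selection is expected to be node `3`
final_selection <-
  create_graph() %>%
  add_n_nodes(3) %>%
  add_edge(1, 2) %>%
  add_edge(2, 3) %>%
  select_nodes_by_id(1) %>%
  trav_out_edge %>%   # node `1` onto edge `1` -> `2`
  trav_in_node %>%    # edge `1` -> `2` onto node `2`
  trav_out_edge %>%   # node `2` onto edge `2` -> `3`
  trav_in_node %>%    # edge `2` -> `3` onto node `3`
  get_selection

final_selection
```

Replacing the two pairs of calls with two trav_out() calls should arrive at the same node; the longer form matters once conditions are placed on the edges passed through.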

To introduce conditions on the traversal, values can be supplied to the node_attr and match arguments. These types of traversals, like the others, allow for the use of optional values for filtering the edge-to-node traversals. For property graphs, this is advantageous since the type node attribute (or other node attributes) can allow for specific edge-to-node traversals with trav_out_node() and trav_in_node(). The combination of a node-to-edge traversal with a subsequent edge-to-node traversal is particularly useful because we can traverse to adjacent nodes but limit the traversal by certain edge attributes (in a trav_out_edge() or trav_in_edge() call) and perhaps set another condition by node attribute values (in a trav_in_node() or trav_out_node() call). The following example is an extension of the food preferences graph, containing traversals both onto edges and off edges and back onto nodes.

###
# Perform node-to-edge and edge-to-node
# traversals from multiple nodes with the
# use of matching conditions
###

library(DiagrammeR)
library(magrittr)

# First, set a seed so the example
# is reproducible

set.seed(20)

# Create a graph with fruit,
# vegetables, nuts, and... people!
nodes <-
  create_nodes(
    nodes = 1:14,
    type = c("person", "person",
             "person", "person",
             "person", "fruit",
             "fruit", "fruit",
             "veg", "veg", "veg",
             "nut", "nut", "nut"),
    label = c("Annie", "Donna",
              "Justine", "Ed",
              "Graham", "pineapple",
              "apple", "apricot",
              "cucumber", "celery",
              "endive", "hazelnut",
              "almond", "chestnut"))

edges <-
  create_edges(
    from = sort(
      as.vector(replicate(5, 1:5))),
    to = as.vector(
      replicate(5, sample(6:14, 5))),
    rel = as.vector(
      replicate(
        5, sample(
          c("likes", "dislikes",
            "allergic_to"), 5,
          TRUE,
          c(0.5, 0.25, 0.25)))))

graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges,
    graph_attrs = "output = visNetwork")

# Have a look at the graph
graph %>% render_graph



# Determine which foods cause allergies
# and modify the appearance of those
# nodes (by adding a `color` attribute)
graph_allergies <-
  graph %>%
  select_nodes(
    node_attr = "type",
    search = "person") %>%
  invert_selection %>%
  trav_in_edge(
    edge_attr = "rel",
    match = "allergic_to") %>%
  trav_in_node %>%
  set_node_attrs_ws("color", "red") %>%
  invert_selection %>%
  set_node_attrs_ws("color", "green") %>%
  clear_selection %>%
  select_nodes(
    node_attr = "type", search = "person") %>%
  set_node_attrs_ws("color", "blue")

# Display the modified graph, where green
# nodes represent safe foods for the
# group of people (blue nodes); red nodes
# are the danger foods
graph_allergies %>% render_graph



# Get a vector of those foods that are
# deemed risky for this particular group
graph %>%
  select_nodes(
    node_attr = "type",
    search = "person") %>%
  invert_selection %>%
  trav_in_edge(
    edge_attr = "rel",
    match = "allergic_to") %>%
  trav_in_node %>%
  cache_node_attrs_ws("label") %>%
  get_cache
#> [1] "pineapple"  "apple"  "cucumber"
#> [4] "celery"  "endive"

We mustn't forget that numeric values in node attributes can also be exploited in these traversals.

###
# Perform both node-to-edge and
# edge-to-node traversals between
# multiple nodes and use conditions
# based on numeric comparisons
###

library(DiagrammeR)
library(magrittr)

# Create a random graph
# (10 nodes, 20 edges)
graph <-
  create_random_graph(
    10, 20,
    directed = TRUE,
    set_seed = 20) %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork")

# Set a seed for the various uses
# of the `sample()` function
set.seed(20)

# Use a `for` loop to randomly set
# various `type` values to nodes
for (i in 1:node_count(graph)) {
  graph %<>%
    set_node_attrs(
      nodes = i,
      node_attr = "type",
      values = sample(
        c("A", "B", "C"), 1))
}

# Use another `for` loop to randomly
# set various numerical values to
# the graph's edges
for (i in 1:edge_count(graph)){
  graph %<>%
    set_edge_attrs(
      from = get_edges(., return_type = "df")[i, 1],
      to = get_edges(., return_type = "df")[i, 2],
      edge_attr = "data_value",
      values = sample(
        seq(0, 8, 0.5), 1))
}

# Look at the graph
graph %>% render_graph

# Perform a traversal from nodes with a
# `value` <6, out to edges with a `data_value`
# <4, and onto nodes with a `value` <2
graph %>%
  select_nodes("value", "<6") %>%
  trav_out_edge("data_value", "<4") %>%
  trav_in_node("value", "<2") %>%
  get_selection
#> [1] "7"

# From this traversal, only the node with
# ID of `7` is the selected node; so, what was
# the actual `value`?
graph %>%
  select_nodes("value", "<6") %>%
  trav_out_edge("data_value", "<4") %>%
  trav_in_node("value", "<2") %>%
  cache_node_attrs_ws("value") %>%
  get_cache %>%
  as.numeric
#> [1] 1

Software Repository Example

Here's an example that ties all of these traversal types together. It involves a fictional software repository, with software contributors and projects as the entities in the graph. The graph is a property graph because the nodes and edges are labeled with type and rel attributes, respectively.

You can use traversals to get specific selections and then perform data inspection or graph modification. We can find out bits of information without manually inspecting the underlying NDFs or EDFs. This will be important when a property graph becomes quite large since manually inspecting those graph components will be difficult and impractical. Moreover, the types of traversals can be quite complex, with multiple dependencies. In a typical relational database, these queries are possible (but lengthy) with multiple inner joins. After a certain point of complexity such queries may introduce latency considered to be unreasonable.

The example graph is built from a fake dataset of contributors to software projects on a platform not unlike GitHub. The DiagrammeR package contains the CSV files required to build the graph.

###
# Use all manner of traversals with
# other functions to get information
# and to modify a property graph
###

library(DiagrammeR)
library(magrittr)

# Create a path to the CSV file containing
# contributors to software projects
contributors_csv <-
  system.file("examples/contributors.csv",
              package = "DiagrammeR")

colnames(read.csv(contributors_csv,
                  stringsAsFactors = FALSE))
#> [1] "name"  "age"  "join_date"  "email"
#> [5] "follower_count"  "following_count"
#> [7] "starred_count"

# Create a path to the CSV file containing
# information about the software projects
projects_csv <-
  system.file("examples/projects.csv",
              package = "DiagrammeR")

colnames(read.csv(projects_csv,
                  stringsAsFactors = FALSE))
#> [1] "project"  "start_date"  "stars"
#> [4] "language"

# Create a path to the CSV file with information
# about the relationships between the projects
# and their contributors
projects_and_contributors_csv <-
  system.file("examples/projects_and_contributors.csv",
              package = "DiagrammeR")

colnames(read.csv(projects_and_contributors_csv,
                  stringsAsFactors = FALSE))
#> [1] "project_name"  "contributor_name"
#> [3] "contributor_role"  "commits"

# Create the property graph by adding the CSV data to a
# new graph; the `add_nodes_from_table()` and
# `add_edges_from_table()` functions are used to create
# nodes and edges in the graph
graph <-
  create_graph() %>%
  set_graph_name("software_projects") %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork") %>%
  add_nodes_from_table(
    contributors_csv,
    set_type = "person",
    label_col = "name") %>%
  add_nodes_from_table(
    projects_csv,
    set_type = "project",
    label_col = "project") %>%
  add_edges_from_table(
    projects_and_contributors_csv,
    from_col = "contributor_name",
    from_mapping = "name",
    to_col = "project_name",
    to_mapping = "project",
    rel_col = "contributor_role")

# View the graph
graph %>% render_graph



# Get the average age of all the contributors
graph %>%
  select_nodes("type", "person") %>%
  cache_node_attrs_ws("age", "numeric") %>%
  get_cache %>%
  mean
#> [1] 33.6

# Get the total number of commits to all software
# projects
graph %>%
  select_edges %>%
  cache_edge_attrs_ws("commits", "numeric") %>%
  get_cache %>%
  sum
#> [1] 5182

# Get total number of commits from Josh as a maintainer
# and a contributor
graph %>%
  select_nodes("name", "Josh") %>%
  trav_out_edge %>%
  cache_edge_attrs_ws("commits", "numeric") %>%
  get_cache %>%
  sum
#> [1] 227

# Get total number of commits from Louisa
graph %>%
  select_nodes("name", "Louisa") %>%
  trav_out_edge %>%
  cache_edge_attrs_ws("commits", "numeric") %>%
  get_cache %>%
  sum
#> [1] 615

# As a bit of an aside, we can use selections and
# rescale values to a styling attribute such as
# edge width, node size, or color. Select all
# edges and apply an edge `width` attribute scaled
# by the edge attribute `commits` to a range of
# 0.5 to 3.0
graph_scale_width_edges <-
  graph %>%
  select_edges %>%
  rescale_edge_attrs_ws(
    "commits", "width", 0.5, 3.0)

# Inspect the graph's internal EDF
get_edge_df(graph_scale_width_edges)
#>   from to         rel commits width
#> 1    2 11  maintainer     236  0.75
#> 2    1 11 contributor     121 0.627
#> 3    3 11 contributor      32 0.532
#> 4    2 12 contributor      92 0.596
#> 5    4 12 contributor     124  0.63
#> 6    5 12  maintainer    1460 2.059
#> 7    4 13  maintainer     103 0.608
#> 8    6 13 contributor     236  0.75
#> 9    7 13 contributor     126 0.633
#> 10   8 13 contributor    2340     3
#> 11   9 13 contributor       2   0.5
#> 12  10 13 contributor      23 0.522
#> 13   2 13 contributor     287 0.805

# View the graph, larger edges and arrows
# indicate higher numbers of `commits`
graph_scale_width_edges %>% render_graph



# Select all edges and apply a color attribute based
# on another edge attribute
graph_scale_color_edges <-
  graph %>%
  select_edges %>%
  rescale_edge_attrs_ws(
    "commits", "color", "gray95", "gray5")

# Render the graph, darker edges represent higher
# commits
graph_scale_color_edges %>% render_graph



# Get the names of people in graph above age 32
graph %>%
  select_nodes("type", "person") %>%
  select_nodes("age", ">32", "intersect") %>%
  cache_node_attrs_ws("name") %>%
  get_cache %>%
  sort
#> [1] "Jack"   "Jon"    "Kim"    "Roger"  "Sheryl"

# Get the total number of commits from all people to
# the `supercalc` project
graph %>%
  select_nodes("project", "supercalc") %>%
  trav_in_edge %>%
  cache_edge_attrs_ws("commits", "numeric") %>%
  get_cache %>%
  sum
#> [1] 1676

# Who committed the most to the `supercalc` project?
graph %>%
  select_nodes("project", "supercalc") %>%
  trav_in_edge %>%
  cache_edge_attrs_ws("commits", "numeric") %>%
  trav_in_node %>%
  trav_in_edge("commits", max(get_cache(.))) %>%
  trav_out_node %>%
  cache_node_attrs_ws("name") %>%
  get_cache
#> [1] "Sheryl"

# What is the email address of the individual that
# contributed the least to the `randomizer` project?
graph %>%
  select_nodes("project", "randomizer") %>%
  trav_in_edge %>%
  cache_edge_attrs_ws("commits", "numeric") %>%
  trav_in_node %>%
  trav_in_edge("commits", min(get_cache(.))) %>%
  trav_out_node %>%
  cache_node_attrs_ws("email") %>%
  get_cache
#> [1] "the_will@graphymail.com"

# Update the graph, because, it has come to our
# attention that Kim is now a contributor to
# `stringbuildeR` and has made 15 new commits to
# that project
graph %<>%
  add_edge(
    get_nodes(.,
      "name", "Kim"),
    get_nodes(.,
      "project", "stringbuildeR"),
    "contributor") %>%
  select_last_edge %>%
  set_edge_attrs_ws("commits", 15) %>%
  clear_selection

# View the graph's internal EDF, the newest
# edge is at the bottom
get_edge_df(graph)
#>    from to         rel commits
#> 1     2 11  maintainer     236
#> 2     1 11 contributor     121
#> 3     3 11 contributor      32
#> 4     2 12 contributor      92
#> 5     4 12 contributor     124
#> 6     5 12  maintainer    1460
#> 7     4 13  maintainer     103
#> 8     6 13 contributor     236
#> 9     7 13 contributor     126
#> 10    8 13 contributor    2340
#> 11    9 13 contributor       2
#> 12   10 13 contributor      23
#> 13    2 13 contributor     287
#> 14    8 11 contributor      15

# View the graph to see the new edge
graph %>% render_graph



# Get the email addresses of all contributors (but not
# maintainers) to the `randomizer` and `supercalc`
# projects
graph %>%
  select_nodes("project", "randomizer") %>%
  select_nodes("project", "supercalc") %>%
  trav_in_edge("rel", "contributor") %>%
  trav_out_node %>%
  cache_node_attrs_ws("email", "character") %>%
  get_cache
#> [1] "lhe99@mailing-fun.com"  "josh_ch@megamail.kn"
#> [3] "roger_that@whalemail.net"  "the_simone@a-q-w-o.net"
#> [5] "kim_3251323@ohhh.ai"  "the_will@graphymail.com"
#> [7] "j_2000@ultramail.io"

# Which committer to the `randomizer` project has the
# highest number of followers?
graph %>%
  select_nodes("project", "randomizer") %>%
  trav_in %>%
  cache_node_attrs_ws("follower_count", "numeric") %>%
  select_nodes("project", "randomizer") %>%
  trav_in("follower_count", max(get_cache(.))) %>%
  cache_node_attrs_ws("name") %>%
  get_cache
#> [1] "Kim"

# Which people have committed to more than one
# project?
graph %>%
  select_nodes_by_degree("out", ">1") %>%
  cache_node_attrs_ws("name") %>%
  get_cache %>%
  sort
#> [1] "Josh"  "Kim"  "Louisa"