Traversals

Imagine diving into a graph and moving across the graph's nodes, jumping onto an edge, or perhaps bypassing the edges and simply alighting on different nodes with specific attributes. Traversals are an important part of a graph query. You can develop sophisticated pipelines that allow for selective movement across the graph (based on conditions you specify per traversal) and for gleaning information from nodes and edges. Importantly, traversals begin with selections of nodes or edges, and the act of traversing modifies that selection. One may select a single node, for instance, perform one or more traversals away from that initial node, and end up with a selection of several different nodes (or even edges). There are many important use cases, so an in-depth primer of DiagrammeR's traversal functions is provided here alongside numerous practical examples.

Traversals Across Nodes

To traverse across connected nodes without regard to the properties of the edges between the nodes, three functions are available: `trav_out()`, `trav_in()`, and `trav_both()`. These types of traversals always require an initial selection of one or more nodes, and, after traversing, a selection of one or more nodes is returned.

Directionality of the traversal is the key differentiator between these three functions. The `trav_out()` function traverses to connected nodes that are outbound in relation to the origin nodes (in a directed graph). With the `trav_in()` function, the movement is reversed: traversals are made toward inbound nodes. For example, take the edge `1->2` with node `1` as the origin: the `trav_out()` function would change the node selection from node `1` to node `2`, because these nodes are adjacent and the edge leads from the origin node to an outbound node. If node `1` has outbound edges to several nodes (e.g., `1->{2,3,4}`), then all of those nodes will be part of the new selection. Take another example where the selected node is a central node with both outbound and inbound edges to adjacent nodes: `{2,3,4}->1->{5,6,7}`. Using `trav_in()` makes nodes `2`, `3`, and `4` the selected nodes; using `trav_out()` makes nodes `5`, `6`, and `7` the selected nodes. Here are several examples of traversals across nodes.

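``````
###
# Perform two types of traversals from
# a single node using `trav_out()` and
# `trav_in()`
###

library(DiagrammeR)
library(magrittr)

# Create a simple graph with two nodes, an edge
# between them (`1` -> `2`); starting from node
# `1` (as a selection), traverse to node `2`
# and then obtain the current selection
create_graph(
  nodes_df = create_nodes(nodes = 1:2),
  edges_df = create_edges(from = 1, to = 2)) %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection
#> [1] "2"

# If no traversal can occur, the selection is not
# altered. To demonstrate, use a similar pipeline
# but reverse the edge direction (`2` -> `1`)
create_graph(
  nodes_df = create_nodes(nodes = 1:2),
  edges_df = create_edges(from = 2, to = 1)) %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection
#> [1] "1"

# On that reversed-edge graph, a traversal can
# occur if `trav_in()` is used
create_graph(
  nodes_df = create_nodes(nodes = 1:2),
  edges_df = create_edges(from = 2, to = 1)) %>%
  select_nodes_by_id(1) %>%
  trav_in %>%
  get_selection
#> [1] "2"

# Multiple traversals can be made in a single
# magrittr pipeline; here, along the path
# `1` -> `2` -> `3` -> `4` -> `5`
create_graph(
  nodes_df = create_nodes(nodes = 1:5),
  edges_df = create_edges(from = 1:4, to = 2:5)) %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  trav_out %>%
  trav_out %>%
  trav_out %>%
  get_selection
#> [1] "5"

# A selection of multiple nodes can occur as
# a result of a traversal; in this graph, node
# `1` has outbound edges to nodes `2` to `11`
# and inbound edges from nodes `12` to `21`
star_graph <-
  create_graph(
    nodes_df = create_nodes(nodes = 1:21),
    edges_df = create_edges(
      from = c(rep(1, 10), 12:21),
      to = c(2:11, rep(1, 10))))

star_graph %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection
#>  [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"
#>  [8] "9"  "10" "11"

star_graph %>%
  select_nodes_by_id(1) %>%
  trav_in %>%
  get_selection
#>  [1] "12" "13" "14" "15" "16" "17"
#>  [7] "18" "19" "20" "21"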
``````
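The out/in semantics described above can be mimicked outside DiagrammeR with a plain edge list. The following is a minimal base-R sketch, not DiagrammeR's implementation: it stores the graph `{2,3,4}->1->{5,6,7}` as a data frame of `from`/`to` pairs, and the helper name `trav_out_sketch()`/`trav_in_sketch()` is hypothetical, chosen only for illustration.

```r
# A base-R sketch of the selection semantics of `trav_out()` and
# `trav_in()`; hypothetical helpers, NOT DiagrammeR's implementation
edges <- data.frame(
  from = c(2, 3, 4, 1, 1, 1),
  to   = c(1, 1, 1, 5, 6, 7)
)

# Outbound neighbours: heads of edges whose tail is selected
trav_out_sketch <- function(edges, selection) {
  result <- unique(edges$to[edges$from %in% selection])
  # Mirror the documented behaviour: keep the old selection
  # when no traversal is possible
  if (length(result) == 0) selection else result
}

# Inbound neighbours: tails of edges whose head is selected
trav_in_sketch <- function(edges, selection) {
  result <- unique(edges$from[edges$to %in% selection])
  if (length(result) == 0) selection else result
}

trav_out_sketch(edges, 1)
#> [1] 5 6 7
trav_in_sketch(edges, 1)
#> [1] 2 3 4
trav_out_sketch(edges, 5)  # node 5 has no outbound edges
#> [1] 5

# Chaining four outbound steps along the path 1 -> 2 -> 3 -> 4 -> 5
path <- data.frame(from = 1:4, to = 2:5)
Reduce(function(sel, i) trav_out_sketch(path, sel), 1:4, 1)
#> [1] 5
```

The result order here simply follows the row order of the edge list; only the set of selected nodes is meant to correspond to the DiagrammeR examples.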

The `trav_both()` function results in traversals to adjacent nodes regardless of the direction of the edges between those nodes; in other words, movement to adjacent nodes is both inward and outward. For the example of `{2,3,4}->1->{5,6,7}`, where node `1` is the only node in the selection, all of nodes `2` through `7` will be part of the new selection after calling `trav_both()`.

``````
###
# Perform traversals from a single
# node using `trav_both()`
###

library(DiagrammeR)
library(magrittr)

# Create the graph described in the paragraph
# above ({`2...4`} -> `1` -> {`5...7`}),
# start from node `1` (as a selection),
# traverse to all other adjacent nodes and
# then obtain the current selection
create_graph(
  nodes_df = create_nodes(nodes = 1:7),
  edges_df = create_edges(
    from = c(2, 3, 4, 1, 1, 1),
    to = c(1, 1, 1, 5, 6, 7))) %>%
  select_nodes_by_id(1) %>%
  trav_both %>%
  get_selection
#> [1] "5" "6" "7" "2" "3" "4"
``````
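As a rough mental model, `trav_both()` behaves like the union of an outbound and an inbound step. The base-R sketch below illustrates that idea on the same `{2,3,4}->1->{5,6,7}` edge list; `trav_both_sketch()` is a hypothetical helper, not part of DiagrammeR.

```r
# Same graph as above: {2,3,4} -> 1 -> {5,6,7}
edges <- data.frame(
  from = c(2, 3, 4, 1, 1, 1),
  to   = c(1, 1, 1, 5, 6, 7)
)

# Hypothetical sketch: neighbours in either direction
trav_both_sketch <- function(edges, selection) {
  out_nodes <- edges$to[edges$from %in% selection]  # outbound step
  in_nodes  <- edges$from[edges$to %in% selection]  # inbound step
  result <- unique(c(out_nodes, in_nodes))
  # Keep the old selection if there are no neighbours at all
  if (length(result) == 0) selection else result
}

trav_both_sketch(edges, 1)
#> [1] 5 6 7 2 3 4
```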

So far, these functions have been described as modifying selections of nodes based solely on node adjacency and the direction of the edges between adjacent nodes. Indeed, without supplying values for the optional arguments, traversals occur without regard to the attributes of the nodes traversed to. However, the arguments `node_attr` and `match` are available for filtering the traversals to those that satisfy logical statements on numeric attributes or matches on character attributes. For a property graph, where values are available for all nodes' `type` attribute and all edges' `rel` attribute, a traversal with `trav_out()` could, for example, be performed for all outbound, adjacent nodes that have a specific `type` label. This is done by setting `node_attr = "type"` and providing the value of that `type` for the `match` argument.

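``````
###
# Perform traversals with conditions
# based on node `type` values
###

library(DiagrammeR)
library(magrittr)

# Create a common graph with nodes having
# various `type` values; set to render
# always using `visNetwork` when calling
# `render_graph()`. (The nodes and edges
# here are illustrative: node `1` has
# outbound edges to nodes of the types
# `type_a`, `type_b`, and `type_c`.)
graph <-
  create_graph(
    nodes_df = create_nodes(
      nodes = 1:6,
      type = c("type_a", "type_a", "type_b",
               "type_b", "type_c", "type_c")),
    edges_df = create_edges(
      from = c(1, 1, 1, 1, 1),
      to = c(2, 3, 4, 5, 6))) %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork")

# View the created graph
render_graph(graph)

# Traverse to all outbound, adjacent nodes
graph %>%
  select_nodes_by_id(1) %>%
  trav_out %>%
  get_selection

# Traverse only to nodes of type `type_b`
graph %>%
  select_nodes_by_id(1) %>%
  trav_out("type", "type_b") %>%
  get_selection

# Traverse only to nodes of type `type_c`
graph %>%
  select_nodes_by_id(1) %>%
  trav_out("type", "type_c") %>%
  get_selection

# Once the nodes have been selected via
# a traversal, a useful thing to do would
# be to attach new nodes to that selection;
# this assumes `add_n_nodes_ws()` (add `n`
# new nodes to each selected node, with
# edges directed "from" the selection)
updated_graph <-
  graph %>%
  select_nodes_by_id(1) %>%
  trav_out("type", "type_c") %>%
  add_n_nodes_ws(2, "from")

# View the updated graph
render_graph(updated_graph)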
``````
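The effect of `node_attr` and `match` on a character attribute can be pictured as a filter applied after the plain outbound step. The base-R sketch below assumes a small hypothetical graph (node `1` pointing to nodes `2` to `5`, with made-up `type` values); `trav_out_filtered_sketch()` and its `match_value` parameter are illustrative names, not DiagrammeR's API.

```r
# Hypothetical graph: node 1 points to nodes 2 to 5, whose
# `type` values are assumed purely for illustration
nodes <- data.frame(
  id   = 1:5,
  type = c("type_a", "type_b", "type_b", "type_c", "type_a"),
  stringsAsFactors = FALSE
)
edges <- data.frame(from = c(1, 1, 1, 1), to = c(2, 3, 4, 5))

# Outbound traversal, optionally keeping only target nodes whose
# `node_attr` value equals `match_value` (a hypothetical helper)
trav_out_filtered_sketch <- function(nodes, edges, selection,
                                     node_attr = NULL,
                                     match_value = NULL) {
  targets <- unique(edges$to[edges$from %in% selection])
  if (!is.null(node_attr)) {
    # Look up the attribute of each target by its node ID
    attr_values <- nodes[[node_attr]][match(targets, nodes$id)]
    targets <- targets[attr_values == match_value]
  }
  # No qualifying targets: keep the original selection
  if (length(targets) == 0) selection else targets
}

trav_out_filtered_sketch(nodes, edges, 1)
#> [1] 2 3 4 5
trav_out_filtered_sketch(nodes, edges, 1, "type", "type_b")
#> [1] 2 3
trav_out_filtered_sketch(nodes, edges, 1, "type", "type_z")
#> [1] 1
```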

We are not limited to starting a traversal from a single node ID; we can begin from a selection of nodes based on a regular expression and traverse to nodes with a matching `type` string value (or matching values in other node attributes that have `character` values). The following example uses a small graph of food entities with arbitrary edges between them.

``````
###
# Perform traversals with conditions
# based on node `label` regex matches
###

library(DiagrammeR)
library(magrittr)

# Create a graph with fruit, vegetables,
# and nuts
nodes <-
create_nodes(
nodes = 1:9,
type = c("fruit", "fruit", "fruit",
"veg", "veg", "veg",
"nut", "nut", "nut"),
label = c("pineapple", "apple",
"apricot", "cucumber",
"celery", "endive",
"hazelnut", "almond",
"chestnut"))

edges <-
create_edges(
from = c(9, 3, 6, 2, 6, 2, 8, 2, 5, 5),
to = c(1, 1, 4, 3, 7, 8, 1, 5, 3, 6))

graph <-
create_graph(
nodes_df = nodes,
edges_df = edges,
graph_attrs = "output = visNetwork")

# View the graph
render_graph(graph)

# View the internal NDF for sake of
# reference
get_node_df(graph)
#>   nodes  type     label
#> 1     1 fruit pineapple
#> 2     2 fruit     apple
#> 3     3 fruit   apricot
#> 4     4   veg  cucumber
#> 5     5   veg    celery
#> 6     6   veg    endive
#> 7     7   nut  hazelnut
#> 8     8   nut    almond
#> 9     9   nut  chestnut

# Select all nodes with a label beginning
# with `a` and traverse outward to all nodes
graph %>%
select_nodes(
node_attr = "label",
search = "^a") %>%
trav_out %>%
get_selection
#> [1] "3" "8" "5" "1"

# This traversal results in a rather large
# selection of nodes: `3` (`apricot`), `8`
# (`almond`), `5` (`celery`), and `1`
# (`pineapple`)

# Now, select all nodes with a label beginning
# with `c` (in this case, `cucumber`, `celery`,
# and `chestnut`) and then traverse outward to
# any node of the `fruit` type
graph %>%
select_nodes(
node_attr = "label",
search = "^c") %>%
trav_out(
node_attr = "type",
match = "fruit") %>%
get_selection
#> [1] "3" "1"

# The traversal has resulted in a selection of
# nodes `3` (`apricot`) and `1` (`pineapple`)
``````
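The same two queries can be cross-checked in base R using the exact nodes and edges of the food graph above: `grepl()` does the regular-expression selection and `%in%` the outbound step. This is a sketch of the semantics only, not how DiagrammeR evaluates `select_nodes()` or `trav_out()`.

```r
# Node and edge data copied from the food graph example above
nodes <- data.frame(
  id    = 1:9,
  type  = rep(c("fruit", "veg", "nut"), each = 3),
  label = c("pineapple", "apple", "apricot", "cucumber", "celery",
            "endive", "hazelnut", "almond", "chestnut"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c(9, 3, 6, 2, 6, 2, 8, 2, 5, 5),
  to   = c(1, 1, 4, 3, 7, 8, 1, 5, 3, 6)
)

# Select by regular expression on `label`, then move outward
selected_a <- nodes$id[grepl("^a", nodes$label)]
selected_a
#> [1] 2 3 8
out_a <- unique(edges$to[edges$from %in% selected_a])
out_a
#> [1] 1 3 8 5

# Same idea from the `^c` labels, keeping only `fruit` targets
selected_c <- nodes$id[grepl("^c", nodes$label)]
out_c <- unique(edges$to[edges$from %in% selected_c])
fruit_c <- out_c[nodes$type[match(out_c, nodes$id)] == "fruit"]
fruit_c
#> [1] 1 3
```

The sets agree with the DiagrammeR selections shown above (`3`, `8`, `5`, `1` and `3`, `1`); only the order differs, because this sketch returns nodes in edge-list row order.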

Traversals can also be constrained to those nodes satisfying logical statements based on numerical data. So long as the attribute provided for `node_attr` contains numerical data, the comparison operators `<`, `>`, `==`, and `!=` can be used as a prefix to the value given for the `match` argument (e.g., `"<8.5"`). This type of traversal has a great many use cases; here is a generic example using the `trav_both()` traversal function:

``````
###
# Perform traversals with conditions
# based on numeric node `value` data
###

library(DiagrammeR)
library(magrittr)

# Create a random graph (but set a seed!)
# of 5 nodes and 10 edges; it'll create
# numerical data values for each node at
# no extra charge
random_graph <-
create_random_graph(
5, 10, TRUE, set_seed = 20) %>%
set_global_graph_attrs(
"graph", "output", "visNetwork")

# View the graph's internal NDF
random_graph %>% get_node_df
#>   nodes type label value
#> 1     1          1     9
#> 2     2          2     8
#> 3     3          3     3
#> 4     4          4   5.5
#> 5     5          5    10

# View the graph's internal EDF
random_graph %>% get_edge_df
#>    from to rel
#> 1     5  1
#> 2     1  3
#> 3     2  4
#> 4     4  1
#> 5     3  2
#> 6     5  2
#> 7     3  5
#> 8     3  4
#> 9     2  1
#> 10    5  4

# View a rendering of the graph and note
# that larger nodes indicate larger
# values in the nodes' `value` attribute
render_graph(random_graph)

# Now select node `3`, perform a traversal
# to other adjacent nodes, and then look
# at which nodes are traversed to
random_graph %>%
select_nodes_by_id(3) %>%
trav_both %>%
get_selection
#> [1] "2" "5" "4" "1"

# Perform a similar traversal but, this
# time, only traverse to those nodes
# with `value` less than 8.5
random_graph %>%
select_nodes_by_id(3) %>%
trav_both("value", "<8.5") %>%
get_selection
#> [1] "2" "4"

# That was a subset of the possible
# traversals with `trav_both()`; using
# a condition of greater than 8.5 will
# yield the other nodes
random_graph %>%
select_nodes_by_id(3) %>%
trav_both("value", ">8.5") %>%
get_selection
#> [1] "5" "1"

# An exact match on a numeric value
# is possible through use of `==`
# before the value; in this case, use
# a value of 10
random_graph %>%
select_nodes_by_id(3) %>%
trav_both("value", "==10") %>%
get_selection
#> [1] "5"

# For a traversal to all values except
# a specified value, use `!=` before
# such value
random_graph %>%
select_nodes_by_id(3) %>%
trav_both("value", "!=10") %>%
get_selection
#> [1] "2" "4" "1"
``````
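To see how a condition string such as `"<8.5"` splits into an operator and a threshold, here is a base-R sketch that reproduces the four `trav_both()` results above from the printed NDF and EDF. The helper `compare_sketch()` is hypothetical; DiagrammeR's own parsing of `match` may differ in detail.

```r
# Node `value` data and edges copied from the NDF/EDF printed above
values <- c(`1` = 9, `2` = 8, `3` = 3, `4` = 5.5, `5` = 10)
edges <- data.frame(
  from = c(5, 1, 2, 4, 3, 5, 3, 3, 2, 5),
  to   = c(1, 3, 4, 1, 2, 2, 5, 4, 1, 4)
)

# Hypothetical helper: split a condition such as "<8.5" into an
# operator and a number, then evaluate it against `x`
compare_sketch <- function(x, condition) {
  op <- regmatches(condition, regexpr("^(==|!=|<|>)", condition))
  threshold <- as.numeric(sub("^(==|!=|<|>)", "", condition))
  switch(op,
         "<"  = x < threshold,
         ">"  = x > threshold,
         "==" = x == threshold,
         "!=" = x != threshold)
}

# Neighbours of node 3 in either direction (the `trav_both()` idea)
neighbours <- unique(c(edges$to[edges$from == 3],
                       edges$from[edges$to == 3]))
neighbours
#> [1] 2 5 4 1

# Keep only neighbours whose `value` satisfies the condition
filter_neighbours <- function(condition) {
  neighbours[compare_sketch(values[as.character(neighbours)], condition)]
}
filter_neighbours("<8.5")
#> [1] 2 4
filter_neighbours(">8.5")
#> [1] 5 1
filter_neighbours("==10")
#> [1] 5
filter_neighbours("!=10")
#> [1] 2 4 1
```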

Traversing from node to node with `trav_out()`, `trav_in()`, or `trav_both()` can result in very specific targeting of nodes. As seen, once the traversal has occurred, the new selection can be used to obtain data from those nodes or to modify the graph (e.g., by attaching new nodes to the selection). Especially within a magrittr pipeline, the selection of nodes, the traversals, and the resulting actions become quite readable (as is the case with most R statements using magrittr).

Traversals from Nodes to Edges

Moving across nodes using traversal functions is quite a powerful thing to do. However, especially with information-rich graphs, some useful data can exist in the graph's edges. For this reason, we can traverse from nodes onto adjacent edges. As with the node-to-node traversal functions, the direction of the edge is important and a key distinction between the functions `trav_out_edge()` and `trav_in_edge()`. These types of traversals always begin at nodes (and thus require an initial selection of one or more nodes) and typically end with a selection of one or more edges. If no traversal can be made, then the initial selection of nodes is retained.

Starting with the `trav_out_edge()` function, suppose there is a selection of a single node `1` in the very simple graph `1->2`. Calling `trav_out_edge()` in its simplest form (with no values supplied except for the graph itself) results in an edge selection, and that edge is `1->2` (which initiates at node `1` and terminates at node `2`). Thus, the traversal is from one or more nodes onto adjacent, outward edges. On the same graph, with the same selection, calling `trav_in_edge()` will not result in a traversal (the initial selection of node `1` is retained, as though nothing happened). This is because `trav_in_edge()` performs the converse traversal: from one or more nodes onto adjacent, inward edges. Put another way, `trav_in_edge()` changes the selection to the edges that point toward the initially selected node(s), if any.

As with the node-to-node traversal functions, these traversals are much more powerful when used with matching conditions, since these increase selectivity. Being able to traverse to (and select) only certain edges is important, especially in those cases where the traversal continues onto nodes (more on that in the next section). Examples will aid in the understanding of these functions.

``````
###
# Perform two types of traversals from
# a single node using `trav_out_edge()`
# and `trav_in_edge()`
###

library(DiagrammeR)
library(magrittr)

# Create a simple graph with two nodes, an
# edge between them (`1` -> `2`); starting
# from node `1` (as a selection), traverse
# to the edge and then obtain the current
# selection
create_graph(
  nodes_df = create_nodes(nodes = 1:2),
  edges_df = create_edges(from = 1, to = 2)) %>%
  select_nodes_by_id(1) %>%
  trav_out_edge %>%
  get_selection
#> [1] "1 -> 2"

# If no traversal can occur the selection is
# not altered. To demonstrate, use a similar
# pipeline but reverse the edge direction
# (`2` -> `1`)
create_graph(
  nodes_df = create_nodes(nodes = 1:2),
  edges_df = create_edges(from = 2, to = 1)) %>%
  select_nodes_by_id(1) %>%
  trav_out_edge %>%
  get_selection
#> [1] "1"

# On that reversed-edge graph, a traversal can
# occur if `trav_in_edge()` is used instead of
# `trav_out_edge()`
create_graph(
  nodes_df = create_nodes(nodes = 1:2),
  edges_df = create_edges(from = 2, to = 1)) %>%
  select_nodes_by_id(1) %>%
  trav_in_edge %>%
  get_selection
#> [1] "2 -> 1"

# A selection of multiple edges can occur
# as a result of a traversal; in this graph,
# node `1` has outbound edges to nodes `2` to
# `11` and inbound edges from nodes `12` to `21`
star_graph <-
  create_graph(
    nodes_df = create_nodes(nodes = 1:21),
    edges_df = create_edges(
      from = c(rep(1, 10), 12:21),
      to = c(2:11, rep(1, 10))))

star_graph %>%
  select_nodes_by_id(1) %>%
  trav_out_edge %>%
  get_selection
#> [1] "1 -> 2"  "1 -> 3"  "1 -> 4"  "1 -> 5"
#> [5] "1 -> 6"  "1 -> 7"  "1 -> 8"  "1 -> 9"
#> [9] "1 -> 10" "1 -> 11"

star_graph %>%
  select_nodes_by_id(1) %>%
  trav_in_edge %>%
  get_selection
#> [1] "12 -> 1" "13 -> 1" "14 -> 1" "15 -> 1"
#> [5] "16 -> 1" "17 -> 1" "18 -> 1" "19 -> 1"
#> [9] "20 -> 1" "21 -> 1"
``````
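Node-to-edge traversal can be pictured as choosing rows of an edge table: rows whose `from` is selected for an outbound step, rows whose `to` is selected for an inbound step. The base-R sketch below uses a hypothetical three-edge graph and a hypothetical helper, `trav_edge_sketch()`; it labels edges the way the examples above display edge selections.

```r
# Hypothetical edge list: 1 -> 2, 1 -> 3, and 4 -> 1
edges <- data.frame(from = c(1, 1, 4), to = c(2, 3, 1))

# Select edges leaving ("out") or entering ("in") the selected
# nodes; keep the node selection if no such edge exists
trav_edge_sketch <- function(edges, selection,
                             direction = c("out", "in")) {
  direction <- match.arg(direction)
  hit <- if (direction == "out") {
    edges$from %in% selection
  } else {
    edges$to %in% selection
  }
  if (!any(hit)) return(selection)
  # Label edges as "from -> to", as in the examples above
  paste(edges$from[hit], "->", edges$to[hit])
}

trav_edge_sketch(edges, 1, "out")
#> [1] "1 -> 2" "1 -> 3"
trav_edge_sketch(edges, 1, "in")
#> [1] "4 -> 1"
trav_edge_sketch(edges, 2, "out")  # no outbound edges from node 2
#> [1] 2
```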

To introduce conditions on the traversal, values can be supplied to the `edge_attr` and `match` arguments. As with the node-to-node traversal functions, these optional values filter the node-to-edge traversals. If a graph is fashioned as a property graph, with values set for the nodes' `type` attribute and the edges' `rel` attribute, traversals with `trav_out_edge()` and `trav_in_edge()` can be restricted to edges that have a specific `rel` label. This is done by setting `edge_attr = "rel"` and providing the value of that relationship for the `match` argument.

``````
###
# Perform node-to-edge traversals
# from multiple nodes and with the
# use of matching conditions
###

library(DiagrammeR)
library(magrittr)

# First, set a seed so the example
# is reproducible

set.seed(20)

# Create a graph with fruit,
# vegetables, nuts, and... people!
nodes <-
create_nodes(
nodes = 1:14,
type = c("person", "person",
"person", "person",
"person", "fruit",
"fruit", "fruit",
"veg", "veg", "veg",
"nut", "nut", "nut"),
label = c("Annie", "Donna",
"Justine", "Ed",
"Graham", "pineapple",
"apple", "apricot",
"cucumber", "celery",
"endive", "hazelnut",
"almond", "chestnut"))

edges <-
create_edges(
from = sort(
as.vector(replicate(5, 1:5))),
to = as.vector(
replicate(5, sample(6:14, 5))),
rel = as.vector(
replicate(
5, sample(
c("likes", "dislikes",
"allergic_to"), 5,
TRUE,
c(0.5, 0.25, 0.25)))))

graph <-
create_graph(
nodes_df = nodes,
edges_df = edges,
graph_attrs = "output = visNetwork")

# Behold the food preferences graph!
graph %>% render_graph

# Select all food-based nodes and
# determine the total number of "likes",
# "dislikes", and "allergic_to"
# relationships
graph %>%
select_nodes(node_attr = "type",
search = "fruit") %>%
select_nodes(node_attr = "type",
search = "veg") %>%
select_nodes(node_attr = "type",
search = "nut") %>%
trav_in_edge %>%
cache_edge_count_ws %>%
get_cache
#> [1] 25

# From all food-based nodes, count the
# inbound "likes" relationships (note the
# use of `invert_selection()`)
graph %>%
select_nodes(node_attr = "type",
search = "person") %>%
invert_selection %>%
trav_in_edge(edge_attr = "rel",
match = "likes") %>%
cache_edge_count_ws %>%
get_cache
#> [1] 13

# So, 13 of the 25 total relationships
# are "likes". Determine which people
# have an allergy to any of the foods
graph %>%
select_nodes(
node_attr = "type", search = "person") %>%
trav_out_edge(
edge_attr = "rel", match = "allergic_to") %>%
get_selection
#> [1] "1 -> 10" "2 -> 11" "3 -> 10" "4 -> 10"
#> [5] "5 -> 6"  "5 -> 9"  "5 -> 7"

# It ends up that all the people were
# allergic to at least one food (note
# that the selected edges originate from
# every person node, `1` through `5`).
# Food with node ID `10` (`celery`) is
# the cause of most allergic reactions
``````
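Filtering node-to-edge traversals by `rel` amounts to adding an equality test on the edge attribute to the row selection. This base-R sketch uses a hypothetical miniature of the food-preference graph (three people, two foods, made-up relationships) rather than the seeded random one above.

```r
# Hypothetical people (1 to 3) and foods (6, 7) with `rel` labels
edges <- data.frame(
  from = c(1, 1, 2, 3),
  to   = c(6, 7, 6, 7),
  rel  = c("likes", "allergic_to", "likes", "dislikes"),
  stringsAsFactors = FALSE
)
people <- c(1, 2, 3)
foods  <- c(6, 7)

# All inbound edges of the food nodes (no condition)
nrow(edges[edges$to %in% foods, ])
#> [1] 4

# Inbound edges of the food nodes with rel == "likes"
liked <- edges[edges$to %in% foods & edges$rel == "likes", ]
nrow(liked)
#> [1] 2

# Outbound edges of the people with rel == "allergic_to"
allergic <- edges[edges$from %in% people &
                    edges$rel == "allergic_to", ]
paste(allergic$from, "->", allergic$to)
#> [1] "1 -> 7"
```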

Now for a different example with numeric data within an edge attribute named `data_value` (this attribute name was arbitrarily chosen). Let's try obtaining counts of edges that satisfy some numerical comparisons.

``````
###
# Perform more node-to-edge traversals
# from multiple nodes and with use
# of matching conditions
###

library(DiagrammeR)
library(magrittr)

# Create a random graph
# (10 nodes, 20 edges)
graph <-
create_random_graph(
10, 20,
directed = TRUE,
set_seed = 20) %>%
set_global_graph_attrs(
"graph", "output", "visNetwork")

# Set a seed for the various uses
# of the `sample()` function
set.seed(20)

# Use a `for` loop to randomly set
# various `type` values to nodes
for (i in 1:node_count(graph)) {
graph %<>%
set_node_attrs(
nodes = i,
node_attr = "type",
values = sample(
c("A", "B", "C"), 1))
}

# Use another `for` loop to randomly
# set various numerical values to
# the graph's edges
for (i in 1:edge_count(graph)) {
graph %<>%
set_edge_attrs(
from = get_edges(., return_type = "df")[i, 1],
to = get_edges(., return_type = "df")[i, 2],
edge_attr = "data_value",
values = sample(
seq(0, 8, 0.5), 1))
}

# Look at the graph
graph %>% render_graph

# Select all inbound edges (to any node)
# that have a `data_value` less than 4,
# then determine the count of those edges
graph %>%
select_nodes() %>%
trav_in_edge(
edge_attr = "data_value",
match = "<4.0") %>%
cache_edge_count_ws %>%
get_cache
#> [1] 12

# Select all outbound edges (from any
# node) that have a `data_value` greater
# than 4, then tabulate their values
graph %>%
select_nodes() %>%
trav_out_edge(
edge_attr = "data_value",
match = ">4.0") %>%
cache_edge_attrs_ws("data_value") %>%
get_cache %>%
as.numeric %>%
table
#> .
#> 2.5 5.5 6 6.5 7.5 8
#>   1   1 3   1   1 1
``````

Traversals from Edges to Nodes

You've seen node-to-node traversals and you've seen node-to-edge traversals. Once you see edge-to-node traversals, you will have seen all the possible traversals. These types of traversals are the counterparts of the node-to-edge traversals. The nomenclature may be confusing, since whether a traversal is "in" or "out" is still judged from the perspective of the nodes: `trav_in_node()` moves from a selected edge to the node that the edge points into, and `trav_out_node()` moves to the node that the edge leaves from. Thus, to move across the graph `1->2->3` from node `1`, alighting on all edges and nodes before ultimately traversing to node `3`, the traversal sequence is: `trav_out_edge()`, `trav_in_node()`, `trav_out_edge()`, and `trav_in_node()`.
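The naming convention can be checked with a small base-R sketch of that exact walk over `1->2->3`. Edges are referred to by row number, and the three helpers are hypothetical stand-ins for `trav_out_edge()`, `trav_in_node()`, and `trav_out_node()`, not the real functions.

```r
# The graph 1 -> 2 -> 3 as an edge list; edges are referred to
# by their row number
edges <- data.frame(from = c(1, 2), to = c(2, 3))

# node -> edge: rows of the edges leaving the selected nodes
trav_out_edge_sketch <- function(edges, nodes) which(edges$from %in% nodes)
# edge -> node: the nodes the selected edges point into
trav_in_node_sketch <- function(edges, rows) unique(edges$to[rows])
# edge -> node: the nodes the selected edges leave from
trav_out_node_sketch <- function(edges, rows) unique(edges$from[rows])

selection <- 1
selection <- trav_out_edge_sketch(edges, selection)  # edge row 1 (1 -> 2)
selection <- trav_in_node_sketch(edges, selection)   # node 2
selection <- trav_out_edge_sketch(edges, selection)  # edge row 2 (2 -> 3)
selection <- trav_in_node_sketch(edges, selection)   # node 3
selection
#> [1] 3

# Stepping off edge row 2 the other way lands on its source node
trav_out_node_sketch(edges, 2L)
#> [1] 2
```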

To introduce conditions on the traversal, values can be supplied to the `node_attr` and `match` arguments. These types of traversals, like the others, allow for the use of optional values for filtering the edge-to-node traversals. For property graphs, this is advantageous since the `type` node attribute (or other node attributes) can allow for specific edge-to-node traversals with `trav_out_node()` and `trav_in_node()`. The combination of a node-to-edge traversal with a subsequent edge-to-node traversal is particularly useful because we can traverse to adjacent nodes but limit the traversal by certain edge attributes (in a `trav_out_edge()` or `trav_in_edge()` call) and perhaps set another condition by node attribute values (in a `trav_in_node()` or `trav_out_node()` call). The following example is an extension of the food preferences graph, containing traversals both onto edges and off edges and back onto nodes.

``````
###
# Perform node-to-edge and edge-to-node
# traversals from multiple nodes and with
# the use of matching conditions
###

library(DiagrammeR)
library(magrittr)

# First, set a seed so the example
# is reproducible

set.seed(20)

# Create a graph with fruit,
# vegetables, nuts, and... people!
nodes <-
create_nodes(
nodes = 1:14,
type = c("person", "person",
"person", "person",
"person", "fruit",
"fruit", "fruit",
"veg", "veg", "veg",
"nut", "nut", "nut"),
label = c("Annie", "Donna",
"Justine", "Ed",
"Graham", "pineapple",
"apple", "apricot",
"cucumber", "celery",
"endive", "hazelnut",
"almond", "chestnut"))

edges <-
create_edges(
from = sort(
as.vector(replicate(5, 1:5))),
to = as.vector(
replicate(5, sample(6:14, 5))),
rel = as.vector(
replicate(
5, sample(
c("likes", "dislikes",
"allergic_to"), 5,
TRUE,
c(0.5, 0.25, 0.25)))))

graph <-
create_graph(
nodes_df = nodes,
edges_df = edges,
graph_attrs = "output = visNetwork")

# Have a look at the graph
graph %>% render_graph

# Determine which foods cause allergies
# and modify the appearance of those
# nodes (by adding a `color` attribute)
graph_allergies <-
graph %>%
select_nodes(
node_attr = "type",
search = "person") %>%
invert_selection %>%
trav_in_edge(
edge_attr = "rel",
match = "allergic_to") %>%
trav_in_node %>%
set_node_attrs_ws("color", "red") %>%
invert_selection %>%
set_node_attrs_ws("color", "green") %>%
clear_selection %>%
select_nodes(
node_attr = "type", search = "person") %>%
set_node_attrs_ws("color", "blue")

# Display the modified graph, where green
# nodes represent safe foods for the
# group of people (blue nodes); red nodes
# are the danger foods
graph_allergies %>% render_graph

# Get a vector of those foods that are
# deemed risky for this particular group
graph %>%
select_nodes(
node_attr = "type",
search = "person") %>%
invert_selection %>%
trav_in_edge(
edge_attr = "rel",
match = "allergic_to") %>%
trav_in_node %>%
cache_node_attrs_ws("label") %>%
get_cache
#> [1] "pineapple"  "apple"  "cucumber"
#> [4] "celery"  "endive"
``````
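Chaining a filtered node-to-edge step with a filtered edge-to-node step is, in tabular terms, one condition on the edge rows followed by one condition on the target nodes. The base-R sketch below illustrates this on hypothetical food-preference data (IDs, types, and relationships are made up), mirroring the idea of `trav_out_edge("rel", "allergic_to")` followed by `trav_in_node("type", "fruit")`.

```r
# Hypothetical food-preference data (IDs, types, and rels assumed)
nodes <- data.frame(
  id   = 1:6,
  type = c("person", "person", "fruit", "veg", "nut", "fruit"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c(1, 1, 2, 2, 2),
  to   = c(3, 4, 4, 5, 6),
  rel  = c("allergic_to", "allergic_to", "likes",
           "allergic_to", "allergic_to"),
  stringsAsFactors = FALSE
)

people <- nodes$id[nodes$type == "person"]

# Node -> edge, restricted by an edge attribute
rows <- which(edges$from %in% people & edges$rel == "allergic_to")

# Edge -> node (the node each edge points into), restricted by a
# node attribute
targets <- unique(edges$to[rows])
allergic_fruit <- targets[nodes$type[match(targets, nodes$id)] == "fruit"]
allergic_fruit
#> [1] 3 6
```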

We mustn't forget that numeric values in node attributes can also be exploited in a traversal.

``````
###
# Perform both node-to-edge and
# edge-to-node traversals between
# multiple nodes and use conditions
# based on numeric comparisons
###

library(DiagrammeR)
library(magrittr)

# Create a random graph
# (10 nodes, 20 edges)
graph <-
create_random_graph(
10, 20,
directed = TRUE,
set_seed = 20) %>%
set_global_graph_attrs(
"graph", "output", "visNetwork")

# Set a seed for the various uses
# of the `sample()` function
set.seed(20)

# Use a `for` loop to randomly set
# various `type` values to nodes
for (i in 1:node_count(graph)) {
graph %<>%
set_node_attrs(
nodes = i,
node_attr = "type",
values = sample(
c("A", "B", "C"), 1))
}

# Use another `for` loop to randomly
# set various numerical values to
# the graph's edges
for (i in 1:edge_count(graph)) {
graph %<>%
set_edge_attrs(
from = get_edges(., return_type = "df")[i, 1],
to = get_edges(., return_type = "df")[i, 2],
edge_attr = "data_value",
values = sample(
seq(0, 8, 0.5), 1))
}

# Look at the graph
graph %>% render_graph

# Perform a traversal from nodes with a
# `value` <6, out to edges with a `data_value`
# <4, and onto nodes with a `value` <2
graph %>%
select_nodes("value", "<6") %>%
trav_out_edge("data_value", "<4") %>%
trav_in_node("value", "<2") %>%
get_selection
#> [1] "7"

# From this traversal, only the node with
# ID of `7` is the selected node; so, what was
# the actual `value`?
graph %>%
select_nodes("value", "<6") %>%
trav_out_edge("data_value", "<4") %>%
trav_in_node("value", "<2") %>%
cache_node_attrs_ws("value") %>%
get_cache %>%
as.numeric
#> [1] 1
``````

Software Repository Example

Here's an example that ties all of these traversal types together. It involves a fictional software repository, with software contributors and projects as the entities in the graph. The graph is a property graph because the nodes and edges are labeled with `type` and `rel` attributes, respectively.

You can use traversals to get specific selections and then perform data inspection or graph modification. We can find out bits of information without manually inspecting the underlying NDFs or EDFs. This becomes important when a property graph grows large, since manually inspecting those graph components would be difficult and impractical. Moreover, traversals can be quite complex, with multiple dependencies. In a typical relational database, such queries are possible (but lengthy) with multiple inner joins, and beyond a certain level of complexity they may introduce unreasonable latency.
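To make the relational comparison concrete, here is a base-R sketch of two of this example's questions expressed as table operations rather than traversals: total commits per project (a grouped aggregate) and the top committer per project (an aggregate joined back onto the edge table with `merge()`, i.e. an inner join). It uses the thirteen original edges listed in the example's EDF output and works with node IDs only, since the ID-to-name mapping is not printed.

```r
# The 13 original edges of this example's graph (from its EDF
# output); node IDs only, since names live in the NDF
edf <- data.frame(
  from    = c(2, 1, 3, 2, 4, 5, 4, 6, 7, 8, 9, 10, 2),
  to      = c(11, 11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13),
  commits = c(236, 121, 32, 92, 124, 1460, 103, 236, 126, 2340,
              2, 23, 287)
)

# Total commits per project node (a GROUP BY in SQL terms)
project_totals <- aggregate(commits ~ to, data = edf, FUN = sum)
project_totals

# Top committer per project: aggregate, then join back onto the
# edge table on both the project ID and the commit count
project_max <- aggregate(commits ~ to, data = edf, FUN = max)
top <- merge(edf, project_max, by = c("to", "commits"))
top[, c("to", "from", "commits")]
```

Node `12` totals 1676 commits, matching the figure reported for `supercalc` in the example, and its top committer is node `5` with 1460 commits. Each further hop (say, from the committer to an email address in a node table) would need yet another `merge()`, which is where the join-based formulation starts to get lengthy.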

The example graph uses a fake dataset of contributors to software projects on a platform not unlike GitHub. The DiagrammeR package contains the CSV files required to build the graph.

``````
###
# Use all manner of traversals with
# other functions to get information
# and to modify a property graph
###

library(DiagrammeR)
library(magrittr)

# Create a path to the CSV file containing
# contributors to software projects
contributors_csv <-
system.file("examples/contributors.csv",
package = "DiagrammeR")

# Inspect the CSV file's column names
colnames(read.csv(contributors_csv,
                  stringsAsFactors = FALSE))
#> [1] "name"  "age"  "join_date"  "email"
#> [5] "follower_count"  "following_count"
#> [7] "starred_count"

# Create a path to the CSV file containing
# information about the software projects
projects_csv <-
system.file("examples/projects.csv",
package = "DiagrammeR")

# Inspect the CSV file's column names
colnames(read.csv(projects_csv,
                  stringsAsFactors = FALSE))
#> [1] "project"  "start_date"  "stars"
#> [4] "language"

# Create a path to the CSV file with information
# about the relationships between the projects
# and their contributors
projects_and_contributors_csv <-
system.file("examples/projects_and_contributors.csv",
package = "DiagrammeR")

# Inspect the CSV file's column names
colnames(read.csv(projects_and_contributors_csv,
                  stringsAsFactors = FALSE))
#> [1] "project_name"  "contributor_name"
#> [3] "contributor_role"  "commits"

# Create the property graph by adding the CSV data to a
# new graph; the `add_nodes_from_csv()` and
# `add_edges_from_csv()` functions are used to create
# nodes and edges in the graph
graph <-
  create_graph() %>%
  set_graph_name("software_projects") %>%
  set_global_graph_attrs(
    "graph", "output", "visNetwork") %>%
  add_nodes_from_csv(
    contributors_csv,
    set_type = "person",
    label_col = "name") %>%
  add_nodes_from_csv(
    projects_csv,
    set_type = "project",
    label_col = "project") %>%
  add_edges_from_csv(
    projects_and_contributors_csv,
    from_col = "contributor_name",
    from_mapping = "name",
    to_col = "project_name",
    to_mapping = "project",
    rel_col = "contributor_role")

# View the graph
graph %>% render_graph

# Get the average age of all the contributors
graph %>%
select_nodes("type", "person") %>%
cache_node_attrs_ws("age", "numeric") %>%
get_cache %>%
mean
#> [1] 33.6

# Get the total number of commits to all software
# projects
graph %>%
select_edges %>%
cache_edge_attrs_ws("commits", "numeric") %>%
get_cache %>%
sum
#> [1] 5182

# Get total number of commits from Josh as a maintainer
# and a contributor
graph %>%
select_nodes("name", "Josh") %>%
trav_out_edge %>%
cache_edge_attrs_ws("commits", "numeric") %>%
get_cache %>%
sum
#> [1] 227

# Get total number of commits from Louisa
graph %>%
select_nodes("name", "Louisa") %>%
trav_out_edge %>%
cache_edge_attrs_ws("commits", "numeric") %>%
get_cache %>%
sum
#> [1] 615

# As a bit of an aside, we can use selections and
# rescale values to a styling attribute such as
# edge width, node size, or color. Select all
# edges and apply an edge `width` attribute scaled
# by the edge attribute `commits` to a range of
# 0.5 to 3.0
graph_scale_width_edges <-
graph %>%
select_edges %>%
rescale_edge_attrs_ws(
"commits", "width", 0.5, 3.0)

# Inspect the graph's internal EDF
get_edge_df(graph_scale_width_edges)
#>   from to         rel commits width
#> 1    2 11  maintainer     236  0.75
#> 2    1 11 contributor     121 0.627
#> 3    3 11 contributor      32 0.532
#> 4    2 12 contributor      92 0.596
#> 5    4 12 contributor     124  0.63
#> 6    5 12  maintainer    1460 2.059
#> 7    4 13  maintainer     103 0.608
#> 8    6 13 contributor     236  0.75
#> 9    7 13 contributor     126 0.633
#> 10   8 13 contributor    2340     3
#> 11   9 13 contributor       2   0.5
#> 12  10 13 contributor      23 0.522
#> 13   2 13 contributor     287 0.805

# View the graph, larger edges and arrows
# indicate higher numbers of `commits`
graph_scale_width_edges %>% render_graph

# Select all edges and apply a color attribute based
# on another edge attribute
graph_scale_color_edges <-
graph %>%
select_edges %>%
rescale_edge_attrs_ws(
"commits", "color", "gray95", "gray5")

# Render the graph, darker edges represent higher
# commits
graph_scale_color_edges %>% render_graph

# Get the names of people in the graph older than 32
graph %>%
select_nodes("type", "person") %>%
select_nodes("age", ">32", "intersect") %>%
cache_node_attrs_ws("name") %>%
get_cache %>%
sort
#> [1] "Jack"   "Jon"    "Kim"    "Roger"  "Sheryl"

# Get the total number of commits from all people to
# the `supercalc` project
graph %>%
select_nodes("project", "supercalc") %>%
trav_in_edge %>%
cache_edge_attrs_ws("commits", "numeric") %>%
get_cache %>%
sum
#> [1] 1676

# Who committed the most to the `supercalc` project?
graph %>%
select_nodes("project", "supercalc") %>%
trav_in_edge %>%
cache_edge_attrs_ws("commits", "numeric") %>%
trav_in_node %>%
trav_in_edge("commits", max(get_cache(.))) %>%
trav_out_node %>%
cache_node_attrs_ws("name") %>%
get_cache
#> [1] "Sheryl"

# What is the email address of the individual that
# contributed the least to the `randomizer` project?
graph %>%
select_nodes("project", "randomizer") %>%
trav_in_edge %>%
cache_edge_attrs_ws("commits", "numeric") %>%
trav_in_node %>%
trav_in_edge("commits", min(get_cache(.))) %>%
trav_out_node %>%
cache_node_attrs_ws("email") %>%
get_cache
#> [1] "the_will@graphymail.com"

# Update the graph, because, it has come to our
# attention that Kim is now a contributor to
# `stringbuildeR` and has made 15 new commits to
# that project
graph %<>%
  add_edge(
    get_nodes(.,
      "name", "Kim"),
    get_nodes(.,
      "project", "stringbuildeR"),
    "contributor") %>%
  select_last_edge %>%
  set_edge_attrs_ws("commits", 15) %>%
  clear_selection

# View the graph's internal EDF, the newest
# edge is at the bottom
get_edge_df(graph)
#>    from to         rel commits
#> 1     2 11  maintainer     236
#> 2     1 11 contributor     121
#> 3     3 11 contributor      32
#> 4     2 12 contributor      92
#> 5     4 12 contributor     124
#> 6     5 12  maintainer    1460
#> 7     4 13  maintainer     103
#> 8     6 13 contributor     236
#> 9     7 13 contributor     126
#> 10    8 13 contributor    2340
#> 11    9 13 contributor       2
#> 12   10 13 contributor      23
#> 13    2 13 contributor     287
#> 14    8 11 contributor      15

# View the graph to see the new edge
graph %>% render_graph

# Get all email addresses of contributors (but not
# maintainers) of the `randomizer` and `supercalc`
# projects
graph %>%
select_nodes("project", "randomizer") %>%
select_nodes("project", "supercalc") %>%
trav_in_edge("rel", "contributor") %>%
trav_out_node %>%
cache_node_attrs_ws("email", "character") %>%
get_cache
#> [1] "lhe99@mailing-fun.com"  "josh_ch@megamail.kn"
#> [3] "roger_that@whalemail.net"  "the_simone@a-q-w-o.net"
#> [5] "kim_3251323@ohhh.ai"  "the_will@graphymail.com"
#> [7] "j_2000@ultramail.io"

# Which committer to the `randomizer` project has the
# highest number of followers?
graph %>%
select_nodes("project", "randomizer") %>%
trav_in %>%
cache_node_attrs_ws("follower_count", "numeric") %>%
clear_selection %>%
select_nodes("project", "randomizer") %>%
trav_in("follower_count", max(get_cache(.))) %>%
cache_node_attrs_ws("name") %>%
get_cache
#> [1] "Kim"

# Which people have committed to more than one
# project?
graph %>%
select_nodes_by_degree("out", ">1") %>%
cache_node_attrs_ws("name") %>%
get_cache %>%
sort
#> [1] "Josh"  "Kim"  "Louisa"
``````