Imagine diving into a graph and moving across the graph’s nodes, jumping onto an edge, perhaps bypassing those edges and simply alighting to different nodes with specific attributes. Traversals are quite important as part of a graph query. You can develop sophisticated pipelines that allow for selective movement across the graph (based on conditions you specify per traversal) and the gleaning of information from nodes and edges. Importantly, traversals begin with selections of nodes or edges and the act of traversing modifies the selection of nodes or edges. One may select a single node, for instance, perform one or more traversals away from that initial node, and perhaps create a selection of several different nodes (or even edges). There are many important use cases, so, an in-depth primer of DiagrammeR’s traversal functions is provided alongside numerous practical examples.
Traversals Across Nodes
To traverse across connected nodes without regard to the properties
of the edges between the nodes, three functions are available:
trav_out()
, trav_in()
, and
trav_both()
. These types of traversals always require an
initial selection of one or more nodes, and, after traversing, a
selection of one or more nodes is returned.
Directionality of the traversal is the key differentiator between
these three functions. The trav_out()
function allows for
traversals to connected nodes that are outbound nodes in relation to the
origin nodes (in a directed graph). With the trav_in()
function, the movement is reversed: traversals towards connected nodes
are to inbound nodes. For example, take the edge described by
1->2
and the origin node is the node with ID
1
; the trav_out()
function would change the
node selection from node 1
to node 2
because
these nodes are adjacent to each other and the edge leads from the
origin node to an outbound node. If node 1
has outbound
edges to other nodes (e.g., 1->{2,3,4}
) then all of
those nodes connected to outbound edges of the origin node will be part
of the new selection. Take another example with a central node as the
selected node, and that node has both outbound and inbound edges to
adjacent nodes: {2,3,4}->1->{5,6,7}
. Should the
function trav_in()
be used, then nodes 2
,
3
, and 4
will become the selected nodes; using
trav_out()
will result in nodes 5
,
6
, and 7
becoming the selected nodes. Here are
several examples of traversals across nodes.
Let’s perform two types of traversals from a single node using
trav_out()
and trav_in()
. First, we’ll create
a simple graph with two nodes and an edge between them (1
-> 2
). Starting from node 1
(as the initial
selection), we traverse to node 2
.
pre <-
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 1, to = 2) %>%
select_nodes_by_id(nodes = 1)
# Create a simple graph, create a single-
# node selection and traverse to the other
# node; obtain the final selection
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 1, to = 2) %>%
select_nodes_by_id(nodes = 1) %>%
trav_out() %>%
get_selection()
#> [1] 2
If no traversal can occur, the selection is not altered. To demonstrate, let’s use a similar set of steps but with a graph having the opposite edge direction.
# Create a simple graph, create a single-
# node selection and then attempt to traverse
# to the other node; obtain the final selection
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 2, to = 1) %>%
select_nodes_by_id(nodes = 1) %>%
trav_out() %>%
get_selection()
#> [1] 1
That type of traversal can’t alter the selection because there are no
outbound edges from node 1
, just an inbound edge
(2
-> 1
). To make a traversal possible, we
need to use to the trav_in()
function instead.
# A traversal can occur if `trav_in()` is used
# instead of `trav_out()`
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 2, to = 1) %>%
select_nodes_by_id(nodes = 1) %>%
trav_in() %>%
get_selection()
#> [1] 2
Multiple traversals can be made in a single set of statements. Let’s
create a path graph containing five nodes with the
add_path()
function. We obtain a graph of the form:
1->2->3->4->5
and we can easily create an
initial selection at node 1
with the
select_nodes_by_id()
function. With several calls of
trav_out()
in succession, we can move the selection from
node 1
to node 5
.
# Traverse across a path graph one
# step at a time with `trav_out()`
create_graph() %>%
add_path(n = 5) %>%
select_nodes_by_id(nodes = 1) %>%
trav_out() %>%
trav_out() %>%
trav_out() %>%
trav_out() %>%
get_selection()
#> [1] 5
Traversals are commonly performed where an initial selection contains multiple nodes (usually, these are nodes that have something in common with each other). Let’s have a look at a slightly more complex graph and how multiple nodes in an initial selection can be migrated to a final selection after a traversal.
graph_1 <-
create_graph() %>%
add_node() %>%
select_nodes_by_id(nodes = 1) %>%
add_n_nodes_ws(
n = 5,
direction = "from"
) %>%
add_n_nodes_ws(
n = 5,
direction = "to"
)
graph_1 %>% render_graph()
We can now take the selection (still the central node 1
)
and traverse via outbound edges to adjacent nodes: 2
,
3
, 4
, 5
, and 6
.
graph_1 %>%
trav_out() %>%
get_selection()
#> [1] 2 3 4 5 6
Alternatively, with the same initial selection of 1
we
can traverse via inbound edges to adjacent nodes (with
trav_in()
) and expect a different final selection of nodes
(7
, 8
, 9
, 10
,
11
).
graph_1 %>%
trav_in() %>%
get_selection()
#> [1] 7 8 9 10 11
The trav_both()
function results in traversals to
adjacent nodes regardless of the edge directions between those nodes.
So, in a sense, the direction of movement to adjacent nodes is both in
and out, or, both. For the example of
{2,3,4}->1->{5,6,7}
, where node 1
is the
only node in the selection, all of nodes 2
through to node
6
will be part of the new selection after calling
trav_both()
.
# Create the graph described in the paragraph
# above ({`2...4`} -> `1` -> {`5...7`}),
# start from node `1` (as a selection),
# traverse to all other adjacent nodes and
# then obtain the current selection
create_graph() %>%
add_node() %>%
select_nodes_by_id(nodes = 1) %>%
add_n_nodes_ws(
n = 3,
direction = "to"
) %>%
add_n_nodes_ws(
n = 3,
direction = "from"
) %>%
trav_both() %>%
get_selection()
#> [1] 2 3 4 5 6 7
So far, these functions are described as modifying selections of
nodes based solely on node adjacency and the direction of the edges
between the adjacent nodes. Indeed without supplying values to the
function, traversals occur without regard to the attributes of the nodes
traversed to. However, the arguments node_attr
and
match
are available for filtering the traversals to those
that satisfy logical statements on numeric attributes or matches on
character attributes. For a property graph, where values are available
for all nodes’ type
attribute and all edges’
rel
attribute, a traversal with trav_out()
could, for example, be performed for all outbound, adjacent nodes that
have a specific type
label. This is done by setting
node_attr = type
and providing the value of that type for
the match
argument.
# Create a common graph with nodes having
# various `type` values; set to render
# always using `visNetwork` when calling
# `render_graph()`
graph <-
create_graph() %>%
add_node(type = "type_a") %>%
add_n_nodes(
n = 4,
type = "type_b"
) %>%
add_edge(from = 1, to = 2) %>%
add_edge(from = 1, to = 3) %>%
add_edge(from = 4, to = 1) %>%
add_edge(from = 5, to = 1) %>%
add_n_nodes(
n = 4,
type = "type_c"
) %>%
add_edge(from = 1, to = 6) %>%
add_edge(from = 1, to = 7) %>%
add_edge(from = 8, to = 1) %>%
add_edge(from = 9, to = 1)
# View the created graph
graph %>% render_graph()
graph %>%
select_nodes_by_id(nodes = 1) %>%
trav_out() %>%
get_selection()
#> [1] 2 3 6 7
graph %>%
select_nodes_by_id(nodes = 1) %>%
trav_out(conditions = type == "type_b") %>%
get_selection()
#> [1] 2 3
graph %>%
select_nodes_by_id(nodes = 1) %>%
trav_out(conditions = type == "type_c") %>%
get_selection()
#> [1] 6 7
# Once the nodes have been selected via
# a traversal, a useful thing to do would
# be to attach new nodes to that selection
updated_graph <-
graph %>%
select_nodes_by_id(nodes = 1) %>%
trav_out(conditions = type == "type_c") %>%
add_n_nodes_ws(
n = 1,
direction = "from",
type = "type_d"
)
# View the updated graph
updated_graph %>% render_graph()
We are not limited to starting a traversal from a single node ID
value. We can, for example, begin from a selection of nodes based on a
regular expression and traverse to a matching type
string
value (or to other node attributes that have character
values). The following example uses a random graph of food entities with
arbitrary edges between them.
# Create a graph with fruit, vegetables,
# and nuts
ndf <-
create_node_df(
n = 9,
type = c(
"fruit", "fruit", "fruit",
"veg", "veg", "veg",
"nut", "nut", "nut"
),
label = c(
"pineapple", "apple",
"apricot", "cucumber",
"celery", "endive",
"hazelnut", "almond",
"chestnut"
)
)
edf <-
create_edge_df(
from = c(
9, 3, 6, 2, 6,
2, 8, 2, 5, 5
),
to = c(
1, 1, 4, 3, 7,
8, 1, 5, 3, 6
)
)
graph <-
create_graph(
nodes_df = ndf,
edges_df = edf
)
# View the graph
graph %>% render_graph()
# View the internal NDF for sake of
# reference
graph %>% get_node_df()
#> id type label
#> 1 1 fruit pineapple
#> 2 2 fruit apple
#> 3 3 fruit apricot
#> 4 4 veg cucumber
#> 5 5 veg celery
#> 6 6 veg endive
#> 7 7 nut hazelnut
#> 8 8 nut almond
#> 9 9 nut chestnut
# Select all nodes with a label beginning
# with `a` and traverse outward to all nodes
graph %>%
select_nodes(
conditions = grepl("^a", label)
) %>%
trav_out() %>%
get_selection()
#> [1] 1 3 5 8
# This traversal results in a rather large
# selection of nodes: `3` (`apricot`), `8`
# (`almond`), `5` (`celery`), and `1`
# (`pineapple`)
# Now, select all nodes with a label beginning
# with `c` (in this case, the `cucumber` and
# `chestnut` and then traverse outward to any
# node of the `fruit` type
graph %>%
select_nodes(
conditions = grepl("^c", label)
) %>%
trav_out(
conditions = type == "fruit"
) %>%
get_selection()
#> [1] 1 3
# The traversal has resulted in a selection of
# nodes `3` (`apricot`) and `1` (`pineapple`)
Traversing from node to node with trav_out()
,
trav_in()
, or trav_both()
can result in a very
specific targeting of nodes. As seen, once the traversal has occurred,
the new selection can be used to obtain data from those nodes, or,
modify the graph (by adding new nodes to the selection). Especially when
used within a pipeline, the selection of nodes, the traversals, and the
resulting actions are quite readable.
Traversals from Nodes to Edges
Moving across nodes using traversal functions is quite a powerful
thing to do. However, especially with information-rich graphs, some
useful data can exist in the graph’s edges. For this reason, we can
traverse from nodes onto adjacent edges. As with the node-to-node
traversal functions, the direction of the edge is important and a key
distinction between the functions trav_out_edge()
and
trav_in_edge()
. These types of traversals always begin at
nodes (and thus require an initial selection of one or more nodes) and
typically end with a selection of one or more edges. If no traversal can
be made, then the initial selection of nodes is retained.
Starting with the trav_out_edge()
function, suppose
there is a selection of a single node 1
in the very simple
graph of 1->2
. Calling the trav_out_edge()
function in its simplest form (without values supplied except for the
graph itself) will result in an edge selection and that edge will be the
1->2
edge. Thus, the traversal is from one or more nodes
onto adjacent, outward edges. On the same graph, with the same
selection, calling the trav_in_edge()
function will not
result in a traversal (the initial node selection of node 1
will be retained, as though nothing happened). This is because the
trav_in_edge()
function performs the converse traversal,
where the traversal is from one or more nodes onto adjacent, inward
edges. Put another way, trav_in_edge()
will change the
selection to edges that point toward the initially-selected node(s), if
any.
As with the node-to-node traversal functions, these traversals are much more powerful when used with matching conditions as they increase selectivity. That only certain edges may be traversed to (and selected) is important, especially in those cases where the traversal continues onto nodes (but more on that in the next section). Examples will aid in the understanding of these functions.
# Create a simple graph with two nodes, an
# edge between them (`1` -> `2`); starting
# from node `1` (as a selection), traverse
# to the edge and then obtain the current
# selection
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 1, to = 2) %>%
select_nodes_by_id(nodes = 1) %>%
trav_out_edge() %>%
get_selection()
#> [1] 1
# If no traversal can occur the selection is
# not altered. To demonstrate, use a similar
# pipeline but reverse the edge direction
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 2, to = 1) %>%
select_nodes_by_id(nodes = 1) %>%
trav_out_edge() %>%
get_selection()
#> [1] 1
# A traversal can occur if `trav_in_edge()`
# is used instead of `trav_out_edge()`
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 2, to = 1) %>%
select_nodes_by_id(nodes = 1) %>%
trav_in_edge() %>%
get_selection()
#> [1] 1
# A selection of multiple edges can occur
# as a result of a traversal
create_graph() %>%
add_node() %>%
select_nodes_by_id(nodes = 1) %>%
add_n_nodes_ws(
n = 10,
direction = "from"
) %>%
add_n_nodes_ws(
n = 10,
direction = "to"
) %>%
trav_out_edge() %>%
get_selection()
#> [1] 1 2 3 4 5 6 7 8 9 10
create_graph() %>%
add_node() %>%
select_nodes_by_id(nodes = 1) %>%
add_n_nodes_ws(
n = 10,
direction = "from"
) %>%
add_n_nodes_ws(
n = 10,
direction = "to"
) %>%
trav_in_edge() %>%
get_selection()
#> [1] 11 12 13 14 15 16 17 18 19 20
To introduce conditions on the traversal, we can again use the
conditions
argument. As with the node-to-node traversal
functions, these optional values induce filtering of the node-to-edge
traversals. If a graph is fashioned as a property graph that has values
set for node type
and edge rel
attributes,
traversals with trav_out_edge()
and
trav_in_edge()
be restricted to selection of edges that
have a specific rel
label.
# First, set a seed so the example
# is reproducible
suppressWarnings(RNGversion("3.5.0"))
set.seed(20)
# Create a graph with fruit,
# vegetables, nuts, and... people!
ndf <-
create_node_df(
n = 14,
type = c(
"person", "person",
"person", "person",
"person", "fruit",
"fruit", "fruit",
"veg", "veg", "veg",
"nut", "nut", "nut"
),
label = c(
"Annie", "Donna",
"Justine", "Ed",
"Graham", "pineapple",
"apple", "apricot",
"cucumber", "celery",
"endive", "hazelnut",
"almond", "chestnut"
)
)
edf <-
create_edge_df(
from = sort(
as.vector(replicate(5, 1:5))
),
to = as.vector(
replicate(5, sample(6:14, 5))
),
rel = as.vector(
replicate(
5, sample(
c(
"likes", "dislikes",
"allergic_to"
), 5,
TRUE,
c(0.5, 0.25, 0.25)
)
)
)
)
graph <-
create_graph(
nodes_df = ndf,
edges_df = edf
)
graph %>% render_graph()