Occasionally, you’ll want to operate on a select group of nodes or edges. Some functions affect a single node or edge while others (or, sometimes, the same functions) operate on all nodes or edges in a graph. Selections allow you to target specified nodes or edges and then apply specialized functions to operate on just those selected entities. Most of the selection functions support rudimentary set operations across several calls of the selection functions (i.e., for the union, intersection, or difference between selection sets of nodes or edges).
Creating a Node Selection
Selecting nodes in a graph is accomplished by targeting a specific
node attribute (e.g., type
, label
, styling
attributes such as width
, if available, or arbitrary data
values available for nodes).
As with all of the select_*()
functions, the graph
argument is the first in the function signature. This is beneficial for
forward-piping the graph in a pipeline and propagating a series of
transformations on the graph. There may be situations where several
types of selections be applied to the graphs nodes or edges (say,
selecting all nodes of a graph in the first step and then subtracting
nodes of a specific type from that group) and this is where the pipeline
paradigm becomes incredibly useful, and, as an added bonus, easy to read
and to reason about.
The main handle on selectivity for the select_nodes()
function is through the node attributes of the graph’s internal NDF,
which contains columns for all possible node attributes in the graph. By
providing an expression in conditions
, nodes will be placed
in the active selection where they are TRUE
. Here are two
common types of expressions that work well for the
conditions
argument:
- a logical expression with a comparison operator (
>
,<
,==
, or!=
) - a regular expression for filtering via string matching
Let’s create a simple graph with four nodes that have numeric
data
values. Then, we’ll inspect the graph’s internal NDF
to see where we are starting from.
# Create a node data frame
ndf <-
create_node_df(
n = 4,
data = c(
9.7, 8.5, 2.2, 6.0)
)
# Create a new graph based on the ndf
graph <- create_graph(nodes_df = ndf)
# Inspect the graph's NDF
graph %>% get_node_df()
#> id type label data
#> 1 1 <NA> <NA> 9.7
#> 2 2 <NA> <NA> 8.5
#> 3 3 <NA> <NA> 2.2
#> 4 4 <NA> <NA> 6.0
In the first example, nodes will be selected based on a logical
expression operating on a collection of numeric values. To examine the
result of the selection, the get_node_df_ws()
will be used.
This function will display a table for the active selection of nodes in
the graph (like get_node_df()
, but with a subset of
nodes).
# Select nodes where the `data` attribute
# has a value greater than 7.0 (it's the
# first 2 nodes)
graph <-
graph %>%
select_nodes(
conditions = data > 7.0
)
# Get the graph's current selection
# of nodes as a table
graph %>% get_node_df_ws()
#> id type label data
#> 1 1 <NA> <NA> 9.7
#> 2 2 <NA> <NA> 8.5
A selection of nodes can be obtained through a match using a regular
expression operating on a collection of character-based values (the node
attribute that is named fruits
). This uses the
grepl()
function in the expression supplied to
conditions
. The regular expression to use is
^ap
, where the ^
denotes the beginning of the
text to the parsed.
# Create a node data frame
ndf <-
create_node_df(
n = 4,
fruits = c(
"apples", "apricots",
"bananas", "plums")
)
# Create a new graph based on the ndf
graph <- create_graph(nodes_df = ndf)
# Select nodes where the `fruits`
# attribute has a match on the first
# letters being `ap` (the first 2 nodes)
graph <-
graph %>%
select_nodes(
conditions = grepl("^ap", fruits)
)
# Get the graph's current selection
# of nodes as a table
graph %>% get_node_df_ws()
#> id type label fruits
#> 1 1 <NA> <NA> apples
#> 2 2 <NA> <NA> apricots
The situation may arise when a more specialized match needs to be
made (i.e., matching this but not that, or, matching two different types
of things). This is where the set_op
argument can become
useful. When a selection of nodes is obtained using
select_nodes()
(or any of the other select_*()
functions that operate on nodes), the selection is stored in the graph
object. This is seen in the above examples where
get_selection()
was used to verify which nodes were in the
selection. Because the selection is retained (at least until
clear_selection()
is called, or, a selection of edges is
made), multiple uses select_nodes()
can modify the set of
selected nodes depending on the option provided in the
set_op
argument. These set operations are:
-
union
: creates a union of selected nodes in consecutive operations that create a selection of nodes (this is the default option) -
intersect
: modifies the list of selected nodes such that only those nodes common to both consecutive node selection operations will retained -
difference
: modifies the list of selected nodes such that the only nodes retained are those that are different in the second node selection operation compared to the first
These set operations behave exactly as the base R functions:
union()
,
intersect()
/intersection()
, and
setdiff()
(which are actually used internally).
Furthermore, most of the select_*()
functions contain the
set_op
argument, so, they behave the same way with regard
to modifying the node or edge selection in a pipeline of selection
operations. As examples are important in fully understanding how these
can work for more complex selections, quite a few will be provided
here.
The example graph will now be a bit more complex. It will contain 9
nodes, of three different type
s (fruit
,
veg
, and nut
). The label
node
attribute has the name of the food, and the count
attribute
contains arbitrary numeric values.
# Create a node data frame
ndf <-
create_node_df(
n = 9,
type = c(
"fruit", "fruit", "fruit",
"veg", "veg", "veg",
"nut", "nut", "nut"),
label = c(
"pineapple", "apple", "apricot",
"cucumber", "celery", "endive",
"hazelnut", "almond", "chestnut"),
count = c(
6, 3, 8, 7, 2, 6, 9, 9, 7)
)
# Create a new graph based on the ndf
graph <- create_graph(nodes_df = ndf)
# Inspect the graph's NDF
graph %>% get_node_df()
#> id type label count
#> 1 1 fruit pineapple 6
#> 2 2 fruit apple 3
#> 3 3 fruit apricot 8
#> 4 4 veg cucumber 7
#> 5 5 veg celery 2
#> 6 6 veg endive 6
#> 7 7 nut hazelnut 9
#> 8 8 nut almond 9
#> 9 9 nut chestnut 7
Let’s successively use two select_nodes()
calls to
select all those foods that either begin with c
or end with
e
. Because this is an OR set, we will use the
union
set operation in the second call of
select_nodes()
.
# Select all foods that either begin
# with `c` or ending with `e`
graph_1 <-
graph %>%
select_nodes(
conditions = grepl("^c", label)
) %>%
select_nodes(
conditions = grepl("e$", label),
set_op = "union"
)
# Get the graph's current selection
# of nodes as a table
graph_1 %>% get_node_df_ws()
#> id type label count
#> 1 1 fruit pineapple 6
#> 2 2 fruit apple 3
#> 3 4 veg cucumber 7
#> 4 5 veg celery 2
#> 5 6 veg endive 6
#> 6 9 nut chestnut 7
The conditions
don’t need to be related. In the first
below, it’s a matching expression using grepl()
. The second
is filtering to those nodes with count < 5
.
# Select any food beginning with `a` and
# having a count less than 5
graph_2 <-
graph %>%
select_nodes(
conditions = grepl("^a", label)
) %>%
select_nodes(
conditions = count < 5,
set_op = "intersect"
)
# Get the graph's current selection
# of nodes as a table
graph_2 %>% get_node_df_ws()
#> id type label count
#> 1 2 fruit apple 3
The following example contains a pair of select_nodes()
statements, where the first filters nodes to a fruit
group
(type == "fruit"
) and then excludes a subset of these nodes
(any fruit containing “apple” in the name) using the
set_op = "difference"
.
# Select any fruit not containing
# `apple` in its name
graph_3 <-
graph %>%
select_nodes(
conditions = type == "fruit"
) %>%
select_nodes(
conditions = grepl("apple", label),
set_op = "difference"
)
# Get the graph's current selection
# of nodes as a table
graph_3 %>% get_node_df_ws()
#> id type label count
#> 1 3 fruit apricot 8
There is an additional filtering option available as the
nodes
argument. Here, a vector of node ID values can be
supplied and this will indicate to the function that only that subset of
nodes will be considered for select_nodes()
. Note that, if
nothing is provided in conditions
and nodes
is
given a vector of node ID values, it will be those very nodes that will
make up the selection in this function call. While this is convenient
and often a good method for selecting nodes (so long as one knows which
node IDs need to be selected), the function
select_nodes_by_id()
handles this use case more directly
(as it only filters based on its nodes
argument).
# Create a node data frame
ndf <-
create_node_df(
n = 10,
data = seq(0.5, 5, 0.5)
)
# Create a new graph based on the ndf
graph <- create_graph(nodes_df = ndf)
# Inspect the graph's NDF
graph %>% get_node_df()
#> id type label data
#> 1 1 <NA> <NA> 0.5
#> 2 2 <NA> <NA> 1.0
#> 3 3 <NA> <NA> 1.5
#> 4 4 <NA> <NA> 2.0
#> 5 5 <NA> <NA> 2.5
#> 6 6 <NA> <NA> 3.0
#> 7 7 <NA> <NA> 3.5
#> 8 8 <NA> <NA> 4.0
#> 9 9 <NA> <NA> 4.5
#> 10 10 <NA> <NA> 5.0
Let’s now perform a call to select_nodes()
that contains
both an expression for the conditions
argument, and, a
range of node ID values in the nodes
argument.
# Select from a subset of nodes
# (given as `nodes = 1:6`) where
# the `data` value is greater than `1.5`
graph <-
graph %>%
select_nodes(
conditions = data > 1.5,
nodes = 1:6
)
# Get the graph's current selection
# of nodes as a table
graph %>% get_node_df_ws()
#> id type label data
#> 1 4 <NA> <NA> 2.0
#> 2 5 <NA> <NA> 2.5
#> 3 6 <NA> <NA> 3.0
Creating an Edge Selection
Selecting edges in a graph is done in a manner quite similar to
selecting nodes. The primary means for targeting a specific edges is
through any available edge attributes (e.g., rel
, styling
attributes such as color
, or arbitrary data values
available for edges).
We will start by creating a graph with four nodes and four edges. The
edges will have an edge attribute called data
that contains
numeric values.
# Create a node data frame
ndf <- create_node_df(n = 4)
# Create an edge data frame
edf <-
create_edge_df(
from = c(1, 2, 3, 4),
to = c(2, 3, 4, 1),
data = c(
8.6, 2.8, 6.3, 4.5)
)
# Create a new graph from
# the NDF and EDF
graph <-
create_graph(
nodes_df = ndf,
edges_df = edf
)
# Inspect the graph's EDF
graph %>% get_edge_df()
#> id from to rel data
#> 1 1 1 2 <NA> 8.6
#> 2 2 2 3 <NA> 2.8
#> 3 3 3 4 <NA> 6.3
#> 4 4 4 1 <NA> 4.5
The following example shows how to create selections of edges based
on a logical expression operating on a collection of numeric values in
the data
attribute.
# Select edges where the `data`
# attribute has a value
# greater than 5.0
graph <-
graph %>%
select_edges(
conditions = data > 5.0
)
# Get the graph's current selection
# of edges as a table
graph %>% get_edge_df_ws()
#> id from to rel data
#> 1 1 1 2 <NA> 8.6
#> 2 3 3 4 <NA> 6.3
Selecting the Last Node or Edge in an NDF or EDF
You can select the last node or edge from the graph’s internal node
data frame (NDF) or internal edge data frame (EDF), respectively.
Usually, this will be the last node or edge created since new nodes or
edges are added to the bottom of the data frame and there is no
shuffling of these positions. Immediately after creating a single node
or edge, calling either the select_last_node()
or the
select_last_edge()
functions will result in a selection of
the last node or edge created.
For both functions, graph
is the only argument. If we
provide a graph object to select_last_nodes_created()
we
will get an active selection of nodes (the last nodes that were added to
the graph object). Likewise, the
select_last_edges_created()
will create an active selection
of edges, whichever edges were last defined in a function call.
To begin, we’ll use the same graph as before but we’ll clear any
active selection using the clear_selection()
function. To
ensure that there is no active selection of nodes or edges, we can use
the get_selection()
function. A value of NA
means that there is neither a selection of nodes nor edges.
# Clear the graph's selection
graph <-
graph %>%
clear_selection()
# Check whether there is still
# a selection present
graph %>% get_selection()
#> [1] NA
If we use select_last_nodes_created
on this graph, we
find that the active selection of nodes contains all the nodes in the
graph, as they were all produced in the create_graph()
function (and no nodes were added in subsequent operations).
# Select the last node in the graph's NDF and confirm
# that the selection was made
graph %>%
select_last_nodes_created() %>%
get_node_df_ws()
#> id type label
#> 1 1 <NA> <NA>
#> 2 2 <NA> <NA>
#> 3 3 <NA> <NA>
#> 4 4 <NA> <NA>
The same goes for the edges. Using
select_last_edges_created()
will make a selection of edges
that includes all edges in the graph.
# Select the last edge in the graph's EDF and confirm
# that the selection was made
graph %>%
select_last_edges_created() %>%
get_edge_df_ws()
#> id from to rel data
#> 1 1 1 2 <NA> 8.6
#> 2 2 2 3 <NA> 2.8
#> 3 3 3 4 <NA> 6.3
#> 4 4 4 1 <NA> 4.5
Where these functions become useful is during the building up of a
graph in multiple steps. In this next example, we will start with an
empty graph (using create_graph()
by itself) then add a
single node with add_node()
, and then select that very node
with select_last_nodes_created()
. This
selection-in-a-pipeline approach makes it easy to set node attributes
because we can use the set_node_attrs_ws()
function. In
this example, we’re adding a new timestamp
attribute and
assigning a value (the system time). Because the selection is
persistent, we can set more attributes in further calls to
set_node_attrs_ws()
. When we are done with this node
selection, we call the clear_selection()
function and
return to a clean slate of no active selection present (it removes any
node or edge selection from the graph object). The example proceeds to
create another new node and adding attributes, and then a new edge and
edge attributes (with set_edge_attrs_ws()
). This is all
facilitated with the select_last_nodes_created()
and
select_last_edges_created()
functions.
# Create a graph, node-by-node and
# edge-by-edge and add attributes
graph_2 <-
create_graph() %>%
add_node() %>%
select_last_nodes_created() %>%
set_node_attrs_ws(
node_attr = "timestamp",
value = as.character(Sys.time())
) %>%
set_node_attrs_ws(
node_attr = "type",
value = "A"
) %>%
clear_selection() %>%
add_node() %>%
select_last_nodes_created() %>%
set_node_attrs_ws(
node_attr = "timestamp",
value = as.character(Sys.time())
) %>%
set_node_attrs_ws(
node_attr = "type",
value = "B"
) %>%
add_edge(
from = 1,
to = 2,
rel = "AB"
) %>%
select_last_edges_created() %>%
set_edge_attrs_ws(
edge_attr = "timestamp",
value = as.character(Sys.time())
) %>%
clear_selection()
# View the new graph
graph_2 %>% render_graph()
The graph shows two nodes connected together. Nothing more, nothing less. The more interesting views of the data are of the node and edge data frames, which now have several attributes set. Let’s have a look at the graph’s internal node data frame…
# Inspect the new graph's NDF
graph_2 %>% get_node_df()
#> id type label timestamp
#> 1 1 A <NA> 2024-02-03 18:48:36.40223
#> 2 2 B <NA> 2024-02-03 18:48:36.426155
…and let’s inspect the graph’s internal edge data frame.
# Inspect the new graph's EDF
graph_2 %>% get_edge_df()
#> id from to rel timestamp
#> 1 1 1 2 AB 2024-02-03 18:48:36.444785
As can be seen, immediately invoking
select_last_nodes_created()
or
select_last_edges_created()
after addition of new nodes or
edges can be useful for working with the newly made nodes/edges. Many
functions ending with _ws()
operate specifically on
selections of nodes or edges.