Selections

Occasionally, you'll want to operate on a select group of nodes or edges. Some functions affect a single node or edge while others (or, sometimes, the same functions) operate on all nodes or edges in a graph. Selections allow you to target specified nodes or edges and then apply specialized functions to operate on just those selected entities. Most of the selection functions support rudimentary set operations across several calls of the selection functions (i.e., for the union, intersection, or difference between selection sets of nodes or edges).

Creating a Node Selection

Selecting nodes in a graph is done by targeting a specific node attribute (e.g., type, label, styling attributes such as width, if available, or arbitrary data values available for nodes).

As with all of the select_...() functions, the graph argument is the first in the function signature. This is beneficial for forward-piping the graph in a pipeline and propagating a series of transformations on the graph. There may be situations where several types of selections need to be applied to the graph's nodes or edges (say, selecting all nodes of a graph in the first step and then subtracting nodes of a specific type from that group). This is where the pipeline paradigm becomes especially useful and, as an added bonus, easy to read and to reason about.

The select_nodes() function targets nodes through the node attributes held in the graph's internal node data frame (NDF), which contains columns for all possible node attributes in the graph. By providing a single node attribute as a character value for the node_attr argument, you can select nodes by supplying a search term for the search argument. For search, there are two possible input types:

  1. a logical expression with a comparison operator (>, <, ==, or !=) followed by a number for numerical filtering
  2. a regular expression for filtering through string matching

In the first example, nodes will be selected based on a logical expression operating on a collection of numeric values (the node attribute arbitrarily named data).

###
# Select nodes based on the
# numeric values of a specified
# node attribute
###

library(DiagrammeR)

nodes <-
  create_nodes(
    nodes = 1:4,
    data = c(9.7, 8.5,
             2.2, 6.0))

graph <- create_graph(nodes_df = nodes)

# Select nodes where the `data` attribute
# has a value greater than 7.0 (it's the
# first 2 nodes)
graph <-
  select_nodes(
    graph = graph,
    node_attr = "data",
    search = ">7.0")

# Get the graph's current selection
get_selection(graph)
#> [1] "1" "2"

A selection of nodes can be obtained through a match using a regular expression operating on a collection of character-based values (the node attribute arbitrarily named fruits).

###
# Select nodes based on the text values
# of a specified node attribute
###

library(DiagrammeR)

nodes <-
  create_nodes(
    nodes = 1:4,
    fruits = c("apples", "apricots",
               "bananas", "plums"))

graph <- create_graph(nodes_df = nodes)

# Select nodes where the `fruits`
# attribute has a match on the first
# letters being `ap` (the first 2 nodes);
# The regular expression to use is `^ap`
# (where the ^ denotes the beginning of
# the text to be parsed)
graph <-
  select_nodes(
    graph = graph,
    node_attr = "fruits",
    search = "^ap")

# Get the graph's current selection
get_selection(graph)
#> [1] "1" "2"
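
The two kinds of search input used in the examples above can be pictured with a small base R sketch. The helper match_search() is hypothetical and is not part of DiagrammeR; it only illustrates, under the assumption that a search term is either a comparison operator followed by a number or a regular expression, how such a term might be applied to a column of attribute values.

```r
# Hypothetical helper (not a DiagrammeR function): returns a logical
# vector marking which attribute values satisfy the search term
match_search <- function(values, search) {
  # A comparison operator followed by a number means numeric filtering
  pattern <- "^(>|<|==|!=) *(-?[0-9.]+)$"
  if (grepl(pattern, search)) {
    op <- sub(pattern, "\\1", search)
    number <- as.numeric(sub(pattern, "\\2", search))
    compare <- match.fun(op)
    return(compare(as.numeric(values), number))
  }
  # Otherwise, treat the search term as a regular expression
  grepl(search, as.character(values))
}

ids <- as.character(1:4)

# Numeric filtering, as in the `data` example
ids[match_search(c(9.7, 8.5, 2.2, 6.0), ">7.0")]
#> [1] "1" "2"

# String matching, as in the `fruits` example
ids[match_search(c("apples", "apricots", "bananas", "plums"), "^ap")]
#> [1] "1" "2"
```

Both calls return the same node IDs as the select_nodes() examples, which is all this sketch is meant to show; the package's own parsing may differ.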

A more specialized match is sometimes needed (e.g., matching this but not that, or matching two different types of things). This is where the set_op argument becomes useful. When a selection of nodes is performed using select_nodes() (or any of the other select_...() functions that operate on nodes), the selection is stored in the graph object. This is seen in the above examples, where get_selection() was used to verify which nodes were in the selection. Because the selection is retained (at least until clear_selection() is called or a selection of edges is made), multiple uses of select_nodes() can modify the set of selected nodes depending on the option provided in the set_op argument. These set operations are:

  • union: creates a union of selected nodes in consecutive operations that create a selection of nodes (this is the default option)
  • intersect: modifies the list of selected nodes such that only those nodes common to both consecutive node selection operations will be retained
  • difference: modifies the list of selected nodes such that the only nodes retained are those in the existing selection that are not matched by the second node selection operation

These set operations behave exactly as the base R functions union(), intersect(), and setdiff() (which are used internally). Furthermore, most of the select_...() functions have the set_op argument, so they behave the same way with regard to modifying the node or edge selection in a pipeline of selection operations. As examples are important in fully understanding how these can work for more complex selections, here are a few:

###
# Create a number of complex
# node selections using magrittr
# pipelines and different types
# of set operations
###

library(DiagrammeR)
library(magrittr)

# Create a single graph object to be
# used for multiple node selections
nodes <-
  create_nodes(
    nodes = 1:9,
    type = c("fruit", "fruit", "fruit",
             "veg", "veg", "veg",
             "nut", "nut", "nut"),
    label = c("pineapple", "apple", "apricot",
              "cucumber", "celery", "endive",
              "hazelnut", "almond", "chestnut"),
    count = c(6, 3, 8, 7, 2, 6, 9, 9, 7))

graph <- create_graph(nodes_df = nodes)

# Inspect the graph's NDF
get_node_df(graph)
#>   nodes  type     label count
#> 1     1 fruit pineapple     6
#> 2     2 fruit     apple     3
#> 3     3 fruit   apricot     8
#> 4     4   veg  cucumber     7
#> 5     5   veg    celery     2
#> 6     6   veg    endive     6
#> 7     7   nut  hazelnut     9
#> 8     8   nut    almond     9
#> 9     9   nut  chestnut     7

# Select all foods that either begin
# with `c` or end with `e`
graph_1 <-
  graph %>%
    select_nodes(
      node_attr = "label",
      search = "^c") %>%
    select_nodes(
      node_attr = "label",
      search = "e$",
      set_op = "union")

# Get the graph's current selection
get_selection(graph_1)
#> [1] "4" "5" "9" "1" "2" "6"

# Select any food beginning with `a` and
# having a count less than 5
graph_2 <-
  graph %>%
    select_nodes(
      node_attr = "label",
      search = "^a") %>%
    select_nodes(
      node_attr = "count",
      search = "<5",
      set_op = "intersect")

# Get the graph's current selection
get_selection(graph_2)
#> [1] "2"

# Select any fruit not containing `apple`
# in its name
graph_3 <-
  graph %>%
    select_nodes(
      node_attr = "type",
      search = "fruit") %>%
    select_nodes(
      node_attr = "label",
      search = "apple",
      set_op = "difference")

# Get the graph's current selection
get_selection(graph_3)
#> [1] "3"
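
Since the set operations mirror base R, the three selections above can be reproduced with plain character vectors of node IDs. The vector names below are made up for illustration; each holds the node IDs that the corresponding individual search would match in the nine-node graph.

```r
# Node IDs matched by each individual search in the examples above
starts_with_c <- c("4", "5", "9")  # cucumber, celery, chestnut
ends_with_e   <- c("1", "2", "6")  # pineapple, apple, endive
starts_with_a <- c("2", "3", "8")  # apple, apricot, almond
count_below_5 <- c("2", "5")       # apple (3), celery (2)
fruits        <- c("1", "2", "3")  # all nodes of type `fruit`
has_apple     <- c("1", "2")       # pineapple, apple

# set_op = "union"
union(starts_with_c, ends_with_e)
#> [1] "4" "5" "9" "1" "2" "6"

# set_op = "intersect"
intersect(starts_with_a, count_below_5)
#> [1] "2"

# set_op = "difference"
setdiff(fruits, has_apple)
#> [1] "3"
```

Note that setdiff() is not symmetric: it keeps the elements of its first argument that are absent from the second, which is why the order of the two select_nodes() calls matters for a difference.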

There is an additional filtering option available as the nodes argument. Here, a vector of node ID values can be supplied and this will indicate to the function that only that subset of nodes will be considered for select_nodes(). Note that, if node_attr and search are provided with NULL values (the default) and nodes is given a vector of node ID values, it will be those very nodes that will make up the selection in this function call. While this is convenient and often a good method for selecting nodes (so long as one knows which node IDs need to be selected), the function select_nodes_by_id() handles this use case more directly (as the nodes argument is in the second position in the function signature).

###
# Select nodes from a subset of all
# available nodes in the graph
###

library(DiagrammeR)
library(magrittr)

# Create a simple graph with some
# numeric data
nodes <-
  create_nodes(
    nodes = 1:10,
    data = seq(0.5, 5, 0.5))

graph <- create_graph(nodes_df = nodes)

# Inspect the graph's NDF
get_node_df(graph)
#>    nodes type label data
#> 1      1          1  0.5
#> 2      2          2    1
#> 3      3          3  1.5
#> 4      4          4    2
#> 5      5          5  2.5
#> 6      6          6    3
#> 7      7          7  3.5
#> 8      8          8    4
#> 9      9          9  4.5
#> 10    10         10    5

# Select from a subset of nodes
# (given as `nodes = 1:6`) where
# the data value is greater than `1.5`
graph %<>%
  select_nodes(
    node_attr = "data",
    search = ">1.5",
    nodes = 1:6)

# Get the graph's current selection
get_selection(graph)
#> [1] "4" "5" "6"
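
One way to picture the nodes argument is as a two-step filter on the NDF: first restrict the candidate rows to the supplied node IDs, then apply the search condition to what remains. The base R sketch below works on a stand-in data frame rather than on a graph object, so it is only an analogy for what select_nodes() does with these arguments.

```r
# Stand-in for the graph's NDF (only the columns needed here)
ndf <- data.frame(nodes = 1:10, data = seq(0.5, 5, 0.5))

# Step 1: keep only the candidate nodes (the analogue of `nodes = 1:6`)
candidates <- ndf[ndf$nodes %in% 1:6, ]

# Step 2: apply the condition (the analogue of `search = ">1.5"`)
selected <- as.character(candidates$nodes[candidates$data > 1.5])
selected
#> [1] "4" "5" "6"
```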

Creating an Edge Selection

Selecting edges in a graph is done in a manner quite similar to selecting nodes. The primary means for targeting specific edges is through any available edge attributes (e.g., rel, styling attributes such as color, or arbitrary data values available for edges).

The graph argument is the first in the function signature. Because graph comes first, forward-piping the graph in a pipeline becomes easy with magrittr or pipeR. When providing an edge attribute as a character value for the edge_attr argument, you can select edges by supplying a search term for the search argument. There are two possible input types for search:

  1. a logical expression with a comparison operator (>, <, ==, or !=) followed by a number for numerical filtering
  2. a regular expression for filtering through string matching

The following example shows how to create selections of edges based on a logical expression operating on a collection of numeric values (the edge attribute arbitrarily named data).

###
# Select edges based on the
# numeric values of a specified
# edge attribute
###

library(DiagrammeR)

nodes <-
  create_nodes(1:4)

edges <-
  create_edges(
    from = 1:4,
    to = c(2:4, 1),
    data = c(8.6, 2.8, 6.3, 4.5))

graph <-
  create_graph(nodes, edges)

# Inspect the graph's EDF
get_edge_df(graph)
#>   from to rel data
#> 1    1  2      8.6
#> 2    2  3      2.8
#> 3    3  4      6.3
#> 4    4  1      4.5

# Select edges where the `data`
# attribute has a value
# greater than 5.0 (it's the
# edges `1 -> 2` and `3 -> 4`)
graph <-
  select_edges(
    graph = graph,
    edge_attr = "data",
    search = ">5.0")

# Get the graph's current selection
get_selection(graph)
#> [1] "1->2"  "3->4"
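
The same result can be traced by hand on a stand-in for the EDF. This base R sketch assumes, as the output above suggests, that each selected edge is reported as a "from->to" string; it does not call any DiagrammeR function.

```r
# Stand-in for the graph's EDF
edf <- data.frame(
  from = 1:4,
  to = c(2:4, 1),
  data = c(8.6, 2.8, 6.3, 4.5))

# Keep the rows where `data` is greater than 5.0
matched <- edf[edf$data > 5.0, ]

# Label each matched edge by its endpoints
edge_labels <- paste0(matched$from, "->", matched$to)
edge_labels
#> [1] "1->2" "3->4"
```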

Selecting the Last Node or Edge in an NDF or EDF

You can select the last node or edge from the graph's internal node data frame (NDF) or internal edge data frame (EDF), respectively. Usually, this will be the last node or edge created since new nodes or edges are added to the bottom of the data frame and there is no shuffling of these positions. Immediately after creating a single node or edge, calling either the select_last_node() or the select_last_edge() functions will result in a selection of the last node or edge created.

For both functions, graph is the only argument. Simply providing a graph object will make a single node selection when select_last_node() is invoked, or a single edge selection when select_last_edge() is invoked.

###
# Create a graph, select
# the last node of the graph's
# NDF, then, select the last
# edge of the graph's EDF
###

library(DiagrammeR)
library(magrittr)

nodes <-
  create_nodes(1:4)

edges <-
  create_edges(
    from = 1:4,
    to = c(2:4, 1),
    data = c(8.6, 2.8, 6.3, 4.5))

graph <-
  create_graph(nodes, edges)

# Inspect the graph's NDF
get_node_df(graph)
#>   nodes type label
#> 1     1          1
#> 2     2          2
#> 3     3          3
#> 4     4          4

# Inspect the graph's EDF
get_edge_df(graph)
#>   from to rel data
#> 1    1  2      8.6
#> 2    2  3      2.8
#> 3    3  4      6.3
#> 4    4  1      4.5

# Select the last node in the graph's NDF and confirm
# that the selection was made
graph %>%
  select_last_node %>%
  get_selection
#> [1] "4"

# Select the last edge in the graph's EDF and confirm
# that the selection was made
graph %>%
  select_last_edge %>%
  get_selection
#> [1] "4->1"

# Create a graph, node-by-node and
# edge-by-edge and add attributes
graph_2 <-
  create_graph(
    graph_attrs = "output = visNetwork") %>%
  add_node %>%
  select_last_node %>%
  set_node_attrs_ws(
    "timestamp", as.character(Sys.time())) %>%
  set_node_attrs_ws("type", "A") %>%
  clear_selection %>%
  add_node %>%
  select_last_node %>%
  set_node_attrs_ws(
    "timestamp", as.character(Sys.time())) %>%
  set_node_attrs_ws("type", "B") %>%
  add_edge(1, 2, "AB") %>%
  select_last_edge %>%
  set_edge_attrs_ws(
    "timestamp", as.character(Sys.time()))

# View the new graph
graph_2 %>% render_graph

# Inspect the new graph's NDF
graph_2 %>% get_node_df
#>   nodes type label           timestamp
#> 1     1    A     1 2016-01-22 17:00:51
#> 2     2    B     2 2016-01-22 17:00:51

# Inspect the new graph's EDF
graph_2 %>% get_edge_df
#>   from to rel           timestamp
#> 1    1  2  AB 2016-01-22 17:00:51
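
Because new nodes and edges are appended to the bottom of the NDF and EDF, "last" simply means the final row of each data frame. The base R sketch below shows that idea on stand-in data frames; the "from->to" edge label follows the output format shown earlier and is an assumption about presentation only.

```r
# Stand-ins for the first graph's NDF and EDF
ndf <- data.frame(nodes = 1:4)
edf <- data.frame(from = 1:4, to = c(2:4, 1))

# The last node is the node ID in the final row of the NDF
last_node <- as.character(ndf$nodes[nrow(ndf)])
last_node
#> [1] "4"

# The last edge is the from/to pair in the final row of the EDF
last_row <- edf[nrow(edf), ]
last_edge <- paste0(last_row$from, "->", last_row$to)
last_edge
#> [1] "4->1"
```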

Selecting Nodes by ID Values


Selecting Nodes in a Neighborhood


Selecting Nodes by their Degree


Inverting and Clearing a Selection