Node and Edge Data Frames
These functions are used to create and manipulate specialized data frames: node data frames (NDFs) and edge data frames (EDFs). The functions are useful because one can selectively add field data to these data frames and combine them as necessary before addition to a graph object.
Creating an NDF
With the create_node_df()
function, one can create a
node data frame (NDF) with nodes and their attributes. This object is
really just an R data.frame object. In most cases, it’s recommended to
use create_node_df()
instead of data.frame()
(or, as.data.frame()
) to create an NDF. Using
create_node_df()
allows for validation of the input data
(so that the integrity of the graph is not compromised) and the function
provides some additional functionality that the base R functions for
data frame creation do not have:
- single values are repeated for n number of nodes supplied
- selective setting of attributes (e.g., giving attr values for 3 of 10 nodes, allowing non-set nodes to use defaults or globally set attr values)
- supplying overlong vectors for attributes will result in trimming down to the number of nodes
- setting
label = FALSE
will conveniently result in a non-labeled node
The function only has one argument that requires values to be supplied: the nodes argument. Just supplying a set of unique ID values for nodes will create an NDF with nodes that have no additional attributes.
When incorporated into a graph, an NDF is parsed and column names
that match keywords for node attributes indicate to
DiagrammeR that the values in such columns should
provide attribute values on a per-node basis. Columns with names that
don’t match reserved attribute names are disregarded and, because of
this, you can include columns with useful data for analysis. When
creating a node data frame, one column named nodes
will be
created (and it will be the first column in the NDF). That’s where
unique values for the node ID should reside. As for other attribute
columns, the type
and label
columns will also
be created. However, these do not necessarily need to be populated with
values and thus they can be left blank, if desired. Other attributes are
voluntarily added as named vectors for the R triple-dot
(...
) argument. Here are all of the node attribute names
and the types of values to supply:
-
color
— provide an X11 or hexadecimal color (append 2 digits to hex for alpha) -
distortion
— the node distortion for anyshape = polygon
-
fillcolor
— provide an X11 or hexadecimal color (append 2 digits to hex for alpha) -
fixedsize
—true
orfalse
-
fontcolor
— provide an X11 or hexadecimal color (append 2 digits to hex for alpha) -
fontname
— the name of the font -
fontsize
— the size of the font for the node label -
height
— the height of the node -
penwidth
— the thickness of the stroke for the shape -
peripheries
— the number of peripheries (essentially, additional shape outlines) -
shape
— the node shape (e.g.,ellipse
,polygon
,circle
, etc.) -
sides
— ifshape = polygon
, the number of sides can be provided here -
style
— usually given the value filled if you’d like to fill a node with a color -
tooltip
— provide text here for an unstyled browser tooltip -
width
— the width of the node -
x
— the x position of the node (requires graph attrlayout = neato
to use) -
y
— the y position of the node (requires graph attrlayout = neato
to use)
In the following examples, node data frames are created. There are a
few worthwhile things to notice here. The nodes can be supplied as a
character vector (as in c("a", "b", "c", "d")
), or, as a
range of integers (1:4
). It may be a matter of preference,
but the numbering system seems to be the better choice (and other
functions available in the package will take advantage of a graph with
ID values that are available as monotonically increasing integer
values). Secondly, a single logical value can be supplied to the
label
argument, where TRUE
copies the node ID
to the label attribute, and FALSE
yields a blank, unset
value for all nodes in the NDF. Finally, the provision of single values
in a call that creates more than a single node will result in all nodes
having that attribute (e.g., color = "aqua"
sets all nodes
in the NDF with the ‘aqua’ color).
# Create a node data frame
nodes_1 <-
create_node_df(
n = 4,
type = "lower",
label = c("a", "b", "c", "d"),
style = "filled",
color = "aqua",
shape = c("circle", "circle",
"rectangle", "rectangle"),
data = c(3.5, 2.6, 9.4, 2.7))
# Inspect the `nodes_1` NDF
nodes_1
#> id type label style color shape data
#> 1 1 lower a filled aqua circle 3.5
#> 2 2 lower b filled aqua circle 2.6
#> 3 3 lower c filled aqua rectangle 9.4
#> 4 4 lower d filled aqua rectangle 2.7
# Create another node data frame
nodes_2 <-
create_node_df(
n = 4,
type = "upper",
label = TRUE,
style = "filled",
color = "red",
shape = "triangle",
data = c(0.5, 3.9, 3.7, 8.2))
# Inspect the `nodes_2` NDF
nodes_2
#> id type label style color shape data
#> 1 1 upper 1 filled red triangle 0.5
#> 2 2 upper 2 filled red triangle 3.9
#> 3 3 upper 3 filled red triangle 3.7
#> 4 4 upper 4 filled red triangle 8.2
Creating an EDF
Using the create_edge_df()
function, an edge data frame
(EDF) (comprising edges and their attributes) is created. As with the
create_node_df()
function, the resulting object is an R
data.frame
object. While the usual means for creating data
frames could be used to create an EDF, the create_edge_df()
function provides some conveniences and validation of the final object
(e.g., checking for uniqueness of the supplied node ID values). Also,
there is a special attribute for a edge called a rel
, which
is short for relationship. This is an optional attribute, so, it can be
left blank. However, it’s advantageous to have these types of group
labels set for all edges, especially if the resulting graph is to be
fashioned as a property graph. (The node type
values must
also be set for each node to model your graph as a property graph.)
Edges define the connections between nodes in a graph. So, in a
sense, an edge data frame is a complementary component to the node data
frame within a graph. Therefore, the EDF contains node ID information in
two columns named from
and to
. So, when making
an edge data frame, there are two equal-length vectors that need to be
supplied to the create_edge_df()
function: one for the
outgoing node edge (from
), and, another for the incoming
node edge (to
). Each of the two columns will contain node
ID values. As for the node data frame, attributes can be provided for
each of the edges. The following edge attributes can be used:
-
arrowhead
— the arrow style at the head end (e.g, normal, dot) -
arrowsize
— the scaling factor for the arrowhead and arrowtail -
arrowtail
— the arrow style at the tail end (e.g, normal, dot) -
color
— the stroke color; an X11 color or a hex code (add 2 digits for alpha) -
dir
— the direction; either forward, back, both, or none -
fontcolor
— choose an X11 color or provide a hex code (append 2 digits for alpha) -
fontname
— the name of the font -
fontsize
— the size of the font for the node label -
headport
— a cardinal direction for where the arrowhead meets the node -
label
— label text for the line between nodes -
minlen
— minimum rank distance between head and tail -
penwidth
— the thickness of the stroke for the arrow -
tailport
— a cardinal direction for where the tail is emitted from the node -
tooltip
— provide text here for an edge tooltip
Here are a few examples of how edge data frames can be created:
# Create an edge data frame
edges_1 <-
create_edge_df(
from = c(1, 1, 2, 3),
to = c(2, 4, 4, 1),
rel = "requires",
color = "green",
data = c(2.7, 8.9, 2.6, 0.6))
edges_1
#> id from to rel color data
#> 1 1 1 2 requires green 2.7
#> 2 2 1 4 requires green 8.9
#> 3 3 2 4 requires green 2.6
#> 4 4 3 1 requires green 0.6
# Create another edge data frame
edges_2 <-
create_edge_df(
from = c(5, 7, 8, 8),
to = c(8, 8, 6, 5),
rel = "receives",
arrowhead = "dot",
color = "red")
edges_2
#> id from to rel arrowhead color
#> 1 1 5 8 receives dot red
#> 2 2 7 8 receives dot red
#> 3 3 8 6 receives dot red
#> 4 4 8 5 receives dot red
Combining NDFs
Several node data frames (NDFs) can be combined into one using the
combine_ndfs()
function. You can combine two NDFs in a
single call of combine_ndfs()
, or, combine even more in a
single pass.
There may be occasion to combine several NDFs into a single node data
frame. The combine_ndfs()
function works much like the base
R rbind()
function except that it accepts NDFs with columns
differing in number, names, and ordering. Obtaining several node data
frames may result from collecting data from various sources, at
different times (where the collected data is different), or, simply from
multiple calls of create_node_df()
for practical or
readability purposes, to name a few examples. Speaking of examples:
# Create an NDF
nodes_1 <-
create_node_df(
n = 4,
label = 1:4,
type = "lower",
data = c(8.2, 5.2, 1.2, 14.9))
# Create another NDF
nodes_2 <-
create_node_df(
n = 4,
label = 5:8,
type = "upper",
data = c(0.3, 6.3, 10.7, 1.2))
# Combine the NDFs
all_nodes <- combine_ndfs(nodes_1, nodes_2)
all_nodes
#> id type label data
#> 1 1 lower 1 8.2
#> 2 2 lower 2 5.2
#> 3 3 lower 3 1.2
#> 4 4 lower 4 14.9
#> 5 5 upper 5 0.3
#> 6 6 upper 6 6.3
#> 7 7 upper 7 10.7
#> 8 8 upper 8 1.2
Combining EDFs
There may be cases where one might take data from one or more data
frames and create multiple edge data frames. One can combine multiple
edge data frames (EDFs) with the combine_edfs()
function.
Any number of EDFs can be safely combined with one call of this
function.
Again, it is advantageous to use this combining function for the purpose of coalescing EDFs. Once a single, combined EDF is generated, it is much easier to incorporate that data into a DiagrammeR graph object.
# Create an edge data frame
edges_1 <-
create_edge_df(
from = c(1, 1, 2, 3),
to = c(2, 4, 4, 1),
rel = "requires",
color = "green",
data = c(2.7, 8.9, 2.6, 0.6))
# Create another edge data frame
edges_2 <-
create_edge_df(
from = c(5, 7, 8, 8),
to = c(8, 8, 6, 5),
rel = "receives",
arrowhead = "dot",
color = "red")
# Combine edge data frames with 'combine_edfs'
all_edges <- combine_edfs(edges_1, edges_2)
all_edges
#> id from to rel color data arrowhead
#> 1 1 1 2 requires green 2.7 <NA>
#> 2 2 1 4 requires green 8.9 <NA>
#> 3 3 2 4 requires green 2.6 <NA>
#> 4 4 3 1 requires green 0.6 <NA>
#> 5 5 5 8 receives red NA dot
#> 6 6 7 8 receives red NA dot
#> 7 7 8 6 receives red NA dot
#> 8 8 8 5 receives red NA dot