Creating Simple Graphs from NDFs/EDFs • DiagrammeR

Creating a graph object is undoubtedly important. I dare say it is one of the fundamental aspects of the DiagrammeR world. With the graph object produced, so many other things are possible. For instance, you can inspect certain aspects of the graph, modify the graph in many ways that suit your workflow, view the graph (or part of the graph!) in the RStudio Viewer, or perform graph traversals and thus create complex graph queries. The possibilities are really very exciting and it all begins with creating those graph objects.

Creating a Graph Object

The create_graph() function creates a graph object. The function also allows for initialization of the graph name, setting of the node_df and of the edge_df, and the graph theme.

Some of the key options of the create_graph() function are:

nodes_df — optional data frame with the graph’s nodes (or vertices) and attributes for each
edges_df — optional data frame with edges between nodes/vertices and attributes for each
directed — a required logical value stating whether the graph should be considered a directed graph (TRUE, the default) or an undirected graph (FALSE)
graph_name — optional character vector with a name for the graph

For the nodes_df and edges_df arguments, one can supply a node data frame and an edge data frame, respectively. The dgr_graph object can be initialized without any nodes or edges (by not supplying an NDF or an EDF in the function call), and this is a favorable option when supplying nodes and edges using other functions that modify an existing graph. Here is an example whereby an empty graph (initialized as a directed graph) is created. Note that the graph’s internal nodes_df and edges_df data frames are both empty here, signifying an empty graph.

# Create the graph object
graph <- create_graph()

# Get the class of the object
class(graph)
#> [1] "dgr_graph"

# It's an empty graph, so the NDF has no rows
get_node_df(graph)
#> [1] id    type  label
#> <0 rows> (or 0-length row.names)

# The EDF doesn't have any rows either
get_edge_df(graph)
#> [1] id   from to   rel 
#> <0 rows> (or 0-length row.names)

# By default, the graph is considered directed
is_graph_directed(graph)
#> [1] TRUE

It’s possible to include an NDF and not an EDF when calling create_graph(). What you would get is an edgeless graph (a graph with nodes but no edges between those nodes. The edges can always be defined later (with functions such as add_edge(), add_edge_df(), add_edges_from_table(), etc., and these functions are covered in a subsequent section).

# Create a node data frame
ndf <-
  create_node_df(
    n = 4,
    label = 1:4,
    type  = "lower",
    style = "filled",
    color = "aqua",
    shape = c("circle", "circle",
              "rectangle", "rectangle"),
    data = c(3.5, 2.6, 9.4, 2.7)
  )

# Inspect the NDF
ndf
#>   id  type label  style color     shape data
#> 1  1 lower     1 filled  aqua    circle  3.5
#> 2  2 lower     2 filled  aqua    circle  2.6
#> 3  3 lower     3 filled  aqua rectangle  9.4
#> 4  4 lower     4 filled  aqua rectangle  2.7

# Create the graph and include the
# `nodes` NDF
graph <- create_graph(nodes_df = ndf)

# Examine the NDF within the graph object
get_node_df(graph)
#>   id  type label  style color     shape data
#> 1  1 lower     1 filled  aqua    circle  3.5
#> 2  2 lower     2 filled  aqua    circle  2.6
#> 3  3 lower     3 filled  aqua rectangle  9.4
#> 4  4 lower     4 filled  aqua rectangle  2.7

# Check if it's the same NDF (both externally
# and internally)
all(ndf == graph %>% get_node_df())
#> [1] TRUE

Quite often, there will be cases where node or edge attributes should be applied to all nodes or edges in the graph. To achieve this, there’s no need to create columns in NDFs or EDFs for those attributes (where you would repeat attribute values through all rows of those columns). Default graph attributes can be provided for the graph with the graph_attrs, node_attrs, and edge_attrs arguments. To supply these attributes, use vectors of graph, node, or edge attributes.

If you want the graph to be a directed graph, then the value for the directed argument should be set as TRUE (which is the default value). Choose FALSE for an undirected graph.

This next example will include both nodes and edges contained within a graph object. In this case, values for the type and rel attributes for nodes and edges, respectively, were provided. Adding values for those attributes is optional but will be important for any data modeling work.

###
# Create a graph with both nodes and edges
# defined, and, add some default attributes
# for nodes and edges
###

# Create a node data frame
ndf <-
  create_node_df(
    n = 4,
    label = c("a", "b", "c", "d"),
    type  = "lower",
    style = "filled",
    color = "aqua",
    shape = c("circle", "circle",
              "rectangle", "rectangle"),
    data = c(3.5, 2.6, 9.4, 2.7)
  )

edf <-
  create_edge_df(
    from = c(1, 2, 3),
    to   = c(4, 3, 1),
    rel  = "leading_to"
  )

graph <-
  create_graph(
    nodes_df = ndf,
    edges_df = edf
  ) %>%
  set_node_attrs(
    node_attr = "fontname",
    values = "Helvetica"
  ) %>%
  set_edge_attrs(
    edge_attr = "color",
    values = "blue"
  ) %>%
  set_edge_attrs(
    edge_attr = "arrowsize",
    values = 2
  )

# Examine the NDF within the graph object
get_node_df(graph)
#>   id  type label  style color     shape data  fontname
#> 1  1 lower     a filled  aqua    circle  3.5 Helvetica
#> 2  2 lower     b filled  aqua    circle  2.6 Helvetica
#> 3  3 lower     c filled  aqua rectangle  9.4 Helvetica
#> 4  4 lower     d filled  aqua rectangle  2.7 Helvetica

# Have a look at the graph's EDF
get_edge_df(graph)
#>   id from to        rel color arrowsize
#> 1  1    1  4 leading_to  blue         2
#> 2  2    2  3 leading_to  blue         2
#> 3  3    3  1 leading_to  blue         2

Viewing a Graph Object

With the render_graph() function, it’s possible to view the graph object, or, output the DOT code for the current state of the graph.

Let’s have a look at the graph created in the last example:

graph %>% render_graph()

If you’d like to return the Graphviz DOT code (to, perhaps, share it or use it directly with the Graphviz command-line utility), we can use the generate_dot() function. Here’s a simple example:

# Take the graph object and generate a character
# vector with Graphviz DOT code (using cat() for
# a better appearance)
graph %>%
  generate_dot() %>%
  cat()
#> digraph {
#> 
#> graph [layout = 'neato',
#>        outputorder = 'edgesfirst',
#>        bgcolor = 'white']
#> 
#> node [fontname = 'Helvetica',
#>       fontsize = '10',
#>       shape = 'circle',
#>       fixedsize = 'true',
#>       width = '0.5',
#>       style = 'filled',
#>       fillcolor = 'aliceblue',
#>       color = 'gray70',
#>       fontcolor = 'gray50']
#> 
#> edge [fontname = 'Helvetica',
#>      fontsize = '8',
#>      len = '1.5',
#>      color = 'gray80',
#>      arrowsize = '0.5']
#> 
#>   '1' [label = 'a', style = 'filled', color = 'aqua', shape = 'circle', fontname = 'Helvetica'] 
#>   '2' [label = 'b', style = 'filled', color = 'aqua', shape = 'circle', fontname = 'Helvetica'] 
#>   '3' [label = 'c', style = 'filled', color = 'aqua', shape = 'rectangle', fontname = 'Helvetica'] 
#>   '4' [label = 'd', style = 'filled', color = 'aqua', shape = 'rectangle', fontname = 'Helvetica'] 
#> '1'->'4' [color = 'blue', arrowsize = '2'] 
#> '2'->'3' [color = 'blue', arrowsize = '2'] 
#> '3'->'1' [color = 'blue', arrowsize = '2'] 
#> }