Graph Creation

Creating a graph object is undoubtedly important. I dare say it is one of the fundamental aspects of the DiagrammeR world. With the graph object produced, so many other things are possible. For instance, you can inspect certain aspects of the graph, modify the graph in many ways that suit your workflow, view the graph (or part of the graph!) in the RStudio Viewer, or perform graph traversals and thus create complex graph queries using magrittr or pipeR pipelines. The possibilities are really very exciting and it all begins with creating those graph objects.

Creating a Graph Object

The create_graph() function creates a graph object. The function also allows for initialization of the graph name, the graph time (a date and/or time with an optional time zone), and any default attributes for the graph (i.e., graph, node, or edge attributes).

The components of the created graph object are:

  • graph_name — optional character vector with a name for the graph
  • graph_time — optional character vector that's date and/or time
  • graph_tz — optional character vector with the time zone for graph_time
  • nodes_df — optional data frame with the graph's nodes (or vertices) and attributes for each
  • edges_df — optional data frame with edges between nodes/vertices and attributes for each
  • graph_attrs — optional character vector of attributes pertaining to the entire graph
  • node_attrs — optional character vector of attributes pertaining to the nodes of the graph
  • edge_attrs — optional character vector of attributes pertaining to the edges of the graph
  • directed — a required logical value stating whether the graph should be considered a directed graph (TRUE, the default) or an undirected graph (FALSE)
  • dot_code — an optional character vector containing the automatically generated Graphviz DOT code for the graph

These components of the dgr_graph object are always present, and always in the specified order. However, the optional components may have NULL values if they are not set (e.g., an edgeless graph will have edges_df returning NULL). To access any of these components directly for a graph named graph, use the construction graph$[component] (so, enter graph$nodes_df into the R console to examine the graph's node data frame, or NDF). In forthcoming examples, this type of inspection will be used to reveal the contents of created graph objects. There are also convenience functions (covered later) that directly return certain graph components without need for the $ operator.

For the nodes_df and edges_df arguments, one can supply a node data frame (NDF) and an edge data frame (EDF), respectively. The dgr_graph object can be initialized without any nodes or edges (by not supplying an NDF or an EDF in the function call), and this is a favorable option when supplying nodes and edges using other functions that modify an existing graph. Here is an example whereby an empty graph (initialized as a directed graph) is created. Note that the nodes_df and edges_df components are NULL, signifying an empty graph.

###
# Create an empty graph
###

library(DiagrammeR)

# Create the graph object
graph <- create_graph()

# Get the class of the object
class(graph)
#> [1] "dgr_graph"

# It's an empty graph, so no NDF
# or EDF
get_node_df(graph)
#> NULL

get_edge_df(graph)
#> NULL

# By default, the graph is
# considered as directed
is_graph_directed(graph)
#> [1] TRUE

It's possible to include an NDF and not an EDF when calling create_graph(). What you would get is an edgeless graph (a graph with nodes but no edges between those nodes). This may seem somewhat silly, but edges can always be defined later with functions such as add_edge(), add_edge_df(), and add_edges_from_table(), which are covered in a subsequent section.

###
# Create a graph with nodes but no edges
###

library(DiagrammeR)

# Create an NDF
nodes <-
  create_nodes(
    nodes = 1:4,
    label = FALSE,
    type = "lower",
    style = "filled",
    color = "aqua",
    shape = c("circle", "circle",
              "rectangle", "rectangle"),
    data = c(3.5, 2.6, 9.4, 2.7))

# Examine the NDF
nodes
#>   nodes  type label  style color     shape data
#> 1     1 lower       filled  aqua    circle  3.5
#> 2     2 lower       filled  aqua    circle  2.6
#> 3     3 lower       filled  aqua rectangle  9.4
#> 4     4 lower       filled  aqua rectangle  2.7

# Create the graph and include the
# `nodes` NDF
graph <- create_graph(nodes_df = nodes)

# Examine the NDF within the graph object
get_node_df(graph)
#>   nodes  type label  style color     shape data
#> 1     1 lower       filled  aqua    circle  3.5
#> 2     2 lower       filled  aqua    circle  2.6
#> 3     3 lower       filled  aqua rectangle  9.4
#> 4     4 lower       filled  aqua rectangle  2.7

# It's the same NDF (outside and inside the graph)
all(nodes == graph$nodes_df)
#> [1] TRUE

Alternatively, an EDF can be supplied without need to supply an NDF (in which case the node ID values will be inferred but no node attributes will be available).

Quite often, there will be cases where node or edge attributes should be applied to all nodes or edges in the graph. To achieve this, there's no need to create columns in NDFs or EDFs for those attributes (where you would repeat attribute values through all rows of those columns). Default graph attributes can be provided for the graph with the graph_attrs, node_attrs, and edge_attrs arguments. To supply these attributes, use vectors of graph, node, or edge attributes.
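Each element of these vectors is a single "name = value" string. Judging from the DOT output shown later in this section (e.g., node [fontname = Helvetica]), the strings end up inside a bracketed DOT attribute statement. A rough base R sketch of that idea follows. The helper dot_attr_stmt() is hypothetical, and joining several attributes with commas is an assumption about the generated code rather than something confirmed here.

```r
# Assemble a DOT attribute statement from a vector of
# "name = value" strings (`dot_attr_stmt()` is a
# hypothetical helper, not a DiagrammeR function)
dot_attr_stmt <- function(type, attrs) {
  paste0(type, " [", paste(attrs, collapse = ", "), "]")
}

# A single default node attribute
dot_attr_stmt("node", "fontname = Helvetica")
#> [1] "node [fontname = Helvetica]"

# Several default edge attributes (comma-joined by assumption)
dot_attr_stmt("edge", c("color = blue", "arrowsize = 2"))
#> [1] "edge [color = blue, arrowsize = 2]"
```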

If you want the graph to be a directed graph, then the value for the directed argument should be set as TRUE (which is the default value). Choose FALSE for an undirected graph.

This next example will include both nodes and edges contained within a graph object. In this case, values for the type and rel attributes for nodes and edges, respectively, were provided. Adding values for those attributes is optional but will be important for any data modelling work.

###
# Create a graph with both nodes and edges
# defined, and, add some default attributes
# for nodes and edges
###

library(DiagrammeR)

# Create a node data frame
nodes <-
  create_nodes(
    nodes = c("a", "b", "c", "d"),
    label = FALSE,
    type = "lower",
    style = "filled",
    color = "aqua",
    shape = c("circle", "circle",
              "rectangle", "rectangle"),
    data = c(3.5, 2.6, 9.4, 2.7))

# Create an edge data frame
edges <-
  create_edges(
    from = c("a", "b", "c"),
    to = c("d", "c", "a"),
    rel = "leading_to")

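# Create the graph object, incorporating
# the NDF and the EDF along with default
# node and edge attributes
graph <-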
  create_graph(
    nodes_df = nodes,
    edges_df = edges,
    node_attrs = "fontname = Helvetica",
    edge_attrs = c("color = blue",
                   "arrowsize = 2"))

# Examine the NDF within the
# graph object
get_node_df(graph)
#>   nodes  type label  style color     shape data
#> 1     a lower       filled  aqua    circle  3.5
#> 2     b lower       filled  aqua    circle  2.6
#> 3     c lower       filled  aqua rectangle  9.4
#> 4     d lower       filled  aqua rectangle  2.7

# Examine the EDF within the
# graph object
get_edge_df(graph)
#>   from to        rel
#> 1    a  d leading_to
#> 2    b  c leading_to
#> 3    c  a leading_to

Viewing a Graph Object

With the render_graph() function, it's possible to view the graph object in the RStudio Viewer or to output the DOT code for the current state of the graph.

Calling render_graph() with just the graph object displays the graph. Here's a simple example:

###
# Create a simple graph
# and display it
###

library(DiagrammeR)

# Create a simple NDF
nodes <-
  create_nodes(
    nodes = 1:4,
    type = "number")

# Create a simple EDF
edges <-
  create_edges(
    from = c(1, 1, 3, 1),
    to = c(2, 3, 4, 4),
    rel = "related")

# Create the graph object,
# incorporating the NDF and
# the EDF, and, providing
# some global attributes
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges,
    graph_attrs = "layout = neato",
    node_attrs = "fontname = Helvetica",
    edge_attrs = "color = gray20")

# View the graph
render_graph(graph)

With packages such as magrittr or pipeR, one can conveniently pipe output from create_graph() to render_graph(). The magrittr package provides a forward pipe with the %>% operator. With pipeR, use %>>% instead.

###
# Use magrittr's %>% to create a graph and
# then view it without storing that graph object
###

library(DiagrammeR)
library(magrittr)

# Create a simple NDF
nodes <-
  create_nodes(
    nodes = 1:4,
    type = "number")

# Create a simple EDF
edges <-
  create_edges(
    from = c(1, 1, 3, 1),
    to = c(2, 3, 4, 4),
    rel = "related")

# Use the %>% operator between
# `create_graph()` and `render_graph()`
# so that the graph is rendered without
# being stored in a variable
create_graph(
  nodes_df = nodes,
  edges_df = edges,
  graph_attrs = "layout = neato",
  node_attrs = "fontname = Helvetica",
  edge_attrs = "color = gray20") %>%
  render_graph()

If you'd like to return the Graphviz DOT code (to, perhaps, share it or use it directly with the Graphviz command-line utility), just use output = "DOT" in the render_graph() function. Here's a simple example:

###
# Use magrittr's %>% to create a graph and
# then output the DOT code for the graph
###

library(DiagrammeR)
library(magrittr)

# Create a simple NDF
nodes <-
  create_nodes(
    nodes = 1:4,
    type = "number")

# Create a simple EDF
edges <-
  create_edges(
    from = c(1, 1, 3, 1),
    to = c(2, 3, 4, 4),
    rel = "related")

# Create the graph object,
# incorporating the NDF and
# the EDF, and, providing
# some global attributes
graph <-
  create_graph(
    nodes_df = nodes,
    edges_df = edges,
    graph_attrs = "layout = neato",
    node_attrs = "fontname = Helvetica",
    edge_attrs = "color = gray20")

# Use the %>% operator between
# `create_graph()` and `render_graph()`
# (using the output = "DOT" option)
create_graph(
  nodes_df = nodes,
  edges_df = edges,
  graph_attrs = "layout = neato",
  node_attrs = "fontname = Helvetica",
  edge_attrs = "color = gray20") %>%
  render_graph(output = "DOT")
#> [1] "digraph {\n\ngraph [layout = neato]\n\nnode [fontname = Helvetica]\n\nedge [color = gray20]\n\n  '1' [label = '1'] \n  '2' [label = '2'] \n  '3' [label = '3'] \n  '4' [label = '4'] \n  '1'->'2' \n  '1'->'3' \n  '3'->'4' \n  '1'->'4' \n}"

# Use the R `cat()` function to
# direct the DOT code to a `.gv` file
# (DiagrammeR can open this file
# directly for viewing and editing)
create_graph(
  nodes_df = nodes,
  edges_df = edges,
  graph_attrs = "layout = neato",
  node_attrs = "fontname = Helvetica",
  edge_attrs = "color = gray20") %>%
  render_graph(output = "DOT") %>%
  cat(file = "~/dot.gv")

Creating a 'Random' Graph

Creating a random graph is actually quite useful. Seeing these graphs with specified numbers of nodes and edges will allow you to quickly get a sense of how connected graphs can be at different sizes.

The create_random_graph() function is provided with several options for creating random graphs. The best way to understand the use of the function is through several examples. In all these examples, the function will be wrapped in render_graph() (with output = "visNetwork") to quickly inspect the graph upon creation. (Alternatively, the magrittr package's %>% operator can pipe output from create_random_graph() directly to render_graph().)

We can create a not-so-random graph with 2 nodes and 1 edge (by default, the graphs produced are undirected graphs). The argument n is the number of nodes, and m is the number of edges.

###
# Create a very simple random graph
###

library(DiagrammeR)

# Create a simple, random graph
# and render with the `visNetwork`
# output option
render_graph(
  create_random_graph(n = 2, m = 1),
  output = "visNetwork")

It's better with more nodes and edges though. Try this again with 15 nodes and 30 edges:

###
# Create a random graph with 15 nodes, 30 edges
###

library(DiagrammeR)

# Create a random graph with
# 15 nodes and twice as many edges
# and then render the graph with
# the `visNetwork` output option
render_graph(
  create_random_graph(n = 15, m = 30),
  output = "visNetwork")

Notice that a maximum of one edge is created between a pair of nodes (i.e., no multiple edges are created). What if you specify a number of edges (m) that exceeds the number in a fully-connected graph of size n? You get an error. It's an informative error (providing the maximum number of edges m for the given n), but it's an error nonetheless.

###
# Create a random, fully-connected graph of 15 nodes
###

library(DiagrammeR)

# Attempt to generate a random
# graph with 15 nodes and 200 edges
# (more than the number of edges in
# a fully-connected graph with
# single edges between nodes)
render_graph(
  create_random_graph(n = 15, m = 200),
  output = "visNetwork")

# --------------------------------------------------------
# Error in create_random_graph(n = 15, m = 200) :
#   The number of edges exceeds the maximum possible (105)
# --------------------------------------------------------

# Use `n = 15` and `m = 105` to
# yield a fully-connected graph
# with 15 nodes
render_graph(
  create_random_graph(n = 15, m = 105),
  output = "visNetwork")
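The maximum of 105 in the error message above matches the number of distinct node pairs in an undirected graph with no loops or multiple edges, which is n(n - 1)/2. A quick check in base R (the helper max_edges() is hypothetical and not part of DiagrammeR):

```r
# Maximum number of edges in a simple undirected graph
# with `n` nodes: one edge per unordered pair of nodes
# (`max_edges()` is a hypothetical helper, not a
# DiagrammeR function)
max_edges <- function(n) {
  n * (n - 1) / 2
}

max_edges(15)
#> [1] 105

# The same count via the binomial coefficient
choose(15, 2)
#> [1] 105
```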

Going the opposite way, you don't need to have edges. Simply specify m = 0 for any number of nodes n:

###
# Create a random graph with
# many nodes but with no edges
###

library(DiagrammeR)

# Create a random graph with
# 512 nodes but no edges
render_graph(
  create_random_graph(n = 512, m = 0),
  output = "visNetwork")

Setting a seed is a great way to create something random yet reproduce that random something (there are many reasons to do this; creating examples is one use). This can be done with the create_random_graph() function by specifying a seed number with the argument set_seed. Here's an example:

###
# Create a reproducible, random graph
###

library(DiagrammeR)

# Create a random graph with
# a seed set so that the same graph
# will be generated every time
render_graph(
  create_random_graph(n = 5, m = 4,
                      set_seed = 30),
  output = "visNetwork")

Upon repeat runs, the connections in the graph will be the same each and every time (3 is a free node, 1 is connected to 2 and 5, etc.).
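The idea behind seeding can be seen with base R alone. This sketch assumes that set_seed plays the same role as base R's set.seed(). How create_random_graph() actually uses the seed internally is not shown here, and the helper random_edges() is hypothetical, so the edges it picks will not match those of the graph above.

```r
# Draw `m` random node pairs from `n` nodes after fixing
# the seed (`random_edges()` is a hypothetical helper,
# not a DiagrammeR function)
random_edges <- function(n, m, seed) {
  set.seed(seed)
  # All unordered node pairs, one per column
  pairs <- combn(n, 2)
  # Pick `m` distinct pairs at random
  chosen <- pairs[, sample(ncol(pairs), m), drop = FALSE]
  data.frame(from = chosen[1, ], to = chosen[2, ])
}

first_run <- random_edges(n = 5, m = 4, seed = 30)
second_run <- random_edges(n = 5, m = 4, seed = 30)

# The same seed gives the same edges on every run
identical(first_run, second_run)
#> [1] TRUE
```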

By default, the random graphs generated are undirected graphs. To produce directed graphs, simply include directed = TRUE in the create_random_graph() statement.

###
# Create a random, directed graph
###

library(DiagrammeR)

# Create a random graph but with
# directed edges by setting
# `directed = TRUE`
render_graph(
  create_random_graph(
    n = 15, m = 22, directed = TRUE),
  output = "visNetwork")

Combining Graphs

With the combine_graphs() function, one can combine two graphs in order to make a new graph, merging nodes and edges in the process. The use of an optional edge data frame (EDF) allows for new edges to be formed across the combined graphs.

While you would provide two graphs (for arguments x and y), the order here is important. The graph provided as x is considered the graph object to which another graph will be joined. This graph should be considered the host graph, as the resulting graph will retain only the attributes of this graph. The graph provided as y is thus the graph object that is to be joined with the graph supplied as x.

###
# Create two graphs and combine them into one
###

library(DiagrammeR)

# Create the first graph
nodes_1 <-
  create_nodes(nodes = 1:10)

edges_1 <-
  create_edges(
    from = 1:9,
    to = 2:10)

graph_1 <-
  create_graph(
    nodes_df = nodes_1,
    edges_df = edges_1,
    graph_attrs = "rankdir = LR")

# Create the second graph (note that node ID values
# are different from those of the first graph)
nodes_2 <-
  create_nodes(nodes = 11:20)

edges_2 <-
  create_edges(
    from = 11:19,
    to = 12:20)

graph_2 <-
  create_graph(
    nodes_df = nodes_2,
    edges_df = edges_2,
    graph_attrs = "rankdir = TB")

# Combine the two graphs, the
# global graph attribute
# `graph_attrs = "rankdir = LR"`
# will be retained since it is
# part of the graph supplied as `x`
combined_graph <-
  combine_graphs(x = graph_1, y = graph_2)

# Display the combined graph
render_graph(combined_graph)

Joining two graphs by simply supplying them as x and y will not by itself create connections between the two graphs. To conveniently create connections between the joined graphs, one can supply an EDF to the edges_df argument. Otherwise, connections can always be made in subsequent function calls using add_edge(), add_edge_df(), or add_edges_from_table().

###
# Create two graphs and combine them
# with new edges created
###

library(DiagrammeR)

# Create the first graph
nodes_1 <-
  create_nodes(nodes = 1:10)

edges_1 <-
  create_edges(
    from = 1:9,
    to = 2:10)

graph_1 <-
  create_graph(
    nodes_df = nodes_1,
    edges_df = edges_1,
    graph_attrs = "rankdir = LR")

# Create the second graph
nodes_2 <-
  create_nodes(nodes = 11:20)

edges_2 <-
  create_edges(
    from = 11:19,
    to = 12:20)

graph_2 <-
  create_graph(
    nodes_df = nodes_2,
    edges_df = edges_2)

# Create an auxiliary EDF for
# creating edges across the two
# graphs supplied as `x` and `y`
# to `combine_graphs()`
extra_edges <-
  create_edges(
    from = c(5, 19, 1),
    to = c(12, 3, 11))

# Combine the two graphs, adding
# the `extra_edges` EDF to the
# `edges_df` argument
combined_graph <-
  combine_graphs(
    x = graph_1,
    y = graph_2,
    edges_df = extra_edges)

# Display the combined graph
render_graph(combined_graph)

There are likely quite a few uses for the combine_graphs() function, such as combining subgraphs (with the create_subgraph_from_selection() function), combining graphs made from data collected at different times, etc.

Importing a Graph from a File

The import_graph() function is all about loading in graphs from files. There are numerous graph file formats and while DiagrammeR does not support too many of them at this point, those formats that are supported are amongst the most well used. (Support for other file formats is forthcoming.) Thus far, import_graph() will import GraphML (.graphml), GML (.gml), and SIF (.sif) graph files. As with the create_graph() function, import_graph() allows for provision of a name and a date-time for the newly imported graph.

For the graph_file argument, a path to a graph file (one that can be opened) is expected. For file_type, you can explicitly specify the type of file to be imported. The options are: graphml (GraphML), gml (GML), and sif (SIF). If file_type is not supplied, the function will infer the type from the extension of the file given in graph_file. The graph_name, graph_time, and graph_tz arguments can be optionally supplied. These behave in exactly the same way as in the create_graph() function.
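Inferring the file type from the extension could look roughly like the following. This is only a sketch of the idea and not the actual code of import_graph(). The helper infer_file_type() is hypothetical and is built on file_ext() from the tools package that ships with R.

```r
# Guess the graph file type from the file extension
# (`infer_file_type()` is a hypothetical helper, not
# a DiagrammeR function)
infer_file_type <- function(graph_file) {
  ext <- tolower(tools::file_ext(graph_file))
  supported <- c("graphml", "gml", "sif")
  if (!(ext %in% supported)) {
    stop("Unsupported graph file extension: ", ext)
  }
  ext
}

infer_file_type("power_grid.graphml")
#> [1] "graphml"

infer_file_type("karate.gml")
#> [1] "gml"
```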

For testing purposes, example files of each type are included in the DiagrammeR package. The GraphML file power_grid.graphml contains an undirected graph of a power station network across several northwestern U.S. states. It contains 4,941 nodes and 6,594 edges. Do you like karate? If so, you'll be happy to know that the GML example (karate.gml) represents friendships amongst members of a karate club in the early 1970s. It is a relatively small graph with 34 nodes and 78 edges. The human interactome SIF file Human_Interactome.sif is a fairly large graph with 8,347 nodes and 61,263 edges.

###
# Import graphs from various types of graph file formats
###

library(DiagrammeR)

# Open a graph from a GraphML file
graphml_graph <-
  import_graph(
    graph_file =
      system.file("examples/power_grid.graphml",
                  package = "DiagrammeR"),
    graph_name = "power_grid")

# Open a graph from a GML file
gml_graph <-
  import_graph(
    graph_file =
      system.file("examples/karate.gml",
                  package = "DiagrammeR"),
    graph_name = "karate")

# Open a graph from a SIF file
sif_graph <-
  import_graph(
    graph_file =
      system.file("examples/Human_Interactome.sif",
                  package = "DiagrammeR"),
    graph_name = "Human_Interactome")

Once you've imported a graph, you're free to use it just like any graph you might make yourself with create_graph().