Graphviz
Graphviz support is an integral part of the DiagrammeR package. Graphviz consists of a graph description language called the DOT language and it also comprises various tools that can process the DOT language. DOT is highly customizable and it allows you to control line colors, arrow shapes, node shapes, and many other layout features.
DiagrammeR Implementation
For Graphviz graphs, DiagrammeR
uses the processing function called grViz()
. What you pass
into grViz()
is a valid graph specification in the
DOT language. The DOT graph
description can either be delivered to grViz()
in the form
of a string, a reference to a Graphviz file (with a
.gv
file extension), or as a text connection.
All of the code examples provided in later sections call the
grViz()
function in an R script and pass in a graph
description as a string. It is important to consider that strings in R
cannot contain any unescaped double-quote characters. However, the
grViz()
function allows for single-quote characters in
their place. As a further convenience, when the DOT
graph description is supplied as a file (e.g.,
dot-graph.gv
) or as a text connection, either format for
quotes will be accepted.
In very recent builds of RStudio, the use of an
external text file with the .gv
file extension can provide
the advantage of syntax coloring and previewing in the
RStudio Viewer pane after saving (if
'Preview on Save'
is selected), or, by pressing the
'Preview'
button on the Source pane.
Defining a Graphviz Graph
The Graphviz graph specification must begin with a directive stating whether a directed graph (digraph) or an undirected graph (graph) is desired. Semantically, this indicates whether or not there is a natural direction from one of the edge’s nodes to the other. An optional graph ID follows this and paired curly braces denotes the body of the statement list (stmt_list).
Optionally, a graph may also be described as strict. This forbids the creation of multi-edges (i.e., there can be at most one edge with a given tail node and head node in the directed case). For undirected graphs, there can be at most one edge connected to the same two nodes. Subsequent edge statements using the same two nodes will identify the edge with the previously defined one and apply any attributes given in the edge statement.
Here is the basic structure:
[strict] (graph | digraph) [ID] '{' stmt_list '}'
Statements
The graph statement (graph_stmt
), the node statement
(node_stmt
), and the edge statement
(edge_stmt
) are the three most commonly used statements in
the Graphviz DOT language. Graph
statements allow for attributes to be set for all components of the
graph. Node statements define and provide attributes for graph nodes.
Edge statements specify the edge operations between nodes and they
supply attributes to the edges. For the edge operations, a directed
graph must specify an edge using the edge operator ->
while an undirected graph must use the --
operator.
Within these statements follow statement lists. Thus for a node
statement, a list of nodes is expected. For an edge statement, a list of
edge operations. Any of the list items can optionally have an attribute
list (attr_list
) which modify the attributes of either the
node or edge.
Comments may be placed within the statement list. These can be marked
using a //
or a /* */
structure. Comment lines
are denoted by a #
character. Multiple statements within a
statement list can be separated by linebreaks or ;
characters between multiple statements
Here is an example where nodes (in this case styled as boxes and circles) can be easily defined along with their connections:
grViz("
digraph boxes_and_circles {
# a 'graph' statement
graph [overlap = true, fontsize = 10]
# several 'node' statements
node [shape = box,
fontname = Helvetica]
A; B; C; D; E; F
node [shape = circle,
fixedsize = true,
width = 0.9] // sets as circles
1; 2; 3; 4; 5; 6; 7; 8
# several 'edge' statements
A->1 B->2 B->3 B->4 C->A
1->D E->A 2->4 1->5 1->F
E->6 4->6 5->7 6->7 3->8
}
")
Subgraphs and Clusters
Subgraphs play three roles in Graphviz. First, a subgraph can be used to represent graph structure, indicating that certain nodes and edges should be grouped together. This is the usual role for subgraphs and typically specifies semantic information about the graph components. It can also provide a convenient shorthand for edges. An edge statement allows a subgraph on both the left and right sides of the edge operator. When this occurs, an edge is created from every node on the left to every node on the right. For example, the specification
A -> {B C}
is equivalent to
A -> B
A -> C
In the second role, a subgraph can provide a context for setting attributes. For example, a subgraph could specify that blue is the default color for all nodes defined in it. In the context of graph drawing, a more interesting example is
subgraph {
rank = same; A; B; C;
}
This anonymous subgraph specifies that the nodes A
,
B
, and C
should all be placed on the same
rank.
The third role for subgraphs directly involves how the graph will be
laid out by certain layout types. If the name of the subgraph begins
with cluster
, Graphviz notes the subgraph
as a special cluster subgraph. If supported, the layout will make it
such that the nodes belonging to the cluster are drawn together, with
the entire drawing of the cluster contained within a bounding
rectangle.
Graphviz Attributes
Graphviz attributes allow you to style your Graphviz graph. Combinations of attributes for nodes, edges, clusters, and for the entire graph provide for highly-customized layouts.
Node Attributes
All Graphviz attributes are specified by name-value pairs. Thus, to set the fillcolor of a node abc, one would use
abc [fillcolor = red]
There are lots of node attributes. The following provides a visual guide to the more notable ones.
Edge Attributes
Edge attributes are set the same way as node attributes. For example,
to set the arrowhead style of an edge abc -> def
, one
would use
abc -> def [arrowhead = diamond]
Quotation marks are important only for multiword attributes, such
might be used in the label
attribute.
Refer to the following for some of the more important edge attributes
Colors
By default, Graphviz can use colors provided as
hexadecimal values, or, as X11 color names. The
following provides the entire list of X11 color names.
Some colors have additional 4-color palettes based on the named color.
Those additional colors can be used by appending the digits
1
-4
to the color name. Gray (or grey) has
variations from 0
-100
. Please note that, in
all color names, gray
is interchangeable with
grey
.
Graphviz Substitution
Graphviz substitution allows for mixing in R expressions into a Graphviz DOT graph specification without sacrificing readability.
How It Works
To take advantage of substitution when rendering a graph, simply use
the grViz()
function with Graphviz
DOT code:
grViz("
[...graph spec with substitutions using @@ notation and footnote R expressions...]
")
The notation @@
within the GraphViz
DOT graph specification will indicate where the
substitution is to take place. Corresponding R expressions (below the
formal graph specification, styled as footnotes) will provide values to
be substituted. The grViz()
function will autodetect
whether the working Graphviz DOT graph
specification contains @@
s between the curly braces of the
[graph|digraph] {...}
DOT
construction.
If you specify a substitution with @@
, you must ensure
there is a valid expression for that substitution. The expressions are
placed as footnotes and their evaluations must result in an R vector
object (i.e., not a data frame, list, or matrix). Because there is the
possibility to have multiple substitutions, numbering is required. Thus,
the @@
notation is immediately followed by a number and
that number should correspond to the number of the footnoted R
expression.
The substitution construction is:
"@@" + [footnote number]
Importantly, the footnote expressions should reside below the closing curly brace of the graph or digraph expression. It should always take the form of:
"[" + [footnote number] + "]:"
In the simple example of specifying a single node, the following substitution syntax would be used:
digraph {
@@1
}
[1]: 'a'
In the above example, the [1]:
footnote expression
evaluates as 'a'
, and, that is what’s substituted in place
of the @@1
(the resultant a
, in turn, will be
taken as the node ID).
Substitutions can also be used to insert values from vector indices into the graph specification. Simply use this format:
"@@" + [footnote number] + "-" + [index number]
Here’s an example of how specific values from vectors can be inserted into the graph:
digraph alpha {
@@1-1; @@1-2; @@1-3; @@1-4; @@1-5
@@1-6; @@1-7; @@1-8; @@1-9; @@1-10
}
[1]: LETTERS
After evaluation of the footnote expressions and substitution, the graph specification becomes this:
digraph alpha {
A; B; C; D; E
F; G; H; I; J
}
Examples
The best way to demonstrate how substitution works is through a set
of examples. Here is an example of substituting alphabet letters from
R’s LETTERS
constant into a Graphviz graph
specification.
You can mix substitutions of single-values objects and those specifying indices of R vector objects. As an example:
grViz("
digraph a_nice_graph {
# node definitions with substituted label text
node [fontname = Helvetica]
a [label = '@@1']
b [label = '@@2-1']
c [label = '@@2-2']
d [label = '@@2-3']
e [label = '@@2-4']
f [label = '@@2-5']
g [label = '@@2-6']
h [label = '@@2-7']
i [label = '@@2-8']
j [label = '@@2-9']
# edge definitions with the node IDs
a -> {b c d e f g h i j}
}
[1]: 'top'
[2]: 10:20
")
As can be seen from the output:
- the node with ID
a
is given the label “top” (after substituting@@1
with expression after the[1]:
footnote R expression) - the nodes with ID values from
b
toj
are respectively provided values from indices1
to9
(using the hyphenated form of@@
) of the evaluated expression10:20
(in the[2]:
footnote expression)
Footnote expressions are meant to be flexible. They can span multiple lines, and they can also take in objects that are available in the global workspace. So, as long as a vector object results from evaluation, substitution can be performed.