Dot plots with lattice

Lukas Soenning


This is a short tutorial on the construction of dot plots with the lattice package (Sarkar 2014) in R. It is an online appendix to the paper "The dot plot: a fine tool for data visualization", presented at AVML 2014 in Tübingen. Design elements such as queuing and offsetting are not built into the package, so the code for these options is a bit more cumbersome.


Contents

  1. Simple dot plots
    1.1. dotplot()
    1.2. Order
    1.3. Queuing
    1.4. Adding error bars

  2. Comparing groups
    2.1. Juxtaposition
    2.2. Adding error bars to juxtaposed panels
    2.3. Superposition
    2.4. Superposition with queuing
    2.5. Combining superposition and error bars: Offsetting

  3. Multipanel conditioning
    3.1. Juxtaposition only
    3.2. Combination of juxtaposition and superposition
    3.3. Adding error bars: A combination of juxtaposition, superposition and offsetting


Preparations

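Install the lattice package. Since lattice is a recommended package that ships with standard R installations, this step is usually only needed to update it.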

install.packages("lattice")

Load it into R.

library(lattice)

The following code changes the settings to produce black and white output (and a transparent background for the strips of the panels).

bw.theme <- canonical.theme(color = FALSE)
bw.theme$strip.background$col <- "transparent"
lattice.options(default.theme = bw.theme)
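Setting lattice.options(default.theme = ...) changes the default for all later plots. If you would rather leave the global settings untouched, the same theme can be attached to a single plot through the par.settings= argument of the high-level functions. A minimal sketch; the toy data frame is made up purely for illustration:

```r
library(lattice)

# Black-and-white theme with transparent strip backgrounds, as above
bw.theme <- canonical.theme(color = FALSE)
bw.theme$strip.background$col <- "transparent"

# Hypothetical toy data, only there to have something to draw
toy <- data.frame(Label = c("a", "b", "c"), Value = c(3, 1, 2))

# The theme applies to this plot only; global lattice options are unchanged
p <- dotplot(Label ~ Value, data = toy, par.settings = bw.theme)
print(p)
```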


1. Simple dot plots



Throughout this tutorial, we will be using a hypothetical, self-created data frame for illustration. We will extend this data frame in later sections. We will start with data.1, which contains the means for 5 levels of a categorical variable.

data.1 <- data.frame(Category = c("Category M", "Category C", "Category B", 
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27))
data.1
##     Category Mean
## 1 Category M   92
## 2 Category C   73
## 3 Category B   67
## 4 Category G   46
## 5 Category L   27

1.1. dotplot()

Simple dot plots can be produced with the function dotplot(). In the formula (the first argument of lattice plotting functions), the labels of the points always precede the tilde (~), and the numeric values associated with the labels (here: Mean) follow it. The second argument, data=, names the data frame that contains the labels and values. The basic structure is: dotplot(labels ~ values, data=…)

dotplot(Category ~ Mean, data = data.1)


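dotplot() itself does not draw anything: it returns an object of class "trellis", which R draws when the object is printed. At the top level of an interactive session this happens automatically, but inside a function, a loop, or code run with source(), you have to call print() yourself. A short sketch (draw_dotplot() is a hypothetical helper, not part of lattice):

```r
library(lattice)

data.1 <- data.frame(Category = c("Category M", "Category C", "Category B",
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27))

# Hypothetical helper: builds the plot and draws it explicitly
draw_dotplot <- function(d) {
    p <- dotplot(Category ~ Mean, data = d)  # creates the object, draws nothing
    print(p)                                 # this is what actually draws it
    invisible(p)                             # return the object for later reuse
}

p <- draw_dotplot(data.1)
class(p)  # "trellis"
```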

1.2. Order

Order is important in dot plots. In this first version of the plot, the categories are listed in alphabetical order from bottom to top (R's default ordering of factor levels). The function reorder() is an important one for dot plots: we can use it to change the order of the dots. It has two main arguments: the first is the vector to be reordered, the second contains the values used for the ordering (it must have the same length). In the following call, the categories are reordered according to their Mean before plotting. By default, reorder() summarizes the values for each label with the arithmetic mean; to use another function, e.g. the median, supply it as the third argument of reorder() (as in reorder(Category, Mean, median)).

dotplot(reorder(Category, Mean) ~ Mean, data = data.1)


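To see what reorder() actually does, you can inspect the factor levels it returns. Lattice draws the first level at the bottom of the y-axis, so the level order below corresponds to the bottom-to-top order in the plot. The second and third calls show two common variations: negating the values to reverse the order, and passing a different summary function (with one value per category, median and mean give the same result here).

```r
data.1 <- data.frame(Category = c("Category M", "Category C", "Category B",
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27))

# Levels sorted by increasing Mean (first level = bottom of the dot plot)
ord <- reorder(data.1$Category, data.1$Mean)
levels(ord)
# "Category L" "Category G" "Category B" "Category C" "Category M"

# Negated values reverse the order: largest mean at the bottom
rev.ord <- reorder(data.1$Category, -data.1$Mean)
levels(rev.ord)
# "Category M" "Category C" "Category B" "Category G" "Category L"

# A different summary function as the third argument
med.ord <- reorder(data.1$Category, data.1$Mean, median)
levels(med.ord)
```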

1.3. Queuing

To illustrate simple queuing, we will add a second panel, placed next to the first, that shows the standard deviation within each category as a measure of spread. We first add these values to the data frame.

data.1 <- data.frame(Category = c("Category M", "Category C", "Category B", 
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27), Sd = c(10, 9, 
    7, 6, 3))
data.1
##     Category Mean Sd
## 1 Category M   92 10
## 2 Category C   73  9
## 3 Category B   67  7
## 4 Category G   46  6
## 5 Category L   27  3

Now we create two plots and assign them to two objects (plot.1 and plot.2). In the second panel, which shows the standard deviation, we don't need the category labels on the y-axis, so we suppress them with the scales= argument. The function print() draws the plots on the page, and its argument position= arranges them: it takes a vector c(xmin, ymin, xmax, ymax) giving the lower-left and upper-right corners of the region for each plot, as fractions of the page (0 to 1). The two regions may overlap slightly, which reduces the gap between the panels; you can try out different values. When printing the first plot, we add the argument more=TRUE so that R adds the second plot to the same page.

plot.1 <- dotplot(reorder(Category, Mean) ~ Mean, data = data.1)
plot.2 <- dotplot(reorder(Category, Mean) ~ Sd, data = data.1, scales = list(y = list(draw = FALSE)), 
    xlab = "Standard deviation")
print(plot.1, position = c(0, 0, 0.62, 1), more = TRUE)
print(plot.2, position = c(0.56, 0, 1, 1))


Here is the same plot with a different arrangement:

plot.1 <- dotplot(reorder(Category, Mean) ~ Mean, data = data.1)
plot.2 <- dotplot(reorder(Category, Mean) ~ Sd, data = data.1, scales = list(y = list(draw = FALSE)))
print(plot.1, position = c(0, 0, 0.72, 1), more = TRUE)
print(plot.2, position = c(0.66, 0, 1, 1))


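The print(position = ...) approach gives full control over the panel widths. An alternative, sketched below, lets lattice handle the alignment: the data are reshaped into a long format with a Measure variable (Mean vs. standard deviation), which then serves as a conditioning variable. The names long, Measure, Value and Order are made up for this example.

```r
library(lattice)

data.1 <- data.frame(Category = c("Category M", "Category C", "Category B",
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27),
    Sd = c(10, 9, 7, 6, 3))

# Stack Mean and Sd into one column; Order repeats the means so that
# both panels use the same category order
long <- data.frame(
    Category = rep(data.1$Category, 2),
    Measure = factor(rep(c("Mean", "Standard deviation"), each = 5),
        levels = c("Mean", "Standard deviation")),
    Value = c(data.1$Mean, data.1$Sd),
    Order = rep(data.1$Mean, 2))

# One panel per measure, side by side (layout = columns, rows); the shared
# y-axis is drawn once, and relation = "free" gives each panel its own x range
p <- dotplot(reorder(Category, Order) ~ Value | Measure, data = long,
    layout = c(2, 1), scales = list(x = list(relation = "free")),
    between = list(x = 0.5))
print(p)
```

Here both panels get the same width, so if you want a narrower standard-deviation panel, the print(position = ...) version above remains the more flexible choice.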

1.4. Adding error bars

We first add confidence interval limits to the data frame.

data.1
##     Category Mean Sd
## 1 Category M   92 10
## 2 Category C   73  9
## 3 Category B   67  7
## 4 Category G   46  6
## 5 Category L   27  3
data.1$Upper <- data.1$Mean + 5
data.1$Lower <- data.1$Mean - 5
data.1
##     Category Mean Sd Upper Lower
## 1 Category M   92 10    97    87
## 2 Category C   73  9    78    68
## 3 Category B   67  7    72    62
## 4 Category G   46  6    51    41
## 5 Category L   27  3    32    22
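The limits of ±5 above are placeholders. With real data, you would compute them. One common choice, assuming each mean comes from an independent sample of n roughly normally distributed observations, is the t-based 95% confidence interval Mean ± t(0.975; n - 1) × Sd / √n. A sketch, with a made-up sample size of n = 20 per category:

```r
data.1 <- data.frame(Category = c("Category M", "Category C", "Category B",
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27),
    Sd = c(10, 9, 7, 6, 3))

# Hypothetical sample size per category (not part of the tutorial's data)
n <- 20

# Half-width of a t-based 95% confidence interval for each mean
half.width <- qt(0.975, df = n - 1) * data.1$Sd / sqrt(n)

data.1$Upper <- data.1$Mean + half.width
data.1$Lower <- data.1$Mean - half.width
data.1
```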

Adding error bars is a bit more difficult, because we have to write our own panel function and pass it to the panel= argument. A custom panel function replaces the default one (panel.dotplot), so it controls everything that is drawn inside the panel(s). We will build the plot step by step. First, we add the points using the panel function panel.xyplot.

dotplot(reorder(Category, Mean) ~ Mean, data = data.1, panel = function(x, y) {
    panel.xyplot(x, y, pch = 16)
})


You can change the size of the dots with the argument cex= in panel.xyplot (default: 1). We add the light grey horizontal reference lines with panel.abline(h = ...). The call to panel.abline precedes panel.xyplot because otherwise the grey lines would be drawn over the dots (try it out!).

dotplot(reorder(Category, Mean) ~ Mean, data = data.1, panel = function(x, y) {
    panel.abline(h = unique(y), col = "#E6E6E6")
    panel.xyplot(x, y, pch = 16)
})


Finally, we add the error bars with the panel function panel.arrows. The upper and lower limits are in the columns Upper and Lower. Each bar runs from the point (x0, y0) to the point (x1, y1). The other arguments control the look of the bars: length= sets the size of the caps at the ends of the bars; angle= is the angle (in degrees) between the bar and its caps, so 90 gives flat caps; code=3 draws caps at both ends (code=1 or code=2 would draw a cap at only one end); lend=2 gives the bars square line ends.

dotplot(reorder(Category, Mean) ~ Mean, data = data.1, panel = function(x, y) {
    panel.abline(h = unique(y), col = "#E6E6E6")
    panel.xyplot(x, y, pch = 16)
    panel.arrows(x0 = data.1$Lower, x1 = data.1$Upper, y0 = as.numeric(y), y1 = as.numeric(y), 
        length = 0.02, angle = 90, code = 3, lend = 2)
})


Some error bars are cut off, because the range of the x-axis is computed from the means only. We therefore widen it with the argument xlim= so that the upper and lower ends of all error bars are visible.

dotplot(reorder(Category, Mean) ~ Mean, data = data.1, xlim = c(20, 100), panel = function(x, 
    y) {
    panel.abline(h = unique(y), col = "#E6E6E6")
    panel.xyplot(x, y, pch = 16)
    panel.arrows(x0 = data.1$Lower, x1 = data.1$Upper, y0 = as.numeric(y), y1 = as.numeric(y), 
        length = 0.02, angle = 90, code = 3, lend = 2)
})


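Choosing xlim= by hand has to be redone whenever the data change. As an alternative, the range can be computed from the interval limits with extendrange() (from the grDevices package, which R loads by default) and the panel code wrapped in a function. ci_dotplot() is a hypothetical helper, not part of lattice; it uses the subscripts argument (explained in Section 2.2) so that every point is paired with its own limits.

```r
library(lattice)

data.1 <- data.frame(Category = c("Category M", "Category C", "Category B",
    "Category G", "Category L"), Mean = c(92, 73, 67, 46, 27))
data.1$Upper <- data.1$Mean + 5
data.1$Lower <- data.1$Mean - 5

# Hypothetical helper: dot plot with error bars; the x-axis range is taken
# from the interval limits, padded by a fraction f on each side
ci_dotplot <- function(d, f = 0.05) {
    lims <- extendrange(c(d$Lower, d$Upper), f = f)
    dotplot(reorder(Category, Mean) ~ Mean, data = d, xlim = lims,
        panel = function(x, y, subscripts, ...) {
            y <- as.numeric(y)
            panel.abline(h = unique(y), col = "#E6E6E6")
            # subscripts = rows of d shown in this panel, in the order of x and y
            panel.arrows(x0 = d$Lower[subscripts], x1 = d$Upper[subscripts],
                y0 = y, y1 = y, length = 0.02, angle = 90, code = 3, lend = 2)
            panel.xyplot(x, y, pch = 16)  # drawn last so the dots sit on top
        })
}

p <- ci_dotplot(data.1)
print(p)
```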


2. Comparing groups



We will extend the dataset to include the same measurements for a second group.

data.2 <- data.frame(Category = rep(c("Category M", "Category C", "Category B", 
    "Category G", "Category L"), 2), Group = c(rep("Group 1", 5), rep("Group 2", 
    5)), Mean = c(92, 73, 67, 46, 27, 92.5, 75, 70, 56, 46))
data.2
##      Category   Group Mean
## 1  Category M Group 1 92.0
## 2  Category C Group 1 73.0
## 3  Category B Group 1 67.0
## 4  Category G Group 1 46.0
## 5  Category L Group 1 27.0
## 6  Category M Group 2 92.5
## 7  Category C Group 2 75.0
## 8  Category B Group 2 70.0
## 9  Category G Group 2 56.0
## 10 Category L Group 2 46.0

If we want to compare the two groups visually (Group 1 and Group 2), there are two possibilities: juxtaposition and superposition.


2.1. Juxtaposition

Juxtaposing the groups means plotting them into separate panels. To do this, we need to use the variable Group as a conditioning variable in the formula. A conditioning variable is attached to the formula with a | sign.

dotplot(reorder(Category, Mean) ~ Mean | Group, data = data.2)


The argument between= inserts a blank space between the panels. I think this looks better.

dotplot(reorder(Category, Mean) ~ Mean | Group, data = data.2, between = list(x = 0.5))


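By default, lattice places the two panels side by side, which aligns the category labels. The layout= argument, given as c(columns, rows), changes the arrangement; stacking the panels vertically instead aligns the two x-axes, which can make it easier to compare values across groups. Lattice fills the layout from the bottom up unless as.table = TRUE is set, which puts Group 1 on top. A sketch:

```r
library(lattice)

data.2 <- data.frame(Category = rep(c("Category M", "Category C", "Category B",
    "Category G", "Category L"), 2), Group = c(rep("Group 1", 5), rep("Group 2",
    5)), Mean = c(92, 73, 67, 46, 27, 92.5, 75, 70, 56, 46))

# layout = c(columns, rows): one column, two rows, so the panels are stacked;
# as.table = TRUE fills the panels from the top down
p <- dotplot(reorder(Category, Mean) ~ Mean | Group, data = data.2,
    layout = c(1, 2), as.table = TRUE, between = list(y = 0.5))
print(p)
```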

2.2. Adding error bars to juxtaposed panels

We will extend the dataset to include upper and lower confidence interval limits for every measurement.

data.2
##      Category   Group Mean
## 1  Category M Group 1 92.0
## 2  Category C Group 1 73.0
## 3  Category B Group 1 67.0
## 4  Category G Group 1 46.0
## 5  Category L Group 1 27.0
## 6  Category M Group 2 92.5
## 7  Category C Group 2 75.0
## 8  Category B Group 2 70.0
## 9  Category G Group 2 56.0
## 10 Category L Group 2 46.0
data.2$Upper <- data.2$Mean + 5
data.2$Lower <- data.2$Mean - 5
data.2
##      Category   Group Mean Upper Lower
## 1  Category M Group 1 92.0  97.0  87.0
## 2  Category C Group 1 73.0  78.0  68.0
## 3  Category B Group 1 67.0  72.0  62.0
## 4  Category G Group 1 46.0  51.0  41.0
## 5  Category L Group 1 27.0  32.0  22.0
## 6  Category M Group 2 92.5  97.5  87.5
## 7  Category C Group 2 75.0  80.0  70.0
## 8  Category B Group 2 70.0  75.0  65.0
## 9  Category G Group 2 56.0  61.0  51.0
## 10 Category L Group 2 46.0  51.0  41.0

If we want to add error bars to multipanel displays, we need the subscripts argument. Lattice passes it to the panel function: it holds the row numbers of the data frame that are shown in the current panel (i.e. the subgroup of the data assigned to that panel). Indexing with it, as in data.2$Lower[subscripts], tells panel.arrows where in the data frame to find the lower and upper confidence interval limits for that panel.

dotplot(reorder(Category, Mean) ~ Mean | Group, data = data.2, between = list(x = 0.5), 
    xlim = c(20, 100), panel = function(x, y, subscripts, ...) {
        panel.abline(h = unique(y), col = "#E6E6E6")
        panel.xyplot(x, y, pch = 16)
        panel.arrows(x0 = data.2$Lower[subscripts], x1 = data.2$Upper[subscripts], 
            y0 = as.numeric(y), y1 = as.numeric(y), length = 0.02, angle = 90, 
            code = 3, lend = 2)
    })


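To see what subscripts contains, it helps to compute the same row numbers by hand. Assuming no subset= argument and no missing values, the subscripts passed to a panel are the row numbers of data.2 that belong to that panel's level of Group, which split() reproduces. This is an illustration outside lattice, not lattice's internal code:

```r
data.2 <- data.frame(Category = rep(c("Category M", "Category C", "Category B",
    "Category G", "Category L"), 2), Group = c(rep("Group 1", 5), rep("Group 2",
    5)), Mean = c(92, 73, 67, 46, 27, 92.5, 75, 70, 56, 46))
data.2$Upper <- data.2$Mean + 5
data.2$Lower <- data.2$Mean - 5

# Row numbers of data.2 belonging to each level of the conditioning variable
rows.by.panel <- split(seq_len(nrow(data.2)), data.2$Group)
rows.by.panel[["Group 1"]]  # 1 2 3 4 5
rows.by.panel[["Group 2"]]  # 6 7 8 9 10

# What data.2$Lower[subscripts] evaluates to in the "Group 2" panel
data.2$Lower[rows.by.panel[["Group 2"]]]  # 87.5 70.0 65.0 51.0 41.0
```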

2.3. Superposition

The second option is to superpose the groups in the same panel. This is done with the argument groups=. We can add a legend with the argument auto.key=TRUE.

dotplot(reorder(Category, Mean) ~ Mean, data = data.2, groups = Group, auto.key = TRUE)


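auto.key=TRUE draws a one-column legend above the panel. Supplying a list instead gives more control: columns = 2 puts the entries side by side. The plotting symbols for the groups can be set with simpleTheme() via par.settings=; since auto.key reads the same settings, the legend stays consistent with the panel. For example, to use the filled and open circles that also appear in Section 2.5:

```r
library(lattice)

data.2 <- data.frame(Category = rep(c("Category M", "Category C", "Category B",
    "Category G", "Category L"), 2), Group = c(rep("Group 1", 5), rep("Group 2",
    5)), Mean = c(92, 73, 67, 46, 27, 92.5, 75, 70, 56, 46))

# simpleTheme() sets the symbols used for the superposed groups;
# auto.key reads the same settings, so the legend matches the panel
p <- dotplot(reorder(Category, Mean) ~ Mean, data = data.2, groups = Group,
    par.settings = simpleTheme(pch = c(16, 21)),
    auto.key = list(columns = 2))
print(p)
```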

2.4. Superposition with queuing

Queuing can be used to add further information on the differences between the groups. We will add a panel showing the (absolute) difference between the group means. Again, we create two objects and print them with print().

plot.1 <- dotplot(reorder(Category, Mean) ~ Mean, data = data.2, groups = Group)
plot.2 <- dotplot(reorder(Category, Mean) ~ abs(data.2[data.2$Group == "Group 1", 
    ]$Mean - data.2[data.2$Group == "Group 2", ]$Mean), data = data.2, xlab = "Mean difference", 
    scales = list(y = list(draw = FALSE)))
print(plot.1, position = c(0, 0, 0.72, 1), more = TRUE)
print(plot.2, position = c(0.66, 0, 1, 1))


Adding 95% confidence intervals to the panel showing the mean differences will make the graph much more informative. In order to do this we will create a new data frame containing the upper and lower limits of the 95% confidence interval.

mean.diff <- data.2[data.2$Group == "Group 1", ]$Mean - data.2[data.2$Group == 
    "Group 2", ]$Mean
mean.diff.upper <- mean.diff + 4
mean.diff.lower <- mean.diff - 4
mean.difference <- data.frame(Category = c("Category M", "Category C", "Category B", 
    "Category G", "Category L"), mean.diff, mean.diff.upper, mean.diff.lower)
mean.difference
##     Category mean.diff mean.diff.upper mean.diff.lower
## 1 Category M      -0.5             3.5            -4.5
## 2 Category C      -2.0             2.0            -6.0
## 3 Category B      -3.0             1.0            -7.0
## 4 Category G     -10.0            -6.0           -14.0
## 5 Category L     -19.0           -15.0           -23.0
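The subtraction above assumes that both groups list the categories in the same row order. If that might not hold (for example, after sorting or filtering data.2), matching by category name with merge() is safer. A sketch; the names wide, Mean.1 and Mean.2 are made up for this example:

```r
data.2 <- data.frame(Category = rep(c("Category M", "Category C", "Category B",
    "Category G", "Category L"), 2), Group = c(rep("Group 1", 5), rep("Group 2",
    5)), Mean = c(92, 73, 67, 46, 27, 92.5, 75, 70, 56, 46))

# Put the two groups side by side, matched by Category rather than row position
wide <- merge(data.2[data.2$Group == "Group 1", c("Category", "Mean")],
    data.2[data.2$Group == "Group 2", c("Category", "Mean")],
    by = "Category", suffixes = c(".1", ".2"))

# Group 1 minus Group 2, as in mean.diff above
wide$mean.diff <- wide$Mean.1 - wide$Mean.2
wide
```

merge() sorts the result by Category, so the row order differs from mean.difference, but each difference is guaranteed to belong to the right category.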

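We create two objects again and print them on the same page with print(). Because mean.diff is Group 1 minus Group 2 (negative for all categories here), the second plot shows its negative, i.e. Group 2 minus Group 1; the interval limits therefore swap roles (x0 = -mean.diff.upper, x1 = -mean.diff.lower).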

plot.1 <- dotplot(reorder(Category, Mean) ~ Mean, data = data.2, groups = Group)
plot.2 <- dotplot(reorder(Category, Mean) ~ -mean.difference$mean.diff, data = data.2, 
    xlab = "Mean difference (95% CI)", xlim = c(-6, 26), scales = list(y = list(draw = FALSE)), 
    panel = function(x, y) {
        panel.abline(h = unique(y), col = "#E6E6E6")
        panel.abline(v = 0, col = "grey")
        panel.xyplot(x, y, pch = 16)
        panel.arrows(x0 = -mean.difference$mean.diff.upper, x1 = -mean.difference$mean.diff.lower, 
            y0 = as.numeric(y), y1 = as.numeric(y), length = 0.02, angle = 90, 
            code = 3, lend = 2)
    })
print(plot.1, position = c(0, 0, 0.72, 1), more = TRUE)
print(plot.2, position = c(0.66, 0, 1, 1))



2.5. Combining superposition and error bars: Offsetting

The following code is a bit more cumbersome. Since the two groups must be drawn at different vertical positions (= offsetting), we call panel.arrows and panel.xyplot once for each group. Inside the panel function, data.2[subscripts, ] selects the rows shown in the panel, and these are split into two subsets (set1 for Group 1, set2 for Group 2), which are then used in the two panel.arrows and panel.xyplot calls. The amount of offsetting you need depends on the size of your graph. Here I chose 0.15 (the distance between the light horizontal lines is 1). In my opinion, it looks good if the plotting symbols touch the light grey line.

dotplot(reorder(Category, Mean) ~ Mean, data = data.2, groups = Group, xlim = c(20, 
    100), panel = function(x, y, subscripts, ...) {
    y <- as.numeric(y)  # vertical positions of the categories (1, 2, ...)
    d <- data.2[subscripts, ]  # rows of data.2 shown in this panel
    g1 <- d$Group == "Group 1"
    set1 <- d[g1, ]  # Group 1
    set2 <- d[!g1, ]  # Group 2
    panel.abline(h = unique(y), col = "lightgrey")
    # error bars: Group 1 shifted up by 0.15, Group 2 shifted down by 0.15
    panel.arrows(x0 = set1$Lower, x1 = set1$Upper, y0 = y[g1] + 0.15, y1 = y[g1] + 
        0.15, length = 0.02, angle = 90, code = 3, lend = 2)
    panel.arrows(x0 = set2$Lower, x1 = set2$Upper, y0 = y[!g1] - 0.15, y1 = y[!g1] - 
        0.15, length = 0.02, angle = 90, code = 3, lend = 2)
    # points are drawn last so that they sit on top of the bars
    panel.xyplot(set1$Mean, y[g1] + 0.15, pch = 16)
    panel.xyplot(set2$Mean, y[!g1] - 0.15, pch = 21, fill = "white")
}, key = list(text = list(c("Group 1", "Group 2")), points = list(pch = c(16, 21))))


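The two near-identical calls per group grow quickly with more groups. A possible generalization, sketched under the assumption that the grouping variable has only a few levels, computes each point's vertical offset from its group number. panel.offset.ci() and its lower, upper and width arguments are hypothetical names, not part of lattice; the only lattice mechanism relied on is that extra arguments given to dotplot() (here lower= and upper=) are passed on to the panel function, together with groups and subscripts.

```r
library(lattice)

data.2 <- data.frame(Category = rep(c("Category M", "Category C", "Category B",
    "Category G", "Category L"), 2), Group = c(rep("Group 1", 5), rep("Group 2",
    5)), Mean = c(92, 73, 67, 46, 27, 92.5, 75, 70, 56, 46))
data.2$Upper <- data.2$Mean + 5
data.2$Lower <- data.2$Mean - 5

# Hypothetical panel function: spreads the k groups evenly over a band of
# height 'width' around each category line and adds an error bar per point
panel.offset.ci <- function(x, y, groups, subscripts, lower, upper,
                            width = 0.3, pch = c(16, 21), ...) {
    y <- as.numeric(y)                           # category positions
    g <- as.integer(factor(groups)[subscripts])  # group number of each point
    k <- nlevels(factor(groups))
    shift <- if (k > 1) seq(width / 2, -width / 2, length.out = k) else 0
    yy <- y + shift[g]                           # offset position of each point
    panel.abline(h = unique(y), col = "lightgrey")
    panel.arrows(x0 = lower[subscripts], x1 = upper[subscripts], y0 = yy, y1 = yy,
        length = 0.02, angle = 90, code = 3, lend = 2)
    panel.xyplot(x, yy, pch = rep(pch, length.out = k)[g], fill = "white")
}

p <- dotplot(reorder(Category, Mean) ~ Mean, data = data.2, groups = Group,
    xlim = c(20, 100), panel = panel.offset.ci,
    lower = data.2$Lower, upper = data.2$Upper,
    key = list(text = list(c("Group 1", "Group 2")),
        points = list(pch = c(16, 21))))
print(p)
```

The groups are numbered in the order of levels(factor(groups)), which is alphabetical for character columns, so the key labels must follow that order. With that in mind, the same panel function should also work for the multipanel design in Section 3 (for example with | Group, groups = Condition, lower = data.3$Lower and upper = data.3$Upper).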


3. Multipanel conditioning



We will extend the dataset to include a second condition under which all measurements were taken (e.g. pretest and posttest). Our design now includes 4 variables: Outcome x Category x Group x Condition.

data.3 <- data.frame(Category = rep(c("Category M", "Category C", "Category B", 
    "Category G", "Category L"), 4), Group = c(rep(c(rep("Group 1", 5), rep("Group 2", 
    5)), 2)), Condition = c(rep("Pretest", 10), rep("Posttest", 10)), Mean = c(92, 
    73, 67, 46, 27, 92.5, 75, 70, 56, 46, 82, 64, 61, 43, 26, 93.5, 76, 73, 
    63, 58))
data.3$Upper <- data.3$Mean + 5
data.3$Lower <- data.3$Mean - 5
data.3
##      Category   Group Condition Mean Upper Lower
## 1  Category M Group 1   Pretest 92.0  97.0  87.0
## 2  Category C Group 1   Pretest 73.0  78.0  68.0
## 3  Category B Group 1   Pretest 67.0  72.0  62.0
## 4  Category G Group 1   Pretest 46.0  51.0  41.0
## 5  Category L Group 1   Pretest 27.0  32.0  22.0
## 6  Category M Group 2   Pretest 92.5  97.5  87.5
## 7  Category C Group 2   Pretest 75.0  80.0  70.0
## 8  Category B Group 2   Pretest 70.0  75.0  65.0
## 9  Category G Group 2   Pretest 56.0  61.0  51.0
## 10 Category L Group 2   Pretest 46.0  51.0  41.0
## 11 Category M Group 1  Posttest 82.0  87.0  77.0
## 12 Category C Group 1  Posttest 64.0  69.0  59.0
## 13 Category B Group 1  Posttest 61.0  66.0  56.0
## 14 Category G Group 1  Posttest 43.0  48.0  38.0
## 15 Category L Group 1  Posttest 26.0  31.0  21.0
## 16 Category M Group 2  Posttest 93.5  98.5  88.5
## 17 Category C Group 2  Posttest 76.0  81.0  71.0
## 18 Category B Group 2  Posttest 73.0  78.0  68.0
## 19 Category G Group 2  Posttest 63.0  68.0  58.0
## 20 Category L Group 2  Posttest 58.0  63.0  53.0

3.1. Juxtaposition only

We first try a display with juxtaposition only. The two conditioning variables are given in the formula after the | sign:

dotplot(Category ~ Mean | Condition * Group, data = data.3, between = list(x = 0.5, 
    y = 0.5))



3.2. Combination of juxtaposition and superposition

With the limited number of categories, we can use a combination of juxtaposition and superposition. In this first version, Condition will be the conditioning variable and Group will be the grouping variable. The argument auto.key= draws a legend at the top.

dotplot(reorder(Category, Mean) ~ Mean | Condition, data = data.3, auto.key = TRUE, 
    groups = Group, between = list(x = 0.5, y = 0.5))


In this next plot, Group is the conditioning variable and Condition is the grouping variable.

dotplot(reorder(Category, Mean) ~ Mean | Group, data = data.3, auto.key = TRUE, 
    groups = Condition, between = list(x = 0.5, y = 0.5))



3.3. Adding error bars: A combination of juxtaposition, superposition, and offsetting

The following code adds error bars to multiple panels with superposed groups. We again need the subscripts argument: data.3[subscripts, ] selects the rows shown in the current panel, which we split by Condition into set1 (Pretest) and set2 (Posttest). As before, panel.arrows and panel.xyplot are called once for each subset, with different vertical offsets. The legend text is given explicitly, in the same order as the plotting symbols, so that each label is paired with the right symbol.

dotplot(reorder(Category, Mean) ~ Mean | Group, data = data.3, groups = Condition, 
    between = list(x = 0.5), xlim = c(20, 100), panel = function(x, y, subscripts, 
        ...) {
        y <- as.numeric(y)  # vertical positions of the categories (1, 2, ...)
        d <- data.3[subscripts, ]  # rows of data.3 shown in this panel only
        pre <- d$Condition == "Pretest"
        set1 <- d[pre, ]  # Pretest
        set2 <- d[!pre, ]  # Posttest
        panel.abline(h = unique(y), col = "lightgrey")
        # error bars: Pretest shifted up by 0.15, Posttest shifted down by 0.15
        panel.arrows(x0 = set1$Lower, x1 = set1$Upper, y0 = y[pre] + 0.15, 
            y1 = y[pre] + 0.15, length = 0.02, angle = 90, code = 3, lend = 2)
        panel.arrows(x0 = set2$Lower, x1 = set2$Upper, y0 = y[!pre] - 0.15, 
            y1 = y[!pre] - 0.15, length = 0.02, angle = 90, code = 3, lend = 2)
        panel.xyplot(set1$Mean, y[pre] + 0.15, pch = 16)
        panel.xyplot(set2$Mean, y[!pre] - 0.15, pch = 21, fill = "white")
    }, key = list(text = list(c("Pretest", "Posttest")), points = list(pch = c(16, 
        21))))


Further reading

A brief overview of the lattice package is given by Murrell (2011), Chapter 4. This chapter is available for free from the author's homepage:

https://www.stat.auckland.ac.nz/~paul/RGraphics/chapter4.pdf

Much more information on how to use lattice for plotting can be found in the following book, written by the author of the lattice package:

Sarkar, Deepayan. 2008. Lattice: Multivariate data visualization with R. New York: Springer.