---
title: "Reanalysis of BELIV Paper Study Data"
author: "Robert Kosara"
date: "July 10, 2016"
output: html_document
---
For [a paper at BELIV 2010](http://kosara.net/publications/Kosara_BELIV_2010.html), Caroline Ziemkiewicz and I ran a study on Mechanical Turk that was among the first visualization studies run there. We decided to do a quick study of different chart types to see how well the platform would work.
This is a reanalysis of the data in light of [the more recent pie chart studies](https://eagereyes.org/blog/2016/an-illustrated-tour-of-the-pie-chart-study-results) by Drew Skau and me.
## Data
First, we read in the data and do a bit of work to calculate the error and to reorder and rename the chart types. We called the study _simplevis_ back then, so _sv_ is the prefix for the datasets.
```{r message=FALSE}
library(dplyr)
sv <- read.csv("data/simplevis.csv", sep="\t") %>%
  mutate(error = Estimate-Value,
         Type = factor(Type,
                       levels=c("squarepie", "bars", "piechart", "donut"),
                       labels=c("Square Pie", "Bar Chart", "Pie Chart", "Donut Chart")),
         Confidence = factor(Confidence,
                             levels=c("low", "medium", "high"),
                             labels=c("Low", "Medium", "High")))
```
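As a quick, optional sanity check (not evaluated when knitting), we can inspect the columns the rest of the analysis relies on, namely `ID`, `Type`, `Estimate`, `Value`, `Confidence`, and the derived `error`:
```{r eval=FALSE}
# Not evaluated; quick check that the columns used below parsed as expected
glimpse(sv)
summary(sv$error)
```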
Aggregate by participant and chart type. This calculates the mean error and the mean absolute error.
```{r}
svAgg <- group_by(sv, ID, Type) %>%
  summarize(meanError = mean(error),
            meanAbsError = mean(abs(error)))
```
## Helper Functions
The first two functions below are helpers that calculate the lower and upper bounds of the 95% confidence interval of the mean. `plotCIs` creates the plots.
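Spelled out, with $\bar{x}$ the sample mean, $s$ the sample standard deviation, and $n$ the number of values, the helpers compute the usual normal-approximation interval (the 1.96 in the code is the 97.5th percentile of the standard normal):

$$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}$$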
```{r}
library(ggplot2)
# Lower and upper bound of a 95% confidence interval of the mean
# (normal approximation: mean +/- 1.96 standard errors)
lowerCI <- function(v) {
  mean(v) - sd(v)*1.96/sqrt(length(v))
}
upperCI <- function(v) {
  mean(v) + sd(v)*1.96/sqrt(length(v))
}

# Dot-and-errorbar plot: mean and its confidence interval per chart type.
# zeroLine optionally draws a horizontal reference line at 0 with the given linetype.
plotCIs <- function(dataFrame, x_variable, y_variable, label, zeroLine=NA) {
  p <- ggplot(dataFrame, aes(x=x_variable, fill=x_variable, y=y_variable))
  if (!is.na(zeroLine)) {
    p <- p + geom_hline(yintercept=0, linetype=zeroLine)
  }
  p <- p +
    stat_summary(fun.ymin=lowerCI, fun.ymax=upperCI, geom="errorbar", aes(width=.1)) +
    stat_summary(fun.y=mean, geom="point", shape=18, size=3, show.legend = FALSE) +
    labs(x = NULL, y = label)
  p
}
```
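One portability note: ggplot2 3.3.0 renamed the `stat_summary()` arguments `fun.y`, `fun.ymin`, and `fun.ymax` to `fun`, `fun.min`, and `fun.max`; the old names still work but trigger deprecation warnings. On a current ggplot2, the two summary layers inside `plotCIs` could be written as follows (a sketch, not evaluated here):
```{r eval=FALSE}
# Not evaluated; equivalent summary layers for ggplot2 >= 3.3.0,
# where fun.y/fun.ymin/fun.ymax were renamed to fun/fun.min/fun.max
p <- p +
  stat_summary(fun.min=lowerCI, fun.max=upperCI, geom="errorbar", width=.1) +
  stat_summary(fun=mean, geom="point", shape=18, size=3, show.legend = FALSE) +
  labs(x = NULL, y = label)
```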
## Plotting the Data
Now the plots! First up, signed error for all four chart types, which shows bias.
```{r}
plotCIs(svAgg, svAgg$Type, svAgg$meanError, "Error (Bias)", 'dotted')
```
Next, absolute error, which shows precision.
```{r}
plotCIs(svAgg, svAgg$Type, svAgg$meanAbsError, "Absolute Error (Precision)")
```
## Choose Square Pie as Baseline
Now we create a separate data frame that has just the square pie values, and then join that back with the other data and calculate the difference between each of the chart types and the square pie.
This lets us compare the other chart types to the square pie by looking at whether their confidence intervals cross the zero line.
```{r message=FALSE}
svSquarePies <- filter(svAgg, Type == "Square Pie") %>%
  select(ID,
         sqpie_meanError = meanError,
         sqpie_meanAbsError = meanAbsError,
         t = Type)
svJoined <- left_join(svSquarePies, svAgg) %>%
  mutate(sqpieDifferenceMeanError = meanError-sqpie_meanError,
         sqpieDifferenceMeanAbsError = meanAbsError-sqpie_meanAbsError)
```
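By construction, the square pie's difference to itself is zero, so its marker in the plots below sits exactly on the reference line; a quick check (not evaluated here) would be:
```{r eval=FALSE}
# Not evaluated; the Square Pie rows should show differences of exactly zero
filter(svJoined, Type == "Square Pie") %>%
  summarize(maxDiffError = max(abs(sqpieDifferenceMeanError)),
            maxDiffAbsError = max(abs(sqpieDifferenceMeanAbsError)))
```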
First, again, the signed error, or bias.
```{r}
plotCIs(svJoined, svJoined$Type, svJoined$sqpieDifferenceMeanError, "Error relative to Square Pie (Bias)", 'dashed')
```
Next, the absolute error, or precision.
```{r}
plotCIs(svJoined, svJoined$Type, svJoined$sqpieDifferenceMeanAbsError, "Absolute Error relative to Square Pie (Precision)", 'dashed')
```
## Confidence
It's also interesting to look at the confidence ratings. People had much higher confidence in their guesses with the square pie, less with the pie and donut, and the least with the stacked bar!
```{r}
ggplot(sv, aes(x=Type, fill=Confidence)) +
geom_bar(stat="count", position=position_dodge()) +
scale_fill_brewer(palette="Spectral") +
labs(x=NULL, y=NULL)
```
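A proportional (filled) version of the same bar chart shows the confidence ratings as shares within each chart type, which makes the comparison independent of the number of responses per type; a possible variant, not evaluated here:
```{r eval=FALSE}
# Not evaluated; confidence ratings as shares per chart type instead of raw counts
ggplot(sv, aes(x=Type, fill=Confidence)) +
  geom_bar(position="fill") +
  scale_fill_brewer(palette="Spectral") +
  labs(x=NULL, y="Share of responses")
```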
## Comparison with Arcs, Angles, Areas Study
To compare with the more recent study, we load the arcs-angles-areas dataset. This requires some work to rename variables. Both datasets are reduced to the minimum set of columns to avoid issues when merging.
```{r}
aaa <- read.csv("data/arcs-angles-areas-merged.csv") %>%
  filter(chart_type == 'Pie Chart' | chart_type == 'Donut Chart') %>%
  mutate(chart_type = factor(chart_type,
                             levels=c('Pie Chart', 'Donut Chart'),
                             labels=c('Pie AAA', 'Donut AAA')),
         error = ifelse(log_error == log_opposite, judged_true, (100-ans_trial)-correct_ans),
         subjectID = factor(subjectID)
  ) %>%
  select(ID = subjectID, Type = chart_type, error)
aaaAggregated <- aaa %>%
  group_by(Type, ID) %>%
  summarize(meanError = mean(error),
            meanAbsError = mean(abs(error)))
svReduced <- select(svAgg, ID, Type, meanError, meanAbsError)
```
Now we merge them and reorder the combined `Type` column for better charts.
```{r warning=FALSE}
svAAAMerged <- bind_rows(svReduced, aaaAggregated) %>%
  mutate(Type = factor(Type,
                       levels=c("Square Pie", "Bar Chart", "Pie Chart", "Donut Chart", "Pie AAA", "Donut AAA")))
```
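A quick count (not evaluated here) shows how many per-participant rows each chart type contributes after the merge:
```{r eval=FALSE}
# Not evaluated; per-participant rows per chart type in the merged data
count(svAAAMerged, Type)
```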
Comparison time! Signed error first again.
```{r}
plotCIs(svAAAMerged, svAAAMerged$Type, svAAAMerged$meanError, "Error (Bias)", 'dotted')
```
Absolute error.
```{r}
plotCIs(svAAAMerged, svAAAMerged$Type, svAAAMerged$meanAbsError, "Absolute Error (Precision)")
```
Oddly, the error in the more recent study is much higher than in the first one. Within each study, though, the pie and donut charts behave the same relative to each other.