-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcomparing-two-means-independent-groups.Rmd
162 lines (118 loc) · 3.86 KB
/
comparing-two-means-independent-groups.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
Comparing two groups (continous dependent variable)
===================================================
christophe@pallier.org
```{r echo=FALSE}
rm(list=ls())
par(las=1, mar=c(2,3,1,1),mfrow=c(1,1))
# the "standard error" function
se <- function(x) { sd(x)/sqrt(length(x)) }
```
```{r twogroups_continous}
g1 <- 500 + rnorm(30, sd=40)
g2 <- 520 + rnorm(20, sd=30)
write(g1, 'group1.dat')
write(g2, 'group2.dat')
rm(g1, g2)
```
Data for this example are in two text files `group1.dat` and `group2.dat`.
```{r}
g1 <- scan('group1.dat')
g2 <- scan('group2.dat')
```
We arrange them into a data frame with two columns: `group` (a factor with two modalities: `Gr1` and `Gr2`), and `y` which contains the values themselves.
```{r}
tg <- data.frame(group=factor(rep(c('Gr1', 'Gr2'),
c(length(g1), length(g2)))),
y=c(g1, g2))
head(tg)
str(tg)
table(tg$group)
```
### Graphical explorations
```{r}
hist(tg$y)
```
```{r}
boxplot(tg$y ~ tg$group)
```
When the samples are small, stripchart may be the best:
```{r}
stripchart(tg$y ~ tg$group,
vertical=TRUE,
pch=1)
```
If the samples are large enough, you can create density plots:
```{r}
par(mfrow=(c(2,1)))
xsca <- range(tg$y)
for (gr in levels(tg$group))
{
with(subset(tg, group==gr),
{
plot(density(y), xlim=xsca, main=gr, bty='l')
rug(y, ticksize=0.1)
})
}
```
Obtain the basic descriptive stats
```{r}
attach(tg)
signif(tapply(y, group, mean),3)
signif(tapply(y, group, median), 3)
signif(tapply(y, group, sd), 3)
signif(tapply(y, group, se), 3)
detach(tg)
```
### Inferential statistics
Student T-tests. First assuming equal variance, then relaxing this assumption
```{r}
t.test(y ~ group, data=tg, var.equal=TRUE)
t.test(y ~ group, data=tg)
```
Somewhat more information can be obtained by fitting linear models.
First with a parametrisation (`contr.treatment`) of group where the intercept will correspond to the mean of group 1 and the effect will estimate the difference between the two groups.
```{r}
contrasts(tg$group) <- contr.treatment
contrasts(tg$group)
summary(lm(y ~ group, data=tg))
```
Alternatively, one can prefer a parametrisation where the intercept estimates the global mean and the first parameter is the deviation from the global mean.
```{r}
contrasts(tg$group) <- contr.sum
contrasts(tg$group)
summary(lm(y ~ group, data=tg))
```
### Barplot with standard errors
Barplot with the means and their associated standard errors (note this is not the standard error for the difference between the groups' means, which is roughly $\sqrt{2}$ larger and, maybe for this reason, rarely used in psychology papers (like they rarely report confidence intervals))
```{r}
attach(tg)
par(mfrow=c(1,1))
means <- tapply(y, group, mean)
ses <- tapply(y, group, se)
ysca = c(min(means - 3 * ses), max(means + 3 * ses))
mp <- barplot(means, ylim=ysca, xpd=F)
arrows(mp, means-ses,
mp, means+ses,
code=3, angle=90)
detach(tg)
```
A much nicer plot can be constructed, with confidence intervals for the means and for their difference (Cumming, Geoff, and Sue Finch. 2005. “Inference by Eye: Confidence Intervals and How to Read Pictures of Data.” American Psychologist 60 (2): 170–180.)
```{r}
attach(tg)
m1 <- t.test(y[group=='Gr1'])$conf.int
m2 <- t.test(y[group=='Gr2'])$conf.int
di <- diff(t.test(y~group)$conf.int)
ysca <- c(min(c(m1,m2)-0.3*diff(range(c(m1,m2)))),
max(c(m1,m2)+0.3*diff(range(c(m1,m2)))))
plot(c(Gr1=1, Gr2=2, difference=3),
c(mean(m1), mean(m2), mean(m2)),
pch=c(16,16,17), ylim=ysca, xlim=c(.5,3.5), axes=F, xlab='', ylab='')
axis(2, las=1)
axis(1,at=1:3,labels=c('Gr1','Gr2','difference'))
arrows(1:3, c(m1[1], m2[1], mean(m2)-di/2),
1:3, c(m1[2], m2[2], mean(m2)+di/2),
code=3, angle=90)
abline(h=mean(m1), lty=2)
abline(h=mean(m2), lty=2)
detach(tg)
```