comparing-two-means-independent-groups.Rmd

Comparing two groups (continous dependent variable)
===================================================

christophe@pallier.org


```{r echo=FALSE}
rm(list=ls())
par(las=1, mar=c(2,3,1,1),mfrow=c(1,1))


# the "standard error" function
se <- function(x) { sd(x)/sqrt(length(x)) }
```


```{r twogroups_continous}
g1 <- 500 + rnorm(30, sd=40)
g2 <- 520 + rnorm(20, sd=30)
write(g1, 'group1.dat')
write(g2, 'group2.dat')
rm(g1, g2)
```

Data for this example are in two text files `group1.dat` and `group2.dat`. 

```{r}
g1 <- scan('group1.dat')
g2 <- scan('group2.dat')
```


We arrange them into a data frame with two columns: `group` (a factor with two modalities: `Gr1` and `Gr2`), and `y` which contains the values themselves.

```{r}
tg <- data.frame(group=factor(rep(c('Gr1', 'Gr2'), 
                                  c(length(g1), length(g2)))),
                 y=c(g1, g2))

head(tg)
str(tg)
table(tg$group)
```

### Graphical explorations

```{r}
hist(tg$y)
```

```{r}
boxplot(tg$y ~ tg$group)
```

When the samples are small, stripchart may be the best:

```{r}
stripchart(tg$y ~ tg$group, 
           vertical=TRUE,
           pch=1)
```

If the samples are large enough, you can create density plots:

```{r}
par(mfrow=(c(2,1)))
xsca <- range(tg$y)
for (gr in levels(tg$group))
{
  with(subset(tg, group==gr),
       {
         plot(density(y), xlim=xsca, main=gr, bty='l')
         rug(y, ticksize=0.1)
         })
}
```

Obtain the basic descriptive stats

```{r}
attach(tg)
signif(tapply(y, group, mean),3)
signif(tapply(y, group, median), 3)
signif(tapply(y, group, sd), 3)
signif(tapply(y, group, se), 3)
detach(tg)
```

### Inferential statistics


Student T-tests. First assuming equal variance, then relaxing this assumption
```{r}
t.test(y ~ group, data=tg, var.equal=TRUE)
t.test(y ~ group, data=tg)
```

Somewhat more information can be obtained by fitting linear models. 

First with a parametrisation (`contr.treatment`) of group where the intercept will correspond to the mean of group 1 and the effect will estimate the difference between the two groups. 

```{r}
contrasts(tg$group) <- contr.treatment
contrasts(tg$group)
summary(lm(y ~ group, data=tg))
```

Alternatively, one can prefer a parametrisation where the intercept estimates the global mean and the first parameter is the deviation from the global mean.

```{r}
contrasts(tg$group) <- contr.sum
contrasts(tg$group)
summary(lm(y ~ group, data=tg))
```

### Barplot with standard errors

Barplot with the means and their associated standard errors (note this is not the standard error for the difference between the groups' means, which is roughly $\sqrt{2}$ larger and, maybe for this reason, rarely used in psychology papers (like they rarely report confidence intervals))

```{r}
attach(tg)
par(mfrow=c(1,1))
means <- tapply(y, group, mean)
ses <- tapply(y, group, se)

ysca = c(min(means - 3 * ses), max(means + 3 * ses))

mp <- barplot(means, ylim=ysca, xpd=F)
arrows(mp, means-ses, 
       mp, means+ses, 
       code=3, angle=90)

detach(tg)
```

A much nicer plot can be constructed, with confidence intervals for the means and for their difference (Cumming, Geoff, and Sue Finch. 2005. “Inference by Eye: Confidence Intervals and How to Read Pictures of Data.” American Psychologist 60 (2): 170–180.)


```{r}
attach(tg)
m1 <- t.test(y[group=='Gr1'])$conf.int
m2 <- t.test(y[group=='Gr2'])$conf.int
di <- diff(t.test(y~group)$conf.int)
ysca <- c(min(c(m1,m2)-0.3*diff(range(c(m1,m2)))),
          max(c(m1,m2)+0.3*diff(range(c(m1,m2)))))
          
plot(c(Gr1=1, Gr2=2, difference=3),
     c(mean(m1), mean(m2), mean(m2)),
     pch=c(16,16,17), ylim=ysca, xlim=c(.5,3.5), axes=F, xlab='', ylab='')
axis(2, las=1)
axis(1,at=1:3,labels=c('Gr1','Gr2','difference'))
arrows(1:3, c(m1[1], m2[1], mean(m2)-di/2),
       1:3, c(m1[2], m2[2], mean(m2)+di/2),
       code=3, angle=90)
abline(h=mean(m1), lty=2)
abline(h=mean(m2), lty=2)

detach(tg)
```