# Linear regression models
```{r}
#| label: setup
#| file: "_common.R"
#| include: true
#| message: false
```
## Linear models for factorial designs
If the levels of a factor measure a continuous quantity, we may alternatively consider a statistical model that describes the response as a curve or a straight line, rather than only estimating the mean at the levels measured in the experiment. These types of experiments are rather common in engineering.
:::{#exm-sokolova}
## Is additional paper wrapping viewed as more eco-friendly?
@Sokolova.Krishna.Doring:2023 consider consumer bias when assessing how eco-friendly packages are. Items such as cereal are packaged in plastic bags, which are themselves placed inside a box. They conjecture (and find) that, paradoxically, consumers tend to view the packaging as more eco-friendly when the amount of cardboard or paper wrapping is larger, relative to a plastic-only package. We consider in the sequel the data from Study 2A, which measures the perceived environmental friendliness (PEF, variable `pef`) as a function of the `proportion` of paper wrapping (none, half the area of the plastic, equal to it, or twice it).
The linear model we can envision describes `pef` as a linear function of `proportion`, with
\begin{align*}
\mathsf{E}(\texttt{pef} \mid \texttt{proportion}) = \beta_0 + \beta_1 \texttt{proportion}
\end{align*}
and with homoscedastic observations. More general models would include polynomials (up to degree $K-1$ for a factor with $K$ levels).
If we fit the simple linear regression model, we can extract the estimated coefficients and the $p$-values of the $t$-tests for $\beta_0$ and $\beta_1$. The test for the intercept is of no interest since the data are measured on a scale from 1 to 7, so the mean response when `proportion=0` cannot be zero. The coefficient for `proportion` suggests an increase of roughly 0.5 point per unit increase of the paper-to-plastic ratio, and this slope is significantly different from zero, indicating that the `pef` score changes with the ratio.
```{r}
#| eval: true
#| echo: true
data(SKD23_S2A, package = "hecedsm") # load data
linmod <- lm(pef ~ proportion, data = SKD23_S2A)
# fit simple linear regression
coef(linmod) # extract intercept and slope
```
Let $\mu_{0}, \mu_{0.5}, \mu_{1}, \mu_2$ denote the true mean of the PEF score as a function of the proportion of paper. There are several tests that could be of interest here, but we focus on the contrasts performed by the authors and a hypothesis test of linearity as a function of the proportion of paper. For the latter, we could compare the linear regression model (in which the PEF score increases linearly with the proportion of paper to plastic) against the analysis of variance model, which allows each of the four groups to have a different mean.
If we use $\boldsymbol{\alpha} \in \mathbb{R}^4$ to denote the parameter vector of the analysis of variance model using the treatment parametrization and $\boldsymbol{\beta} \in \mathbb{R}^2$ for the simple linear regression model, then we have
\begin{align*}
\mu_0 &= \beta_0=\alpha_0 \\
\mu_{0.5} &= \beta_0 + 0.5 \beta_1 = \alpha_0 + \alpha_1\\
\mu_1 &= \beta_0 + \beta_1 = \alpha_0 + \alpha_2 \\
\mu_2 &= \beta_0 + 2 \beta_1= \alpha_0 + \alpha_3.
\end{align*}
The test comparing the simple linear regression with the analysis of variance imposes two simultaneous restrictions, with $\mathscr{H}_0: \alpha_3 = 2\alpha_2= 4\alpha_1$, so the null distribution is $\mathsf{Fisher}(2, 798)$ or roughly $\chi^2_2$.
```{r}
#| eval: true
#| echo: true
anovamod <- lm(pef ~ factor(proportion), # one-way ANOVA
data = SKD23_S2A)
# Compare simple linear regression with ANOVA
anova(linmod, anovamod) # is the change in PEF linear?
# Equivalent test via linear restrictions on the ANOVA coefficients
# (rows act on the coefficient vector, so they need not sum to zero)
car::linearHypothesis(model = anovamod,
hypothesis = rbind(c(0, -2, 1, 0),
c(0, 0, -2, 1)))
```
We see from the output that the $F$ tests and the $p$-values are identical, whether we impose the constraints manually or simply feed the two nested models to the `anova` method.
The authors were interested in comparing none with other choices: we are interested in pairwise differences, but only relative to the reference $\mu_{0}$:
\begin{align*}
\mu_0 = \mu_{0.5} & \iff 1\mu_0 - 1\mu_{0.5} + 0\mu_{1} + 0 \mu_{2} = 0\\
\mu_0 = \mu_{1} & \iff 1\mu_0 + 0\mu_{0.5} -1\mu_{1} + 0 \mu_{2} = 0\\
\mu_0 = \mu_{2} & \iff 1\mu_0 + 0\mu_{0.5} + 0\mu_{1} -1 \mu_{2} = 0
\end{align*}
so the contrast vectors $(1, -1, 0, 0)$, $(1, 0, -1, 0)$ and $(1, 0, 0, -1)$ for the marginal means allow one to test these hypotheses.
```{r}
#| eval: true
#| echo: true
margmean <- anovamod |>
emmeans::emmeans(specs = "proportion") # group means
contrastlist <- list( # specify contrast vectors
refvshalf = c(1, -1, 0, 0),
refvsone = c(1, 0, -1, 0),
refvstwo = c(1, 0, 0, -1))
# compute contrasts relative to reference
margmean |> emmeans::contrast(method = contrastlist)
```
```{r}
#| label: SKD23S2A-save
#| eval: true
#| echo: false
data(SKD23_S2A, package = "hecedsm") # load data
linmod <- lm(pef ~ proportion, data = SKD23_S2A) # fit simple linear regression
anovamod <- lm(pef ~ factor(proportion), data = SKD23_S2A) # one-way ANOVA
# anova(linmod, anovamod) # compare models (two restrictions)
# group means
margmean <- anovamod |> emmeans::emmeans(specs = "proportion")
contrastlist <- list( # specify contrast vectors
refvshalf = c(1, -1, 0, 0),
refvsone = c(1, 0, -1, 0),
refvstwo = c(1, 0, 0, -1))
contrast <- margmean |> emmeans::contrast(
method = contrastlist)
```
The group averages reported in @tbl-print-groupmeans-PEF match those reported by the authors in the paper. They suggest an increase in perceived environmental friendliness as the amount of paper used in the wrapping increases. We can fit a simple regression model to assess the average change, treating the proportion as a continuous explanatory variable. The estimated slope for the change in PEF score, which ranges from 1 to 7 in increments of 0.25, is `r round(coef(linmod)[2],2)` point per unit of the paper-to-plastic ratio. There is however strong evidence, given the data, that the change isn't quite linear, as the fit of the simple linear regression model is significantly worse than that of the analysis of variance model.
```{r}
#| label: tbl-print-groupmeans-PEF
#| echo: false
#| eval: true
#| tbl-cap: "Estimated group averages of PEF per proportion with standard errors"
knitr::kable(margmean,
digits = c(2,2,3,0,2,4),
booktabs = TRUE,
col.names = c("proportion",
"marg. mean",
"std. err.",
"dof",
"lower (CI)",
"upper (CI)")) |>
kableExtra::kable_styling()
```
```{r}
#| label: tbl-print-contrast-PEF
#| echo: false
#| eval: true
#| tbl-cap: "Estimated contrasts for differences of PEF to no paper."
knitr::kable(contrast,
booktabs = TRUE,
digits = c(2,2,2,0,2,2),
col.names = c("contrast",
"estimate",
"std. err.",
"dof",
"stat",
"p-value")) |>
kableExtra::kable_styling()
```
All differences reported in @tbl-print-contrast-PEF are significant and positive, in line with the researchers' hypothesis.
:::
## Analysis of covariance
The previous chapter dealt with factorial experiments in which all experimental factors are of interest. It is also possible to use measurements concomitant to the data collection (for example, the score on a test taken before group assignment for the manipulation) to get a measure of the relative ability of participants. The more correlated these measures are with the response, the more variability in the data we can explain. We then proceed with the random assignment of our experimental units to the different conditions.
Including covariates should in principle increase power and our ability to detect real differences due to experimental manipulations, provided the variables used as controls are correlated with the response. Generally, covariates are not needed for valid inference, which is guaranteed by randomization, and they should not be used to assign treatment. Such designs are meant to reduce the residual error.
We can include continuous covariates in the analysis of variance; their slopes govern the relationship with the response. Their inclusion is not strictly necessary to draw valid causal conclusions, but adding these terms again helps reduce the residual variability. Such a design was historically called **analysis of covariance**, although, like analysis of variance models, these are nothing but linear regression models. The ANCOVA model assumes that the slopes are parallel: if there is rather an interaction, then the experimental manipulation induces changes that depend on the value of the continuous explanatory variable. This case is termed **moderation** in the management literature.
In an analysis of covariance, we include a linear component for a (continuous) covariate, with the purpose again of reducing the residual error and increasing power. A prime example is prior/post experiment measurements, whereby we monitor the change in outcome due to the manipulation. This post by Solomon Kurz [[link]](https://solomonkurz.netlify.app/blog/2023-04-12-boost-your-power-with-baseline-covariates/) nicely illustrates the added benefits of using covariates when they are strongly correlated with the response.
In such settings, it may seem logical to take the difference between the post and prior scores as the response: this is showcased in @exm-vanStek and in @Baumann:1992, an analysis of which is presented on [the course website](https://edsm.rbind.io/example/06-ancova/).
When we add a covariate, the latter needs to have a strong linear correlation with the response for the inclusion to make sense. We can assess graphically whether the relationship is linear, and whether the slopes for each experimental condition are the same.^[If not, this implies that the covariate interacts with the experimental condition.]
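A minimal sketch of such checks, using simulated data (all object names below are illustrative): we plot the response against the covariate with a separate least-squares line per group, then compare the parallel-slope model with the model including an interaction.

```{r}
#| eval: false
#| echo: true
# Sketch with simulated data: graphical and formal checks of
# linearity and of equality of slopes across conditions
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 30)),
  covariate = runif(n = 60, min = 0, max = 10))
toy$response <- 1 + 0.5 * toy$covariate +
  2 * (toy$group == "treatment") + rnorm(n = 60)
library(ggplot2)
ggplot(toy, aes(x = covariate, y = response, color = group)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
# Compare parallel slopes (ANCOVA) with group-specific slopes (interaction)
anova(lm(response ~ covariate + group, data = toy),
      lm(response ~ covariate * group, data = toy))
```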
```{r}
#| label: fig-ancovadifftrend
#| echo: false
#| eval: true
#| cache: true
#| fig-cap: "Simulated data from two groups with an analysis of covariance model. "
set.seed(123)
ng <- 24
x1 <- rgamma(n = ng, shape = 10, scale = 5)
x2 <- rgamma(n = ng, shape = 7, scale = 3)
dat <- data.frame(
covariate = c(x1, x2),
response = c(2 + 0.2*x1 + rnorm(ng, sd = 4), 5 + 0.4*x2 + rnorm(ng, sd = 4)),
group = factor(rep(c("treatment", "control"), each = ng)))
g1 <- ggplot(data = dat, aes(x = covariate, y = response, group = group, color = group)) +
geom_point() +
geom_rug(sides = "b") +
geom_abline(data = data.frame(group = factor(c("treatment",
"control")),
intercept = c(2,5),
slope = c(0.2,0.4)),
mapping = aes(intercept = intercept,
slope = slope,
color = group)) +
theme_classic()
set.seed(123)
ng <- 24
x1 <- rgamma(n = ng, shape = 10, scale = 5)
x2 <- rgamma(n = ng, shape = 10, scale = 5)
dat <- data.frame(
covariate = c(x1, x2),
response = c(2 + 0.2*x1 + rnorm(ng, sd = 4), 5 + 0.2*x2 + rnorm(ng, sd = 4)),
group = factor(rep(c("treatment", "control"),
each = ng)))
g2 <- ggplot(data = dat,
aes(x = covariate,
y = response,
group = group,
color = group)) +
geom_point() +
geom_rug(sides = "b") +
geom_abline(data = data.frame(group = factor(c("treatment",
"control")),
intercept = c(2,5),
slope = c(0.2,0.2)),
mapping = aes(intercept = intercept,
slope = slope,
color = group)) +
theme_classic()
g2 + g1 +
plot_layout(guides = 'collect') &
theme(legend.position = "bottom")
```
The left panel of @fig-ancovadifftrend shows the ideal situation for an analysis of covariance: the relationship between response and covariate is linear with strong correlation, the slopes are the same in both groups, and the supports overlap. Since the slopes are the same, the difference in averages (the vertical distance between the two lines) is constant at any level of the covariate, so a single number summarizes the group difference. By contrast, the right-hand panel of @fig-ancovadifftrend shows an interaction between the covariate and the experimental groups, i.e., different slopes: there, the effect of the experimental condition increases with the level of the covariate. One may also note that the lack of overlap in the support (the set of values taken by the covariate) for the two experimental conditions makes comparison hazardous at best in the right-hand panel.
@fig-ancovaresid shows that, due to the strong correlation, the variability of the measurements is smaller on the right-hand panel (corresponding to the analysis of covariance model) than for the centred response on the left-hand panel; note that the $y$-axes have different scales.
```{r}
#| label: fig-ancovaresid
#| echo: false
#| eval: true
#| cache: true
#| fig-cap: "Response after subtracting mean (left) and after detrending (right)."
g3 <- ggplot(data = dat,
aes(color = group,
x = group,
y = response - mean(response))) +
geom_jitter() +
scale_y_continuous(limits = c(-10,10), breaks = seq(-10, 10, by = 5)) +
labs(y = "centered response",
subtitle = "ANOVA")+
theme_classic()
dat$resid <- resid(lm(response ~ covariate,
data = dat))
g4 <- ggplot(data = dat,
aes(color = group,
x = group,
y = resid)) +
geom_jitter() +
scale_y_continuous(limits = c(-10,10), breaks = seq(-10, 10, by = 5)) +
labs(y = "residuals (detrended observations)",
subtitle = "ANCOVA") +
theme_classic()
g3 + g4 +
plot_layout(guides = 'collect') &
theme(legend.position = "bottom")
```
We present two examples of analysis of covariance, showing how the inclusion of covariates helps disentangle differences between experimental conditions.
::: {#exm-leechoi}
## Inconsistency of product description and image in online retailing
@Lee.Choi:2019 measured the impact of discrepancies between the description and the visual depiction of items in online retail. They performed an experiment in which participants were presented with the description and an image of a product (a set of six toothbrushes) that were either consistent or inconsistent with one another. The authors postulated that a discrepancy could lead to lower appreciation scores, measured using three Likert scales. They also suspected that familiarity with the product brand would impact ratings, and controlled for the latter using another question.
One way to account for familiarity when comparing the mean is to use a linear regression with familiarity as another explanatory variable. The expected value of the product evaluation is
$$
\mathsf{E}(\texttt{prodeval}) = \beta_0 + \beta_1 \texttt{familiarity} + \beta_2 \texttt{consistency},
$$ {#eq-vS}
where $\texttt{familiarity}$ is the score from 1 to 7 and $\texttt{consistency}$ is a binary indicator equal to one if the output is inconsistent and zero otherwise.
The coefficient $\beta_2$ thus measures the difference between product evaluation rating for consistent vs inconsistent displays, for the same familiarity score.
```{r}
#| label: LC19-linreg
#| eval: true
#| echo: false
# Load data
data(LC19_S1, package = "hecedsm")
# Fit a linear regression
ancova <- lm(prodeval ~ familiarity + consistency,
data = LC19_S1)
# Output coefficients and t-tests
#summary(ancova)
# Anova table
#car::Anova(ancova, type = 3)
```
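A minimal sketch of the corresponding fit, using the `LC19_S1` data from the `hecedsm` package:

```{r}
#| eval: false
#| echo: true
data(LC19_S1, package = "hecedsm")
# Analysis of covariance: a slope for familiarity plus the consistency effect
ancova <- lm(prodeval ~ familiarity + consistency, data = LC19_S1)
summary(ancova)$coefficients # coefficients, standard errors and t-tests
car::Anova(ancova, type = 3) # analysis of variance table
```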
We can look at coefficient (standard error) estimates $\widehat{\beta}_2 = `r round(coef(ancova)[3], 2)` (`r round( as.numeric(sqrt(diag(vcov(ancova)))[3]),3)`)$.
No difference between groups would mean $\beta_2=0$ and we can build a test statistic by looking at the standardized regression coefficient $t = \widehat{\beta}_2/\mathsf{se}(\widehat{\beta}_2)$. The result output is `r papaja::apa_print(ancova)$full_result$consistencyinconsistent`. We reject the null hypothesis of equal product evaluation for both displays at level 5%: there is evidence of a small difference, with people giving on average a score that is 0.64 points smaller (on a scale of 1 to 9) when presented with conflicting descriptions and images.
We can compare the analysis of variance tables obtained by fitting the model with and without $\texttt{familiarity}$. @tbl-anovatabLC19S1 shows that, without the covariate, the effect of consistency is small and not significant; a two-sample _t_-test also shows no evidence of a difference in average familiarity score between the two experimental conditions ($p$-value of $`r papaja::apa_p(broom::tidy(anova(aov(familiarity ~ consistency, data = LC19_S1)))$p.value)[1]`$). However, familiarity with the brand explains roughly one fifth of the residual variability (see the sum of squares in @tbl-anovatabLC19S1): filtering it out leads to a higher signal-to-noise ratio for the impact of consistency, at the expense of the loss of one degree of freedom. Thus, it appears that the manipulation was successful.
```{r}
#| label: tbl-anovatabLC19S1
#| echo: false
#| eval: true
#| tbl-cap: "Analysis of variance tables"
data(LC19_S1, package = "hecedsm")
anovaMod <- lm(prodeval ~ consistency,
data = LC19_S1)
ancovaMod <- lm(prodeval ~ familiarity + consistency,
data = LC19_S1)
options(knitr.kable.NA = '')
tab_anova <- broom::tidy(
car::Anova(anovaMod, type = 3)[-1,])
tab_anova$p.value <- papaja::apa_p(tab_anova$p.value)
tab_ancova <- broom::tidy(
car::Anova(ancovaMod, type = 3)[-1,])
tab_ancova$p.value <- papaja::apa_p(tab_ancova$p.value)
# Output coefficients and t-tests
knitr::kable(tab_anova,
col.names = c("term","sum. sq.", "df","stat","p-value"),
digits = c(0,2,1,2,3,3), booktabs = TRUE,
caption = "model without familiarity")
knitr::kable(tab_ancova,
col.names = c("term","sum. sq.", "df","stat","p-value"),
digits = c(0,2,1,2,3,3), booktabs = TRUE,
caption = "model with familiarity")
```
```{r}
#| label: fig-ANCOVA-demo
#| fig-cap: "Scatterplot of product evaluation as a function of the familiarity score, split by experimental manipulation."
#| eval: true
#| echo: false
#| out-width: '70%'
#| fig-width: 8
#| fig-height: 6
#| message: false
#| warning: false
set.seed(1234)
data(LC19_S1, package = "hecedsm")
ggplot(data = LC19_S1,
mapping = aes(y = prodeval,
x = familiarity,
col = consistency)) +
geom_point(position = "jitter") +
geom_smooth(method = "lm",
formula = y ~ x,
show.legend = FALSE,
fullrange = TRUE,
se = FALSE) +
scale_y_continuous(limits = c(1,9),
breaks = c(3L, 6L, 9L)) +
scale_x_continuous(limits = c(3,7),
breaks = 1:7) +
theme_classic() +
theme(legend.position = "bottom") +
labs(y = "product evaluation",
col = "display")
```
@fig-ANCOVA-demo shows that people more familiar with the product or brand tend to have a more positive product evaluation, as postulated by the authors. The graph also shows two straight lines corresponding to the fit of a linear model with a different intercept and slope for each display group: there is little material difference, but one needs to assess formally whether a single linear relationship such as the one postulated in @eq-vS can adequately characterize the relation in both groups.
To this effect, we fit a linear model with different slopes in each group, and compare the fit of the latter with the analysis of covariance model that includes a single slope for both groups: we can then test if the slopes are the same, or alternatively if the difference between the slopes is zero. The _t_-statistic indicates no difference in slope ($p$-value of $`r papaja::apa_p(broom::tidy(anova(lm(prodeval ~ familiarity + consistency, data = LC19_S1), lm(prodeval ~ familiarity*consistency, data = LC19_S1)))$p.value)[2]`$), thus the assumption is reasonable. Levene's test for homogeneity of variance indicates no discernible difference between groups. Thus, it appears there is a difference in perception of product quality due to the manipulation.
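A sketch of these two checks with the same data: the equality of slopes is assessed by comparing the nested models, and the homogeneity of variance with Levene's test on the residuals.

```{r}
#| eval: false
#| echo: true
data(LC19_S1, package = "hecedsm")
mod_ancova <- lm(prodeval ~ familiarity + consistency, data = LC19_S1)
mod_inter <- lm(prodeval ~ familiarity * consistency, data = LC19_S1)
# Test of equality of slopes (difference between slopes equal to zero)
anova(mod_ancova, mod_inter)
# Levene's test for homogeneity of variance across display groups
car::leveneTest(resid(mod_ancova) ~ consistency, data = LC19_S1)
```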
:::
::: {#exm-vanStek}
## Effect of scientific consensus on false beliefs
We consider Study 3 of @vanStekelenburg:2021, who studied changes in the perception of people holding false beliefs or denying (to some extent) the scientific consensus, by presenting them with news articles showcasing information about various phenomena. The experimental manipulation consisted of boosting, a form of training to help readers identify and establish whether scientists were truly experts in the domain of interest, how strong the consensus was, etc.^[The article is interesting because a lack of planning led the authors to adapt the design from Experiment 1 to Experiment 3 until they found something. Without preregistration, it is unlikely such findings would have been publishable.]
The third and final experiment of the paper focused on genetically modified organisms: it is a replication of Study 2, but with a control group (since there were no detectable differences between the experimental conditions `Boost` and `BoostPlus`) and a larger sample size (because Study 2 was underpowered).
The data include 854 observations with `prior`, the negative of the participant's prior belief score, and `post`, the post-experiment score for the veracity of the claim. Both were measured using a visual scale ranging from -100 (I am 100% certain this is false) to 100 (I am 100% certain this is true), with 0 (I don't know) in the middle. Only people with negative prior beliefs were recruited to the study. The three experimental conditions were `BoostPlus`, `consensus` and a `control` group. Note that the scores in the data have been negated, meaning that negative posterior scores indicate agreement with the consensus on GMO.
```{r}
#| label: SSVB21S3-fit
#| eval: true
#| echo: false
library(emmeans)
library(nlme)
data(SSVB21_S3, package = "hecedsm")
# ANOVA model
SSVBmod0 <- lm(post ~ condition,
data = SSVB21_S3)
# ANCOVA model (with prior as covariate)
SSVBmod1 <- lm(post ~ prior + condition,
data = SSVB21_S3)
# Check equality of slope (ok)
SSVBmod2 <- lm(post ~ prior*condition,
data = SSVB21_S3)
#anova(SSVBmod1, SSVBmod2)
# Check randomization (ok)
SSVBmod3 <- lm(prior ~ condition,
data = SSVB21_S3)
# Check equal variance per group (nope)
levene <-
car::leveneTest(resid(SSVBmod1) ~ condition,
data = SSVB21_S3)
m1 <- nlme::gls(
post ~ prior + condition,
weights = varIdent(form = ~1 | condition),
data = SSVB21_S3,
method = "ML")
m2 <- nlme::gls(
post ~ prior + condition,
data = SSVB21_S3,
method = "ML")
# anova(m1, m2)
# Clear heteroscedasticity...
# Fit a model with different variances
library(nlme)
SSVBmod4 <- nlme::gls(
post ~ prior + condition,
weights = varIdent(form = ~1 | condition),
data = SSVB21_S3)
# Compare model with unequal variance with
# model with equal variance (bis),
# this time with likelihood ratio test
SSVBmod5 <- gls(
post ~ condition,
weights = varIdent(form = ~ 1 |condition),
data = SSVB21_S3)
# Contrasts
emm4 <- emmeans(SSVBmod4, specs = "condition")
emm5 <- emmeans(SSVBmod5, specs = "condition")
# Note order: Boost, consensus, control
# Not comparable:
# emm4 uses detrended data,
# emm5 uses group averages
#
# Compute planned contrasts
contrast_list <- list(
"consensus vs control" = c(0, 1, -1),
"consensus vs BoostPlus" = c(-1, 1, 0),
"BoostPlus vs control " = c(1, 0, -1))
c4 <- contrast(emm4,
method = contrast_list,
adjust = "holm")
# Without controlling for prior beliefs
c5 <- contrast(emm5,
method = contrast_list,
adjust = "holm")
```
Preliminary checks suggest that, although the slopes for prior beliefs could plausibly be the same in each group and the data are properly randomized, there is evidence of unequal variance for the changes in score. As such, we fit a model with mean
\begin{align*}
\mathsf{E}(\texttt{post}) &= \begin{cases}
\beta_0 + \beta_1 \texttt{prior} + \alpha_1 & \texttt{condition} = \texttt{BoostPlus}\\
\beta_0 + \beta_1 \texttt{prior} + \alpha_2 &\texttt{condition} = \texttt{consensus}\\
\beta_0 + \beta_1 \texttt{prior} + \alpha_3 &\texttt{condition} = \texttt{control}
\end{cases}
\end{align*}
with $\alpha_1 + \alpha_2 + \alpha_3=0$, using the sum-to-zero parametrization, and with different variance for each experimental condition,
\begin{align*}
\mathsf{Va}(\texttt{post}) = \begin{cases}
\sigma^2_1, & \texttt{condition} = \texttt{BoostPlus},\\
\sigma^2_2, & \texttt{condition} = \texttt{consensus},\\
\sigma^2_3, & \texttt{condition} = \texttt{control}.
\end{cases}
\end{align*}
Because of the unequal variances, we cannot use multiple testing procedures reserved for analysis of variance and resort instead to Holm--Bonferroni to control the familywise error rate. We here look only at pairwise differences between conditions.^[In Study 2, the interest was comparing manipulation vs control and the Boost vs BoostPlus conditions, two orthogonal contrasts.]
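A sketch of this specification, fitted by generalized least squares with a separate variance per condition, together with the Holm-adjusted pairwise contrasts (using the `SSVB21_S3` data from `hecedsm`):

```{r}
#| eval: false
#| echo: true
data(SSVB21_S3, package = "hecedsm")
# Linear model with prior as covariate and a different variance per condition
gls_mod <- nlme::gls(
  post ~ prior + condition,
  weights = nlme::varIdent(form = ~ 1 | condition),
  data = SSVB21_S3)
# Pairwise differences between conditions, Holm-Bonferroni adjustment
# (order of levels: BoostPlus, consensus, control)
emm <- emmeans::emmeans(gls_mod, specs = "condition")
contrast_list <- list(
  "consensus vs control"   = c(0, 1, -1),
  "consensus vs BoostPlus" = c(-1, 1, 0),
  "BoostPlus vs control"   = c(1, 0, -1))
emmeans::contrast(emm, method = contrast_list, adjust = "holm")
```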
```{r}
#| label: tbl-anovatabSSVB
#| echo: false
#| eval: true
#| tbl-cap: "Analysis of variance tables"
options(knitr.kable.NA = '')
tab_anova <- broom::tidy(
car::Anova(SSVBmod5, type = 3)[-1,])
tab_anova$p.value <- papaja::apa_p(tab_anova$p.value)
tab_ancova <- broom::tidy(
car::Anova(SSVBmod4, type = 3)[-1,])
tab_ancova$p.value <- papaja::apa_p(tab_ancova$p.value)
# Output coefficients and t-tests
knitr::kable(tab_anova,
booktabs = TRUE,
caption = "ANOVA model (without prior belief)",
col.names = c("term","df","stat","p-value"),
digits = c(0,2,1,2))
knitr::kable(tab_ancova,
booktabs = TRUE,
caption = "ANCOVA model (with prior belief)",
col.names = c("term","df","stat","p-value"),
digits = c(0,2,1,2))
```
Repeating the exercise of comparing the amount of evidence with and without inclusion of the covariate shows that the value of the test statistic is larger (@tbl-anovatabSSVB), indicative of stronger evidence with the analysis of covariance model, although the conclusion would be unaffected with such large sample sizes. We of course care very little for the global $F$-test of equality of means, as the previous study had shown large differences. What is more interesting here is quantifying the change between conditions.
```{r}
#| label: tbl-contraststabSSVB
#| echo: false
#| tbl-cap: "Pairwise contrasts with _p_-values adjusted using Holm--Bonferroni"
#| eval: true
options(knitr.kable.NA = '')
tab_anova <- broom::tidy(c5) |> dplyr::select(!c("term","null.value"))
tab_anova$p.value <- papaja::apa_p(tab_anova$p.value)
tab_ancova <- broom::tidy(c4) |> dplyr::select(!c("term","null.value"))
tab_ancova$p.value <- papaja::apa_p(tab_ancova$p.value)
# Output coefficients and t-tests
knitr::kable(tab_anova,
booktabs = TRUE,
digits = c(0,2,1,2,3,3),
caption = "ANOVA model (without prior belief score).")
knitr::kable(tab_ancova,
booktabs = TRUE,
digits = c(0,2,1,2,3,3),
caption = "ANCOVA model (with prior belief score).")
```
@tbl-contraststabSSVB shows the pairwise contrasts, which measure two different things: the analysis of variance model compares the group averages, whereas the analysis of covariance (the linear model with `prior`) uses detrended values and focuses on the change in perception. Because the data are unbalanced and we estimate the group means and variances separately, the degrees of freedom change from one pairwise comparison to the next. Again, using the covariate `prior`, which is quite strongly correlated with `post` as seen in @fig-vanStekS3, helps decrease background noise.
```{r}
#| label: tbl-vanStekS3
#| eval: true
#| echo: false
#| message: false
#| warning: false
#| tbl-cap: "Summary statistics of belief as a function of time of measurement and experimental condition."
# Transform data so that all observations
# measurements are in a single column
# post/prior become labels of a categorical variable
SSVB21_S3_long <- SSVB21_S3 |>
tidyr::pivot_longer(cols = c("prior","post"),
names_to = "time",
values_to = "belief") |>
dplyr::mutate(time = relevel(factor(time),
ref = "prior"))
# Compute summary statistics by group
# average and its standard error = std.dev / sqrt(n)
SSVB21_S3_summary <- SSVB21_S3_long |>
dplyr::group_by(time, condition) |>
dplyr::summarize(mean = mean(belief),
se = sd(belief)/sqrt(dplyr::n()))
knitr::kable(SSVB21_S3_summary,
digits = 2,
booktabs = TRUE,
linesep = "")
```
```{r}
#| label: fig-vanStekS3
#| eval: true
#| echo: false
#| fig-cap: "Difference between prior and post experiment beliefs on genetically engineered food."
# Create a plot of data by timing
# jitter observations to avoid overlay
ggplot(data = SSVB21_S3_summary,
# Aesthetics common to all
aes(x = time,
group = condition)) +
geom_point(data = SSVB21_S3_long,
mapping = aes(x = time,
y = belief,
col = condition,
group = condition),
position = position_jitterdodge(jitter.width = 0.25)) +
# Add plus or one standard error for the mean
geom_linerange(data = SSVB21_S3_summary,
aes(ymin = mean - se,
ymax = mean + se),
position = position_dodge(width = 0.75)) +
# Add a line for each group mean
geom_errorbar(data = SSVB21_S3_summary,
aes(x = time,
ymax = mean,
ymin = mean,
color = condition),
position = position_dodge(width = 0.75)) +
# Add lines for each group (prior to post)
geom_line(data = SSVB21_S3_summary,
mapping =
aes(x = time,
group = condition,
color = condition,
y = mean),
position = position_dodge(0.75)
) +
theme_classic() +
theme(legend.position = "bottom")
```
:::
:::{.callout-caution}
@vanStekelenburg:2021 split their data to do pairwise comparisons two groups at a time (thus taking roughly two-thirds of the data to perform a two-sample _t_-test for each pair). Although it does not impact their conclusions, this approach is conceptually incorrect: if the variances were equal, we would want to use all observations to estimate them (so their approach would be suboptimal, since we would estimate the variance three times with smaller samples).
Conversely, using a model that assumes equal variance when this is not the case leads to distortion: the variance we estimate will be some sort of average of the variabilities $\sigma_i$ and $\sigma_j$ in experimental conditions $i$ and $j$. With large samples, this may be inconsequential, but it illustrates the caveats of subsample analyses.
:::
:::{.callout-caution}
@fig-vanStekS3f1 shows the relationship between prior and posterior scores. The data show clear differences between individuals: many start out completely disbelieving of genetically engineered food and change their minds (sometimes drastically), many do not change their opinion at all and have similar scores, and many give a posterior score of zero. This heterogeneity in the data illustrates the danger of only looking at summary statistics and comparing averages: they do not tell the whole picture. One could investigate whether the strength of religious or political beliefs, or how much participants trust scientists, could explain some of the observed differences.
:::
```{r}
#| label: fig-vanStekS3f1
#| echo: false
#| eval: true
#| out-width: '80%'
#| fig-cap: "Scatterplot of negated prior and posterior belief score."
# Scatterplot of data
ggplot(data = SSVB21_S3,
aes(x = prior,
y = post)) +
geom_point(aes(col = condition)) +
# geom_smooth(method = "lm",
# se = FALSE,
# col = "black") +
theme_classic() +
theme(legend.position = "bottom")
```
## Moderation and interactions
In a randomized experiment, we can assess the average effect of a manipulation by comparing groups: assuming random sampling, these conclusions can be broadly generalized to the population of interest from which the sample is drawn. However, it may be that the effect of the treatment depends on other variables: cultural differences, gender or education may alter the effect of the manipulation.
The causal effect on $Y$ of the experimental manipulation, say $X$, may be a misleading summary if another variable modifies the relation $X \to Y$: for example, the perception of gender discrimination or racism may depend on a person's background and experience, and this may impact the effect of the manipulation. Such variables, say $W$, thus have an interactive effect with the experimental factor $X$, and are termed moderators in psychology. In the statistical model, the inclusion of interaction terms between $X$ and $W$ (typically via the product of the so-called moderator variable with the factor encoding the experimental sub-condition) allows us to estimate those differences.
In an analysis of covariance, covariates that have an impact on the outcome are included to filter out variability, under the assumption that they do not influence the effect of the treatment (no interaction). With moderators, we rather include and test for the interactions. If we have an experimental factor $X$ which is binary or categorical, the resulting model is a simple analysis of variance (or analysis of covariance) model, and we can test the significance of the interaction term to assess the moderating effect of $W$.
If $W$ is a mean-centered continuous variable and $X$ a categorical variable with $k=1, \ldots, K$ levels using the sum-to-zero parametrization, the linear model with mean
$$
\underset{\text{average response at $W$ in group $k$}}{\mathsf{E}\{Y \mid \mathsf{do}(X) = k, W\}} = \underset{\text{intercept for group $k$}}{\mu + \alpha_k} + \underset{\text{slope for group $k$}}{(\beta + \gamma_k)}W
$$
includes different slopes for $W$ in each experimental group, as well as different intercepts for each group.
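A minimal sketch of this parametrization using simulated data (all names below are illustrative): we mean-center $W$, use sum-to-zero contrasts for $X$, and test the interaction to assess moderation.

```{r}
#| eval: false
#| echo: true
# Sketch with simulated data: K = 3 groups and a continuous moderator W
set.seed(2023)
n <- 90
sim <- data.frame(
  X = factor(rep(c("a", "b", "c"), each = n / 3)),
  W = rnorm(n, mean = 4, sd = 1))
sim$Y <- 1 + 0.5 * sim$W + (sim$X == "b") * (1 - 0.3 * sim$W) + rnorm(n)
sim$W <- sim$W - mean(sim$W) # mean-center the moderator
# Sum-to-zero parametrization for the factor; the interaction
# gives a different slope for W in each group
mod_sim <- lm(Y ~ X * W, data = sim, contrasts = list(X = "contr.sum"))
coef(mod_sim) # mu, alpha_1, alpha_2, beta, gamma_1, gamma_2
car::Anova(mod_sim, type = 3) # test of the interaction (moderation)
```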
:::{#exm-moderation1}
## Gender discrimination
We consider data from a study on gender discrimination [@Garcia:2010]. Participants were presented with a file in which a woman was turned down for a promotion in favour of a male colleague, despite being clearly more experienced and qualified. The authors manipulated the reaction of the fictional candidate to this decision: either not challenging the decision (no protest), a request to reconsider based on the individual qualities of the applicants (individual protest), or a request to reconsider based on the abilities of women in general (collective protest). All items were measured using Likert scales ranging from strongly disagree (1) to strongly agree (7).
The postulated moderator variable is `sexism`, the average of six Likert scales from the Modern Sexism Scale assessing the pervasiveness of gender discrimination. We consider participants' evaluation of the appropriateness of the response of the fictional character.
We fit the linear model with the interaction and display the estimated slopes.
```{r}
#| eval: false
#| echo: true
data(GSBE10, package = "hecedsm")
lin_moder <- lm(respeval ~ protest*sexism,
data = GSBE10)
summary(lin_moder) # coefficients
car::Anova(lin_moder, type = 3) # tests
```
```{r}
#| label: tbl-testsmoder
#| echo: false
#| eval: true
#| message: false
#| warning: false
#| tbl-cap: "Analysis of variance table for the linear moderation model of the @Garcia:2010 data."
data(GSBE10, package = "hecedsm")
linmoder <- lm(respeval ~ protest*sexism,
data = GSBE10)
# summary(linmoder) # coefficients
options(knitr.kable.NA = '')
tab_anova <- broom::tidy(
car::Anova(linmoder, type = 3)[-1,])
tab_anova$p.value <- papaja::apa_p(tab_anova$p.value)
knitr::kable(tab_anova,
booktabs = TRUE,
col.names = c("term", "sum of squares","df","stat","p-value"),
digits = c(0,2,1,2,3,3))
```
```{r}
#| eval: true
#| echo: false
#| warning: false
#| message: false
#| out-width: '80%'
ggplot(data = GSBE10,
aes(x = sexism,
y = respeval,
color = protest,
group = protest)) +
geom_point() +
geom_smooth(se = FALSE, method = "lm", formula = y ~ x) +
labs(subtitle = "evaluation of response",
y = "",
color = "experimental condition") +
theme_classic() +
theme(legend.position = "bottom")
```
Because of the interaction, comparing the levels of the experimental factor only makes sense for a fixed value of sexism (since the slopes are not parallel), and the comparison won't necessarily be reliable outside the range of observed values of sexism. We could look at differences between conditions at selected quantiles, at the mean of sexism,^[This is the default with `emmeans`.] or at one standard deviation above and below the mean.
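A sketch of such comparisons with `emmeans`, fixing `sexism` at its mean and at one standard deviation below and above it (variable names as in the chunk above):

```{r}
#| eval: false
#| echo: true
data(GSBE10, package = "hecedsm")
lin_moder <- lm(respeval ~ protest * sexism, data = GSBE10)
# Compare experimental conditions at fixed values of the moderator
sexism_vals <- mean(GSBE10$sexism) + c(-1, 0, 1) * sd(GSBE10$sexism)
emmeans::emmeans(lin_moder,
                 specs = "protest",
                 by = "sexism",
                 at = list(sexism = sexism_vals)) |>
  emmeans::contrast(method = "pairwise", adjust = "tukey")
```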
:::
If the treatment is binary, we may be interested in the range of values of the moderator $W$ for which the difference between treatments is not statistically significant. The Johnson--Neyman method [@Johnson.Neyman:1936] considers this range, but it leads to multiple testing problems since we probe the model repeatedly; @Esarey.Sumner:2018 offer a method that controls the false discovery rate.
To illustrate the method, we dichotomize the manipulation by pooling the individual and collective protests, since these are the most similar conditions.
```{r}
#| eval: true
#| echo: true
#| fig-cap: "Johnson--Neyman plot for difference between protest and no protest as a function of sexism."
#| label: fig-jn
library(interactions)
db <- GSBE10 |>
dplyr::mutate(
protest = as.integer(protest != "no protest"))
lin_moder2 <- lm(respeval ~ protest*sexism, data = db)
jn <- johnson_neyman(
model = lin_moder2, # linear model
pred = protest, # binary experimental factor
modx = sexism, # moderator
control.fdr = TRUE,
mod.range = range(db$sexism))
jn$plot
```
The cutoff value is 4.20 with control for the false discovery rate and 4.15 without. The interval is not extended beyond the range of values for sexism, as values outside it are not possible given the Likert scale (which starts at 1). In this example, the moderator is not experimentally manipulated, but it could be. More complicated mediation models could include interactions between treatment effects or moderators and covariates, or with external variables, leading to moderated mediation. Interactions can be considered for pretty much any statistical model, but the usual assumptions need to hold for inference to be approximately valid.
:::{#exm-moderation2}
## Ghosting
@Leckfor:2023 consider the impact of ghosting someone (i.e., not responding to communications and cutting bridges with a person) on personal satisfaction, uncertainty and the need for closure (explanations). In their Study 3, they postulated that the type of rejection someone experienced would moderate the relationship between the need for closure and need satisfaction. The study comprises 545 participants who were asked to reflect on a situation in their own life, in scenarios where
> Participants in the included (control) condition were asked to write about a time when someone “expressed that they wanted to continue and/or maintain a friendship, romantic relationship, or casual dating situation.” Participants in the ghosted condition wrote about a time when someone ended a relationship by “suddenly cutting off all communication without explanation.” Participants in the directly rejected condition wrote about a time when someone “directly communicate[d] that they wanted to end the relationship.” Nearly half (49%) of participants wrote about a friendship, followed by a romantic relationship (29%), and then a casual dating situation (22%).
The moderator variable is `cond`, one of included (control), ghosted or directly rejected. It should moderate the effect of the need for closure on need satisfaction. The authors postulated that individuals would have lower need satisfaction after being ghosted.
```{r}
#| label: moderation2
#| eval: false
#| echo: true
# Moderation analysis
# This time, the factor of interest is continuous
# and the moderator is categorical (K=3 levels)
data(LWSH23_S3, package = "hecedsm")
mod <- lm(data = LWSH23_S3, needsatis ~ needclosure * cond)
anova(mod) # interaction is significant
# Compute estimated marginal means, weighting the levels of other factors by their relative frequencies
emmeans::emmeans(mod, specs = "needclosure",
by = "cond",
at = list(needclosure = 2:7),
weights = "prop")
# Without 'at', predictions are given at the average value of needclosure
```
```{r}
#| label: tbl-lmmoder2
#| echo: false
#| eval: true
#| message: false
#| warning: false
#| tbl-cap: "Linear moderation model coefficients of Study 3 of @Leckfor:2023. Least square coefficients, standard errors, Wald tests and $p$-values for $\\beta=0$ and 95\\% confidence intervals."
data(LWSH23_S3, package = "hecedsm")
mod <- lm(data = LWSH23_S3, needsatis ~ needclosure * cond)
options(knitr.kable.NA = '')
tab_coefs <- broom::tidy(summary(mod))
tab_coefs$term = c("(Intercept)", "need for closure", "directly rejected [cond]", "ghosted [cond]",
"need for closure * directly rejected [cond]", "need for closure * ghosted [cond]")
# tab_coefs$p.value <- papaja::apa_p(tab_coefs$p.value)
tab_coefs$p.value<- NULL
tab_coefs$statistic <- NULL
tab_coefs$coef <- paste0("$\\beta_",0:5,"$")
tab_coefs$lower <- confint(mod)[,1]
tab_coefs$upper <- confint(mod)[,2]
tab_coefs <- tab_coefs |> dplyr::select(term, coef, estimate, std.error, lower, upper)
knitr::kable(tab_coefs,
booktabs = TRUE,
col.names = c("term", "coef.","estimate", "std. error", "lower CL", "upper CL"),
digits = c(0,2,1,2,2,2), align = "lccccc")
```
```{r}
#| label: tbl-testsmoder2
#| echo: false
#| eval: true
#| message: false
#| warning: false
#| tbl-cap: "Analysis of variance table for the linear moderation model of Study 3 of @Leckfor:2023."
data(LWSH23_S3, package = "hecedsm")
mod <- lm(data = LWSH23_S3, needsatis ~ needclosure * cond)
options(knitr.kable.NA = '')
tab_anova <- broom::tidy(anova(mod)) |> dplyr::select(!meansq,)
tab_anova$p.value <- papaja::apa_p(tab_anova$p.value)
knitr::kable(tab_anova,
booktabs = TRUE,
col.names = c("term", "df","sum of squares", "stat","p-value"),
digits = c(0,2,1,2,3,3))
```
Since the interaction is significant, we focus on the different slopes. We can of course simply report the equation of the linear regression for each group. Alternatively, `emmeans` can also return the predictions for different values of `needclosure` per condition, along with standard errors. The predictions are more precise (smaller standard errors) for values of `needclosure` close to the subgroup averages.
The postulated theoretical mean equation is
\begin{align*}
\mathsf{E}(\texttt{needsatis}) &= \beta_0 + \beta_1\texttt{needclosure} + \beta_2 \mathsf{1}_{\texttt{cond=directly rejected}} +\beta_3 \mathsf{1}_{\texttt{cond=ghosted}} \\&\quad+ \beta_4 \texttt{needclosure} \times \mathsf{1}_{\texttt{cond=directly rejected}} +\beta_5 \texttt{needclosure} \times\mathsf{1}_{\texttt{cond=ghosted}}.
\end{align*}
The estimated model mean $\widehat{\mathsf{E}}(\texttt{needsatis} \mid \texttt{needclosure}=x)$ when `needclosure` takes the value $x \in [1,7]$ is thus, for each `cond`,
\begin{align*}
\widehat{Y} & = \begin{cases}
\widehat{\beta}_0 + \widehat{\beta}_1x, & \texttt{control};\\
(\widehat{\beta}_0 + \widehat{\beta}_2) + (\widehat{\beta}_1 + \widehat{\beta}_4)x , & \texttt{directly rejected};\\
(\widehat{\beta}_0 + \widehat{\beta}_3) + (\widehat{\beta}_1 + \widehat{\beta}_5)x, & \texttt{ghosted}.
\end{cases} \\&= \begin{cases}
5.015 + 0.181 x, & \texttt{control};\\
3.86 -0.222x, & \texttt{directly rejected};\\
3.52 -0.234x, & \texttt{ghosted}.
\end{cases}
\end{align*}
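These group-specific intercepts and slopes can be recovered directly from the fitted coefficients; a minimal sketch, assuming the coefficient ordering shown in @tbl-lmmoder2:

```{r}
#| eval: false
#| echo: true
data(LWSH23_S3, package = "hecedsm")
mod <- lm(needsatis ~ needclosure * cond, data = LWSH23_S3)
beta <- unname(coef(mod))
# Per-condition intercepts and slopes (treatment parametrization,
# reference level is the 'included' control condition)
data.frame(
  condition = c("included (control)", "directly rejected", "ghosted"),
  intercept = c(beta[1], beta[1] + beta[3], beta[1] + beta[4]),
  slope     = c(beta[2], beta[2] + beta[5], beta[2] + beta[6]))
```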
```{r}
#| eval: true
#| echo: false
#| message: false
#| warning: false
#| label: fig-moderation2-slopes
#| fig-cap: "Scatterplot and linear regression slopes for the need for satisfaction as a function of the moderating situation and the need for closure."
ggplot(data = LWSH23_S3,
mapping = aes(x = needclosure, y = needsatis, color = cond)) +
geom_smooth(method = "lm", se = FALSE) +
geom_point() + MetBrewer::scale_color_met_d("Hiroshige") +
labs(x = "need for closure",
y = "need for satisfaction",
color = NULL) +
theme_classic() +
theme(legend.position = "bottom")
```
We can see clearly in @fig-moderation2-slopes that the impact of the need for closure depends on the situation, with downward trends for the ghosted and rejected groups.
We can also fit this model with the PROCESS macro, which only understands numeric values for factors. The output (omitted) includes model coefficients with confidence intervals, the $F$-test statistic for the ANOVA, and output for the average need for closure.
```{r}
#| echo: true
#| eval: false
process(data = LWSH23_S3 |>
dplyr::mutate(condind = as.integer(cond)), # cast factor to numeric integer levels
y = "needsatis", # response variable
x = "needclosure", # explanatory variable (not manipulated)
w = "condind", # postulated moderator
mcw = 1, # dummy coding for moderator w (so compare to base level, here 'included')
model = 1) # number of model in Hayes (simple moderation)
```
:::
:::{.callout-important}
## **Summary**
* We can treat factor levels measuring quantities as continuous covariates, and reduce the number of parameters.
* Inclusion of continuous covariates may help filtering out unwanted variability.
* These are typically concomitant variables (measured alongside the response variable).
* These designs reduce the residual error, leading to an increase in power (more ability to detect differences in averages between experimental conditions).
* We are only interested in differences due to experimental condition (marginal effects).
* In general, there should be no interaction between covariates/blocking factors and experimental conditions.
* This hypothesis can be assessed by comparing the models with and without interaction, if there are enough units (e.g., equality of slope for ANCOVA).
* Moderators are variables that interact with the experimental factor. We assess their presence by testing for an interaction in a linear regression model.
* In the presence of a moderator, we need to consider slopes or groups separately for subgroups.
:::