# Infer
# Moving into analyses
> Thoughts from Jenny: I wonder whether it would be useful to structure this in terms of "let's combine skills from previous chapters to...
>
> 1. ask a question,
> 2. get plots/descriptives,
> 3. then answer it with inferentials?"
>
> And repeat that process for the t-test, linear regression and ANOVA?
## Plots and descriptives
Now, let's test whether the scale score `scale1_index` varies by condition (condition 1 vs. condition 2 in `condition12`).
Before jumping into t-tests, let's first visualise the data (scale1_index) to get a sense of what it looks like.
The first command tells R to treat `condition12` as a categorical variable that groups the data (technically, a factor), which is helpful for plotting. It uses the `dataframe$variablename` structure to apply a command to just one variable (column) of the data frame.
The second command pipes the data into `ggplot()` to create a boxplot (`geom_boxplot`) with the individual data points plotted on top (`geom_jitter`). The `width` argument narrows the horizontal spread of the jittered dots so they stay within each box, which makes the plot easier to interpret.
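```{r eval = FALSE}
library(tidyverse)  # loads dplyr (for %>%) and ggplot2

# treat condition12 as a categorical grouping variable (a factor)
data_scalescomputed$condition12 <- as.factor(data_scalescomputed$condition12)

# boxplot of scale1_index for each condition, with the raw data points jittered on top
data_scalescomputed %>%
  ggplot(aes(x = condition12, y = scale1_index, fill = condition12)) +
  geom_boxplot(alpha = .5) +
  geom_jitter(alpha = .5, width = 0.2)
```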
Based on the boxplot, we'd expect a t-test to be nonsignificant: as a rough guide, the boxes for the two conditions overlap heavily, even though the median of condition 2 (the horizontal line through each box) is slightly higher than that of condition 1.
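Because the line through each box marks the median rather than the mean, it can help to overlay the group means explicitly. The sketch below uses R's built-in `mtcars` data (fuel economy `mpg` by transmission type `am`) purely as a stand-in for `data_scalescomputed`, and also shows the tidyverse alternative to `$`: converting the grouping variable to a factor inside `mutate()`.

```{r eval = FALSE}
library(dplyr)
library(ggplot2)

# Stand-in data: built-in mtcars, with transmission type (am) as the two "conditions"
plot_data <- mtcars %>%
  mutate(am = as.factor(am))  # same job as mtcars$am <- as.factor(mtcars$am)

mean_plot <- plot_data %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(alpha = .5) +
  geom_jitter(alpha = .5, width = 0.2) +
  # overlay each group's mean as a white diamond (the box's middle line is the median)
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "white")

mean_plot
```

When the data are skewed, the diamond and the median line can sit noticeably apart, which is worth knowing before interpreting a test of means.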
## t-test
The first line of code uses base R's `t.test()` function to run an independent-samples t-test. It tells R to run the test on the `data_scalescomputed` data frame, asking whether `scale1_index` varies by (`~`) `condition12`. `var.equal = TRUE` assumes equal variances in the two conditions (Student's t-test; the default, `FALSE`, gives Welch's t-test), and `conf.level` sets the confidence level of the interval (.95 gives a 95% CI, matching an alpha of .05). That command is wrapped in the handy `t_apa()` function from the apa package, which makes the output much easier to read (and pull directly into a write-up!) and, with `es_ci = TRUE`, also gives a confidence interval on the effect size.
For reporting, you'll probably also want the mean and standard deviation for each condition. The next bit of code calculates these with `summarise()`, separately for each condition (`group_by()`). Remember to set `na.rm = TRUE` so that missing values are dropped; otherwise a single missing value makes the mean or SD `NA`.
The `gt()` function (from the gt package) displays the resulting data frame as a nicely formatted table.
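```{r eval = FALSE}
library(apa)  # t_apa()
library(gt)   # gt()

# Student's t-test of scale1_index by condition12, reported in APA style with a CI on the effect size
t_apa(t.test(scale1_index ~ condition12, data = data_scalescomputed,
             var.equal = TRUE, conf.level = .95),
      es_ci = TRUE)

# means and standard deviations of scale1_index for each condition
scale1_by_condition12 <- data_scalescomputed %>%
  group_by(condition12) %>%
  summarise(mean_scale1 = mean(scale1_index, na.rm = TRUE),
            sd_scale1 = sd(scale1_index, na.rm = TRUE))

gt(scale1_by_condition12)
```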
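To see what `var.equal` and `conf.level` change, and where a Cohen's d comes from, here is a base R sketch. It uses the built-in `mtcars` data (`mpg` by transmission type `am`) purely as a stand-in for `data_scalescomputed`, and computes d with the pooled-SD formula; `t_apa()` may use a different variant, so small differences from its output are possible.

```{r eval = FALSE}
# Stand-in data: built-in mtcars, comparing mpg between transmission types (am = 0 vs. 1)
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE, conf.level = .95)  # Student's t-test
welch   <- t.test(mpg ~ am, data = mtcars)                                       # default: Welch's t-test

# Student's df is n1 + n2 - 2; Welch's df is adjusted downward when the variances differ
student$parameter
welch$parameter

# conf.level = .95 gives a 95% confidence interval, which corresponds to alpha = .05
attr(student$conf.int, "conf.level")

# Cohen's d by hand: mean difference divided by the pooled standard deviation
n <- table(mtcars$am)
m <- tapply(mtcars$mpg, mtcars$am, mean)
s <- tapply(mtcars$mpg, mtcars$am, sd)
sd_pooled <- sqrt(((n[[1]] - 1) * s[[1]]^2 + (n[[2]] - 1) * s[[2]]^2) / (sum(n) - 2))
cohens_d  <- (m[[1]] - m[[2]]) / sd_pooled
cohens_d
```

Because Student's t equals d multiplied by sqrt(n1 * n2 / (n1 + n2)), you can also recover this d from a reported t and the two group sizes.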
# Linear regression
*lw_addnarrativehere*
- outcome: `variable6`
- step 1: `demographicscateg`
- step 2: `variable4`
- step 3: `scale1_index`
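The `lm()` function (part of base R's stats package) fits a linear model. Its formula, `lm(outcome ~ predictor)`, asks R to use `predictor` to predict `outcome`; additional predictors are added with `+`.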
```{r eval = FALSE}
# all three predictors entered together in a single model
regression1 <- lm(data = data_scalescomputed, variable6 ~ demographicscateg + variable4 + scale1_index)
summary(regression1)
```
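If you're unsure what a formula asks `lm()` to estimate, base R's `terms()` shows how the formula is expanded into model terms. The variable names below are placeholders, so no data are needed. Note that `+` adds predictors without interactions (as in `regression1`), while `*` also adds their interaction.

```{r eval = FALSE}
# helper (hypothetical name) that lists the model terms R builds from a formula
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(outcome ~ predictor)              # a single predictor
labels_of(outcome ~ step1 + step2 + step3)  # three additive predictors, as in regression1
labels_of(outcome ~ step1 * step2)          # both main effects plus their interaction
```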
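*lw_addnarrativehere* The next chunk fits the same predictors as a hierarchical (blockwise) regression: three nested models, each adding the next step's predictor, so you can see how much each step adds beyond the previous ones. (This is hierarchical entry in an order chosen by the researcher, not automated stepwise selection.)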
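```{r eval = FALSE}
library(apaTables)  # apa.reg.table()
library(here)       # here()
library(report)     # report()

# hierarchical regression: each model adds the next step's predictor to the previous model
regressionlm1 <- lm(data = data_scalescomputed, variable6 ~ demographicscateg)
regressionlm2 <- lm(data = data_scalescomputed, variable6 ~ demographicscateg + variable4)
regressionlm3 <- lm(data = data_scalescomputed, variable6 ~ demographicscateg + variable4 + scale1_index)
summary(regressionlm1)
summary(regressionlm2)
summary(regressionlm3)

# save an APA-style table of the three models as a Word document in output_files/
apa.reg.table(regressionlm1, regressionlm2, regressionlm3,
              filename = here("output_files", "RegressionTable3_APA.doc"), table.number = 3)

# plain-language summary of the final model
report(regressionlm3)
```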
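Base R's `anova()` can test whether each step significantly improves on the previous one, and the change in R² shows how much variance each new step adds. The sketch below uses the built-in `mtcars` data as a stand-in, with `am`, `wt` and `hp` playing the roles of the three steps (a hypothetical mapping). One caution: nested models must be fitted to the same participants, so if the predictors have different missing values, restrict the data to complete cases first (as `na.omit()` does here); otherwise `anova()` stops with an error because the models were fitted to different numbers of cases.

```{r eval = FALSE}
# Stand-in data: built-in mtcars; keep only complete cases so every step uses the same rows
model_data <- na.omit(mtcars[, c("mpg", "am", "wt", "hp")])
model_data$am <- as.factor(model_data$am)

step1 <- lm(mpg ~ am, data = model_data)
step2 <- lm(mpg ~ am + wt, data = model_data)
step3 <- lm(mpg ~ am + wt + hp, data = model_data)

# F-tests of whether each step improves on the one before it
step_tests <- anova(step1, step2, step3)
step_tests

# R-squared at each step and the change contributed by each new block
r2 <- sapply(list(step1, step2, step3), function(model) summary(model)$r.squared)
r2_change <- diff(c(0, r2))
data.frame(step = 1:3, r_squared = r2, r_squared_change = r2_change)
```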
# Analysis of Variance (ANOVA)
*lw add narrative*
- design: 4 (`condition1234`, between participants) × 2 (`demographicscateg`, a categorical individual-difference variable, between participants)
- outcome: `scale1_index`
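The contrast vectors later in this section depend on the order in which `emmeans` lists the eight cells of the design. Assuming the levels are coded 1 to 4 and 1 to 2 (hypothetical labels), this sketch lays out that order, with the first factor varying fastest; it is worth confirming against the printed `scale1_emm` output for your own data.

```{r eval = FALSE}
# The 8 cells of the 4 x 2 design, with condition1234 varying fastest
cells <- expand.grid(condition1234     = factor(1:4),
                     demographicscateg = factor(1:2))
cells$position <- seq_len(nrow(cells))  # position of each cell in an 8-element contrast vector
cells
```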
```{r eval = FALSE}
library(ez)       # ezStats(), ezANOVA()
library(emmeans)  # emmeans(), contrast()

## first tell R that condition1234 and demographicscateg are categorical (factors)
data_scalescomputed$condition1234 <- as.factor(data_scalescomputed$condition1234)
data_scalescomputed$demographicscateg <- as.factor(data_scalescomputed$demographicscateg)
## this creates a dataframe that contains the summary statistics (means, standard deviations, etc.)
scale1_stats <- ezStats(data = data_scalescomputed,
dv = scale1_index,
wid = participantid,
between = c("condition1234","demographicscateg"))
print(scale1_stats)
## this carries out an anova with condition1234 and demographicscateg as between-participants factors
## the output includes statistical tests of the main effect of condition1234, the main effect of demographicscateg, and the interaction between the two
scale1_anova <- ezANOVA(data = data_scalescomputed,
dv = scale1_index,
wid = participantid,
between = c("condition1234","demographicscateg"),
return_aov = TRUE)
print(scale1_anova)
## NOTE THAT THE FOLLOWING WOULD BE DONE TO FURTHER INVESTIGATE AN INTERACTION OR A DIFFERENCE IN A MULTI-CATEGORICAL VARIABLE.
## to build the contrasts we want (e.g., condition 1 vs. condition 2), we first ask emmeans for the estimated marginal means, so we can see which row represents which cell.
scale1_emm <- emmeans(scale1_anova$aov, ~ condition1234 * demographicscateg)
scale1_emm
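## now we can create a set of named vectors that pick out each of the 8 cell means, based on their position in the 8-row list from emmeans
## (condition1234 varies fastest: rows 1-4 are conditions 1-4 at demographicscateg level 1, rows 5-8 the same conditions at level 2)
cond1categ1 <- c(1,0,0,0,0,0,0,0)
cond2categ1 <- c(0,1,0,0,0,0,0,0)
cond3categ1 <- c(0,0,1,0,0,0,0,0)
cond4categ1 <- c(0,0,0,1,0,0,0,0)
cond1categ2 <- c(0,0,0,0,1,0,0,0)
cond2categ2 <- c(0,0,0,0,0,1,0,0)
cond3categ2 <- c(0,0,0,0,0,0,1,0)
cond4categ2 <- c(0,0,0,0,0,0,0,1)
## marginal mean of each condition, averaging over the two demographicscateg levels (weight 1/2 each)
cond1 <- c(.5,0,0,0,.5,0,0,0)
cond2 <- c(0,.5,0,0,0,.5,0,0)
cond3 <- c(0,0,.5,0,0,0,.5,0)
cond4 <- c(0,0,0,.5,0,0,0,.5)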
## now we can ask for contrasts based on these vectors; we will explore the main effect of condition1234 by comparing each of the conditions to one another
scale1_contrasts <- contrast(scale1_emm, method = list("cond1 - cond2" = cond1 - cond2,
"cond1 - cond3" = cond1 - cond3,
"cond1 - cond4" = cond1 - cond4,
"cond2 - cond3" = cond2 - cond3,
"cond2 - cond4" = cond2 - cond4,
"cond3 - cond4" = cond3 - cond4))
scale1_contrasts
## here's a version with confidence intervals instead of t and p values
scale1_contrastsCI <- scale1_contrasts %>%
confint()
scale1_contrastsCI
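## you may have a specific planned contrast to run. Here, let's imagine you want to compare condition 2 with the average of the other three conditions in a single comparison.
## first, get descriptive statistics for each condition (collapsing across demographicscateg)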
scale1_plannedcontraststats <- ezStats(data = data_scalescomputed,
dv = scale1_index,
wid = participantid,
between = c("condition1234"))
scale1_plannedcontraststats
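## contrast weights: +1/2 on each condition 2 cell and -1/6 on each of the six other cells, so condition 2's marginal mean is compared with the average of the other three (the weights sum to zero)
cond2v134 <- c(-1/6, 1/2, -1/6, -1/6, -1/6, 1/2, -1/6, -1/6)
scale1_plannedcontrasts <- contrast(scale1_emm, method = list("condition 2 vs all the rest" = cond2v134))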
scale1_plannedcontrasts
```
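Rather than typing every weight by hand, one option is to build the vectors with base R's `kronecker()`, which (for this cell order) repeats a set of condition weights once per `demographicscateg` level, scaled by that level's weight. Weights of 1/2 on each level average over `demographicscateg`, so the estimate is a difference in marginal means. The last part checks this logic on the built-in, balanced `warpbreaks` data (tension × wool), used here only as a stand-in.

```{r eval = FALSE}
# kronecker(a, b) returns a[1] * b followed by a[2] * b, matching "condition1234 varies fastest"
average_over_categ <- c(1/2, 1/2)  # average the two demographicscateg levels

cond1_vs_cond2 <- as.vector(kronecker(average_over_categ, c(1, -1, 0, 0)))
cond2_vs_rest  <- as.vector(kronecker(average_over_categ, c(-1/3, 1, -1/3, -1/3)))
cond2_vs_rest

# valid contrasts sum to zero (up to floating-point rounding)
sum(cond1_vs_cond2)
sum(cond2_vs_rest)

# Check on warpbreaks (3 tension x 2 wool, 9 per cell): the weighted cell means
# reproduce the difference in marginal means for tension L vs. M
cell_means <- as.vector(tapply(warpbreaks$breaks,
                               list(warpbreaks$tension, warpbreaks$wool), mean))
L_vs_M      <- as.vector(kronecker(c(1/2, 1/2), c(1, -1, 0)))
by_weights  <- sum(L_vs_M * cell_means)
marginal    <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
by_marginal <- unname(marginal["L"] - marginal["M"])
c(by_weights = by_weights, by_marginal = by_marginal)
```

Vectors built this way can be passed to `contrast()` just like the hand-typed ones. With several pairwise contrasts, you may also want a multiplicity correction through `contrast()`'s `adjust` argument (e.g., `adjust = "holm"`).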
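Let's make a figure! The first plot shows `scale1_index` for each condition, split by `demographicscateg`; the second collapses across `demographicscateg` to show the main effect of condition.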
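```{r eval = FALSE}
# boxplots for each condition, split by demographicscateg (all 8 cells)
data_scalescomputed %>%
  ggplot(aes(x = condition1234, y = scale1_index, fill = demographicscateg)) +
  geom_boxplot() +
  theme_light()

# boxplots for each condition, collapsing across demographicscateg
data_scalescomputed %>%
  ggplot(aes(x = condition1234, y = scale1_index)) +
  geom_boxplot() +
  theme_light()
```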
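Boxplots show medians, whereas the ANOVA tests means, so a plot of the cell means with error bars can be a useful companion. This ggplot2 sketch uses the built-in `warpbreaks` data (tension × wool) as a stand-in for `condition1234` × `demographicscateg`, and ggplot2's `mean_se()` for error bars of ±1 standard error.

```{r eval = FALSE}
library(ggplot2)

# Stand-in data: built-in warpbreaks, in place of condition1234 x demographicscateg
dodge <- position_dodge(width = 0.3)  # shift the two groups apart so error bars don't overlap

means_plot <- ggplot(warpbreaks, aes(x = tension, y = breaks, colour = wool, group = wool)) +
  stat_summary(fun = mean, geom = "line", position = dodge) +              # join the cell means
  stat_summary(fun = mean, geom = "point", size = 2, position = dodge) +   # cell means
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.15,
               position = dodge) +                                         # mean +/- 1 SE
  labs(y = "Mean breaks (+/- 1 SE)") +
  theme_light()

means_plot
```

Non-parallel lines in a plot like this are the visual counterpart of an interaction between the two factors.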