---
title: "Fit the item-response model"
output: rmarkdown::github_document
author: Tristan Mahr
date: "`r Sys.Date()`"
references:
- id: GelmanHill
  type: book
  author:
  - family: Gelman
    given: Andrew
  - family: Hill
    given: Jennifer
  issued:
  - year: '2007'
  title: Data analysis using regression and multilevel/hierarchical models
  publisher: Cambridge University Press
  publisher-place: New York
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  fig.path = "figs/01-",
  echo = TRUE,
  comment = "#>",
  collapse = TRUE,
  fig.width = 6,
  fig.asp = 0.62)
```
In this report, I fit a mixed effects model to perform an item-response
analysis. What I want is an estimate of each child's phonemic
discrimination ability.
Model intuition
------------------------------------------------------------------------
In a mixed effects model, we have a sample of units, each of which has
its own "effect" we want to estimate. For example, these might be
participants with multiple trials or responses in an experiment. We
could estimate each participant's effect separately, but this leaves
some information on the table. If we model 80 participants, we have a
good deal of information about how the 81st participant's data might
look. Mixed effects models estimate an average effect and the
distribution of effects around the average so that the units can borrow
information from each other.
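This "borrowing" is partial pooling: each unit's estimate is pulled
toward the grand mean, more strongly when the unit has little data. A
minimal base-R sketch of the idea, using simulated numbers and the
precision-weighted-average approximation from @GelmanHill (not the model
fit below):

```r
# Simulate 10 scores for each of 5 participants around a grand mean
set.seed(1)
n_per <- 10
grand_mean <- 0.7
true_deviations <- rnorm(5, mean = 0, sd = 0.1)
scores <- data.frame(
  id = rep(paste0("p", 1:5), each = n_per),
  y  = rep(grand_mean + true_deviations, each = n_per) +
         rnorm(5 * n_per, mean = 0, sd = 0.2))

# No-pooling estimate: each participant's own mean
ybar_j <- tapply(scores$y, scores$id, mean)
ybar   <- mean(scores$y)

# Partial-pooling estimate: weight the participant mean by its precision
# (n_j / sigma_y^2) against the between-participant precision
# (1 / sigma_alpha^2). The variances are treated as known here.
sigma_y2     <- 0.2 ^ 2
sigma_alpha2 <- 0.1 ^ 2
w <- (n_per / sigma_y2) / (n_per / sigma_y2 + 1 / sigma_alpha2)
pooled_j <- w * ybar_j + (1 - w) * ybar

# Every pooled estimate is pulled toward the grand mean
all(abs(pooled_j - ybar) <= abs(ybar_j - ybar))
#> [1] TRUE
```

With 10 trials per participant and these variances, the weight `w` is
about 0.71, so each participant keeps most of their own mean but is
shrunk noticeably toward the average.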
To do an item-response analysis with mixed effects models, we include
two levels of effects: participants and items. Participants differ in
*ability*, and items differ in *difficulty*. @GelmanHill provide a
nice summary and visualization of the idea:
```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("./figs/gelman-hill-excerpt.png")
```
We capture these ability and difficulty effects in a mixed effects model
by using by-participant and by-item random intercepts.
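In equation form, this is essentially a Rasch model: the probability
that a child answers an item correctly is the inverse logit of ability
minus difficulty. A toy illustration with made-up log-odds values (not
estimates from the data):

```r
# Rasch-style probability: inverse logit of (ability - difficulty)
ability    <- c(low = -1, average = 0, high = 1)  # child intercepts (log-odds)
difficulty <- 0.5                                 # a moderately hard item
p_correct  <- plogis(ability - difficulty)
round(p_correct, 2)
#>     low average    high
#>    0.18    0.38    0.62
```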
Plot empirical item-level effects
------------------------------------------------------------------------
Some of the trials, as defined by word pairs, are harder than others.
```{r sorted-empirical-pairs}
library(dplyr)
library(lme4)
library(ggplot2)

minpair <- readr::read_csv("./data/raw-minimal-pairs-responses.csv")

# Calculate the proportion correct in each pair so that we can sort
# the WordPair factor using that value
minpair %>%
  group_by(WordPair) %>%
  mutate(PairProp = mean(Correct)) %>%
  ungroup() %>%
  mutate(WordPair = forcats::fct_reorder(WordPair, -PairProp)) %>%
  ggplot() +
  aes(x = WordPair, y = Correct) +
  stat_summary(fun.data = mean_se) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Proportion Correct (Mean ± SE)")
```
But the pairs just reflect the words in them, and some of the words are
harder than others. The words, in turn, are nested within word pairs,
with some words repeated across different pairs.
```{r words-in-word-pairs, fig.width = 3, fig.height = 6, fig.asp = NA}
# Highlight the words that appear in more than one pair
repeated <- c("juice", "keys", "moose", "mouse")

custom_sorted_pairs <- c(
  "goose-juice", "juice-moose", "moose-mouse", "mouse-mouth",
  "cheese-keys", "keys-peas", "big-pig", "hen-pen", "bee-key",
  "star-store", "sick-sit", "horse-house", "sleep-sweep",
  "car-jar", "cold-hold")

minpair %>%
  mutate(
    group = ifelse(CorrectResponse %in% repeated, CorrectResponse, NA),
    WordPair = factor(WordPair, custom_sorted_pairs)) %>%
  ggplot() +
  aes(x = CorrectResponse, y = Correct, color = group) +
  stat_summary(fun.data = mean_se) +
  coord_flip() +
  facet_grid(WordPair ~ ., scales = "free_y") +
  theme(strip.text.y = element_blank()) +
  labs(
    x = NULL,
    y = "Proportion Correct (Mean ± SE)") +
  scale_color_discrete(na.value = "black", guide = FALSE)
```
Our item response analysis therefore includes word-level and
word-in-pair-level random intercepts to estimate the difficulty of
individual words.
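In lme4, nested intercepts like these are written with the `/`
shorthand: `(1 | CorrectResponse/WordPair)` expands to
`(1 | CorrectResponse) + (1 | CorrectResponse:WordPair)`, so each word
gets an intercept plus a separate intercept for each word-in-pair
combination. The nested grouping factor is just the interaction of the
two; a base-R sketch with a toy subset of the word pairs:

```r
# The word "moose" appears in two different pairs
toy <- data.frame(
  CorrectResponse = c("moose", "moose", "goose"),
  WordPair        = c("juice-moose", "moose-mouse", "goose-juice"))

# The nested grouping factor is the word-by-pair interaction
toy$WordInPair <- interaction(toy$CorrectResponse, toy$WordPair, drop = TRUE)
nlevels(toy$WordInPair)  # 3 word-in-pair units from 2 distinct words
#> [1] 3
```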
Fit the model
------------------------------------------------------------------------
```{r}
m_minpair <- glmer(
  Correct ~ 1 + (1 | ResearchID) + (1 | CorrectResponse/WordPair),
  data = minpair,
  family = binomial(),
  # Use a different optimizer because the default may not converge
  control = glmerControl(optimizer = "bobyqa"))
summary(m_minpair)
```
Here are caterpillar plots of the effects and 95% intervals. The
participant plots are chunky presumably because participants could have
gotten the same scores.
```{r ranef-caterpillars, results = "hide", fig.show = "hold", out.width = "30%", fig.height = 8, fig.width = 4, fig.asp = NA}
lattice::dotplot(ranef(m_minpair, condVar = TRUE))
```
We can control for age too.
```{r}
m_minpair_age <- glmer(
  Correct ~ scale(MinPair_Age) + (1 | ResearchID) +
    (1 | CorrectResponse/WordPair),
  data = minpair,
  family = binomial(),
  # Use a different optimizer because the default may not converge
  control = glmerControl(optimizer = "bobyqa"))
summary(m_minpair_age)
```
And receptive vocabulary.
```{r}
m_minpair_vocab <- glmer(
  Correct ~ scale(MinPair_Age) + scale(PPVT_GSV) +
    (1 | ResearchID) + (1 | CorrectResponse/WordPair),
  data = minpair,
  family = binomial(),
  # Use a different optimizer because the default may not converge
  control = glmerControl(optimizer = "bobyqa"))
summary(m_minpair_vocab)
```
Adding these predictors makes the participant abilities a little
smoother because we have more information to differentiate the
participants' abilities.
```{r ranef-caterpillars-age, results = "hide", fig.show = "hold", out.width = "30%", fig.height = 8, fig.width = 4, fig.asp = NA}
p <- lattice::dotplot(
  ranef(m_minpair_age, condVar = TRUE),
  scales = list(y = list(alternating = 4)),
  sub = "age model")
print(p[["ResearchID"]])

p <- lattice::dotplot(
  ranef(m_minpair_vocab, condVar = TRUE),
  scales = list(y = list(alternating = 4)),
  sub = "age + receptive vocab model")
print(p[["ResearchID"]])
```
Now we package these estimates up and save.
```{r}
tidy_abilities <- function(model, label, newdata = minpair) {
  # Get the abilities (average ability plus participant deviations
  # from average)
  coefs <- coef(model)[["ResearchID"]] %>%
    as.data.frame() %>%
    tibble::rownames_to_column("ResearchID") %>%
    select(ResearchID, coef = `(Intercept)`)

  # Get just the participant deviations from average
  ranefs <- ranef(model)[["ResearchID"]] %>%
    as.data.frame() %>%
    tibble::rownames_to_column("ResearchID") %>%
    select(ResearchID, ranef = `(Intercept)`)

  # Get the predictions for each participant on an average item
  newdata$fitted <- predict(
    model, newdata, re.form = ~ (1 | ResearchID), allow.new.levels = TRUE)
  fitted <- newdata %>% select(ResearchID, fitted) %>% distinct()

  coefs %>%
    left_join(ranefs, by = "ResearchID") %>%
    left_join(fitted, by = "ResearchID") %>%
    tibble::add_column(Model = label, .before = 1)
}

# Include participant info
p_info <- minpair %>%
  group_by(ResearchID) %>%
  summarise(
    PPVT_GSV = unique(PPVT_GSV),
    MinPair_Age = unique(MinPair_Age),
    Correct = sum(Correct),
    Trials = n(),
    Proportion = Correct / Trials)

fits <- bind_rows(
  tidy_abilities(m_minpair, "base"),
  tidy_abilities(m_minpair_age, "base + age"),
  tidy_abilities(m_minpair_vocab, "base + age + ppvt")) %>%
  left_join(p_info, by = "ResearchID") %>%
  readr::write_csv("./data/minpair-abilities.csv")
```
Some confirmatory plots
------------------------------------------------------------------------
First, confirm that participants missing vocabulary scores did not get an
ability estimate from the model that controlled for vocabulary scores.
There should be fewer rows of estimates in that model.
```{r}
count(fits, Model)
```
Now, we plot the participants' intercepts. Each is the fixed-effects
intercept (the log-odds of a correct response by an average participant
on an average item) plus that participant's deviation from the average
(the by-participant random intercept).
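That sum is exactly what `coef()` returned in `tidy_abilities()` above:
the fixed intercept from `fixef()` plus each participant's `ranef()`
deviation. A toy arithmetic sketch with invented numbers:

```r
# For a random-intercept model, the per-participant intercepts in
# coef(model) equal the fixed intercept plus the ranef() deviations
fixed_intercept <- 1.2                        # hypothetical fixef() intercept
deviations <- c(p1 = 0.3, p2 = -0.5, p3 = 0)  # hypothetical ranef() values
participant_intercepts <- fixed_intercept + deviations
participant_intercepts
#>  p1  p2  p3
#> 1.5 0.7 1.2
```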
```{r ranef-controlled, results = "hide"}
ggplot(fits) +
  aes(x = MinPair_Age, y = coef) +
  geom_point() +
  stat_smooth(method = "lm") +
  facet_wrap("Model") +
  labs(
    title = "Age effect on participant intercepts",
    x = "Age (months)",
    y = "Participant average (log-odds)")

ggplot(fits) +
  aes(x = PPVT_GSV, y = coef) +
  geom_point(na.rm = TRUE) +
  stat_smooth(method = "lm", na.rm = TRUE) +
  facet_wrap("Model") +
  labs(
    title = "Vocabulary effect on participant intercepts",
    x = "Vocabulary (PPVT4 GSV)",
    y = "Participant average (log-odds)")
```
References
------------------------------------------------------------------------