-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path05-appendix.qmd
215 lines (158 loc) · 6.87 KB
/
05-appendix.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
# Appendix {.unnumbered}
```{r}
#| label: load-pkg
library(tidyverse)
library(tidymodels)
library(knitr)
library(reshape)
library(ggplot2)
library(dplyr)
library(lme4)
library(MASS)
library(car)
library(MuMIn)
library(lmerTest)
library(ggplot2)
library(dplyr)
library(broom.mixed)
library(forcats)
library(sjPlot)
theme_set(theme_minimal(base_size = 11))
```
```{r}
#| label: load-data
#| message: false
gardnerjanson <- read_csv(here::here("processed-data/gardnerjanson.csv"))
gardnerjanson_museums <- read_csv(here::here("processed-data/gardnerjanson_museums.csv"))
```
```{r}
#| label: lmm_prep
gardnerjanson_museums_mod <- gardnerjanson_museums %>%
filter(!startsWith(artist_name, "N/A")) %>%
mutate(artist_race_nwi = if_else(artist_race == "White", "White", "Non-White"))
gardnerjanson_museums_mod <- gardnerjanson_museums_mod %>%
mutate(artist_nationality_other = factor(artist_nationality_other,
levels = c("American", "French", "Other", "British", "German", "Spanish")
))
```
```{r}
#| label: lmm_full
lmm_full <- lmer(log(space_ratio_per_page_total) ~ artist_race_nwi
+ artist_ethnicity
+ artist_gender
+ artist_nationality_other
+ moma_count_to_year
+ whitney_count_to_year +
(1 | artist_name),
data = gardnerjanson_museums_mod,
REML = FALSE)
```
```{r}
#| label: lmm_test
lmm <- lmer(log(space_ratio_per_page_total) ~ artist_race_nwi
+ artist_ethnicity
+ artist_gender
+ artist_nationality_other
+ moma_count_to_year
+ whitney_count_to_year
+ artist_nationality_other*moma_count_to_year
+ artist_race_nwi*moma_count_to_year
+ artist_ethnicity*moma_count_to_year
+ artist_race_nwi*whitney_count_to_year
+ artist_ethnicity*whitney_count_to_year
+ (1 | artist_name),
data = gardnerjanson_museums_mod,
REML = FALSE)
```
```{r}
#| label: lmm_step
final_model <- lmerTest::step(lmm)
```
```{r}
#| label: lmm
lmm <- lmer(log(space_ratio_per_page_total) ~ artist_nationality_other
+ moma_count_to_year
+ artist_nationality_other*moma_count_to_year
+ (1 | artist_name),
data = gardnerjanson_museums_mod,
REML = FALSE)
```
## Data Dictionary
Outcome:
`space_ratio_per_page_total` = The area in centimeters squared of both the text and the figure of a particular artist in a given edition of *Janson's History of Art* or *Gardner's Art Through the Ages* divided by the area in centimeters squared of a single page of the respective edition.
Potential Predictors:
`artist_gender` = The gender of the artist.
`artist_race` = The race of the artist.
`artist_race_nwi` = The non-white indicator for artist race, meaning if an artist's race is denoted as either white or non-white.
`artist_ethnicity` = The ethnicity of the artist.
`artist_nationality_other` = The nationality of the artist. Of the total count of artists through all editions of *Gardner's Art Through the Ages* and *Janson's History of Art*, 77.32% account for French, Spanish, British, American and German. Therefore, the categorical strings of this variable are French, Spanish, British, American, German and Other.
`whitney_count_to_year` = The count of exhibitions held by The Whitney of a particular artist at a particular moment of time, as highlighted by `year`.
`moma_count_to_year` = The count of exhibitions held by the Museum of Modern Art (MoMA) of a particular artist at a particular moment of time, as highlighted by `year`.
`year` = The year of publication for a given edition of Janson or Gardner.
Other variables:
`edition_number` = The edition number of the textbook from either Janson's History of Art or Gardner's Art Through the Ages.
`book` = Which book, either Janson or Gardner the particular artist at that particular time was included.
`artist_unique_id` = A unique identifying number assigned to artists across books and editions denoted in alphabetical order.
`artist_name` = The name of the artist.
## Log Transformation: Total Space Ratio per Page
```{r}
#| label: fig-logspaceratioperpagetotal
#| fig-cap: "Distribution of the Log of Total Space Ratio per Page, n = 3162, (Source: All artists of two-dimensional works made after c. 1750 in Gardner's Art Through the Ages and Janson's History of Art, 1926-2020)."
ggplot(
gardnerjanson_museums,
aes(x = log(space_ratio_per_page_total))
) +
geom_histogram(binwidth = .1) +
labs(
title = "Distribution of the Log of Ratio of Space
per Artist per Edition per Page in all Textbooks",
x = "Ratio of Space per Page Total",
y = "Count"
)
```
In order to create less skew in our outcome variable, it is evident that log transforming total space ratio per page given to an artist in a particular edition gives the spread a much more mild right-skew than before. The shape is still unimodal and asymmetrical. I will be using the log transformation on the total space ratio per page, our outcome variable, in the linear mixed-effects model. This allows for the residuals to have constant variance.
\pagebreak
## Assumptions
### Residuals and Constant Variance
```{r}
#| label: fig-residuals
#| fig-cap: "Constant Variance Assumption of the Residuals"
plot(resid(lmm, type = "pearson") ~ fitted(lmm),
main = "Constant Variance Assumption Satisfied",
xlab = "Fitted",
ylab = "Residuals")
```
The very low values of the predicted values have very low variability for residuals but for the bulk of the data, there is constant variability in the residuals.
\pagebreak
### Normality
```{r}
#| label: fig-density-histogram
#| fig-cap: "Normality Assumption of the Residuals"
lmm_aug <- broom.mixed::augment(lmm, gardnerjanson_museums_mod)
ggplot(lmm_aug, aes(x = .resid)) +
geom_histogram(binwidth = .05, aes(y = after_stat(density))) +
theme_minimal() +
stat_function(
fun = dnorm,
args = list(
mean = mean(lmm_aug$.resid),
sd = sd(lmm_aug$.resid)
),
color = "red",
size = 2
)+
labs(title = "Normality Assumption Satisfied",
x = "Residuals",
y = "Density")
```
The distribution of the residuals is approximately normal.
\pagebreak
### Independence
Since we cannot assume that each row in our data set is independent from another, because we expect an artist effect, we added a random effect at the artist level. Between artists, we expect the observations to be independent.
### Collinearity
```{r}
#| label: collinearity
#| tbl-cap: "Collinearity Test"
vif(lmm)
```
Given artist nationality, MoMA Count to Year and the interaction between the two all have a corrected GVIF under 10 therefore no variables are collinear.