---
title: "Experiment 4"
subtitle: "**Online Processing**"
toc-title: "Experiment 4: Online Processing"
---
```{r}
#| label: exp4-setup
#| include: false
library(tidyverse) # data wrangling
library(magrittr)
library(sjmisc)
options(dplyr.group.inform = FALSE, dplyr.summarise.inform = FALSE)
library(lme4) # stats
library(lmerTest)
library(buildmer)
library(snow) # parallel
library(insight) # model results
library(broom.mixed)
library(sjPlot) # tables
library(flextable)
library(patchwork) # plots
library(RColorBrewer)
library(ggtext)
rainbow <- read.csv("resources/formatting/rainbow.csv") # colors
rainbow_primary <- rainbow %>%
filter(Spectral != "") %>%
select(-Score) %>%
column_to_rownames("Spectral")
source("resources/data-functions/exp4_load_data.R") # setting up data
source("resources/formatting/printing.R") # model results in text
source("resources/formatting/aesthetics.R") # plot and table themes
source("resources/data-functions/demographics.R") # demographics tables
```
[![](resources/icons/preregistered.svg){title="Preregistration" width="30"}](https://osf.io/r3fy9) [![](resources/icons/open-materials.svg){title="Materials" width="30"}](https://github.com/bethanyhgardner/dissertation/blob/main/materials/exp4) [![](resources/icons/open-data.svg){title="Data" width="30"}](https://github.com/bethanyhgardner/dissertation/blob/main/data) [![](resources/icons/file-code-fill.svg){title="Analysis Code" width="30"}](https://github.com/bethanyhgardner/dissertation/blob/main/4_exp.qmd)
<br>
## Motivation
One of the most common objections to singular *they* is that both [generic](0_introduction.qmd#def-generic "generic singular they") and [specific](0_introduction.qmd#def-specific "specific singular they") forms are too ambiguous [@hekanaho2020]. However, it is unclear whether this opinion arises from actual difficulties in [coreference resolution](0_introduction.qmd#def-coreference "coreference"), or if it is more a product of language and gender attitudes. According to the processing fluency account [@alter2009], processing difficulty and language attitudes are connected both directly and indirectly. First, processing fluency can be a cue to language attitudes: if a listener attributes their difficulty understanding a speaker to that speaker being unable or unwilling to communicate in a way that the listener finds clear, experiencing less processing fluency will cause them to evaluate the speaker more negatively. Second, harder processing tends to elicit more negative affect, which may bias listeners' language attitudes [@dragojevic2020]. One line of experiments has connected the processing fluency account specifically to perceptions of nonnative-accented speech. Participants listened to audio recordings of fictional stories, and while their task was ostensibly to remember enough of the story to complete a fill-in-the-blanks memory task, the dependent measures were how much processing fluency they experienced (e.g., rating as clear, easy to understand), how positively they felt about the speaker (1--100 scale), and status (e.g., intelligence, competence) and solidarity (e.g., friendliness, niceness) judgments about the speaker. The experiments manipulated various ways of making the audio easier or harder to understand, independent of the speaker's accent. Adding background white noise to the audio decreased listeners' fluency ratings when the speaker used a Punjabi accent, more so than when the same speaker used a Standard American English accent. 
The lower processing fluency then resulted in more negative affect and lower status attributions to the speaker. When a Mandarin-accented speaker was accompanied with subtitles or participants had read a transcript of the story first, participants reported higher processing fluency, which resulted in more positive feelings about and higher status attributions to the speaker. In both sets of experiments, the effects of making the listening conditions easier or harder on status attributions were mediated by processing fluency and sequentially by fluency and affect [@dragojevic2020; @dragojevic2016; @dragojevic2017].
The processing fluency account would predict that dislike of singular *they* is caused, at least in part, by lower-level processing difficulty. Multiple factors could cause listeners to experience lower processing fluency for singular *they* compared to other pronouns: a larger set of possible [antecedents](0_introduction.qmd#def-antecedent "antecedent") may make it more ambiguous, it may elicit a number or gender mismatch agreement violation, it may be newly learned, and it is overall less frequent even for speakers familiar with it. The processing fluency account also predicts that making singular *they* easier to understand would reduce people's negative reactions to it, and therefore to the people who use it.
However, the actual amount of processing difficulty for singular *they*---particularly for [definite specific gender-specified](0_introduction.qmd#def-gender-specified "gender-specified singular they") forms---is unclear. Only a few studies to date have investigated [online comprehension](0_introduction.qmd#def-online "online processing") of *they* coreferring with proper names. These are described in more detail in [Section 0.4.4](#names), but to review briefly, people are slower to identify the referent for *they* compared to *he* and *she*, as measured through a [maze task](0_introduction.qmd#def-maze "maze task") while reading [@shenkar2023] and a mouse tracking task while listening [@arnold2023]. In two [ERP](0_introduction.qmd#def-ERP "ERP measures") experiments, a [P600 effect](0_introduction.qmd#def-P600 "P600 effects") was observed for *they* coreferring with proper names (gender-specified), but not with specific gender-unspecified referents (e.g., *the participant*) [@chen2023; @prasad2020]. Since the P600 indexes detecting a syntactic error or having difficulty comprehending a sentence's syntactic structure [@hagoort1993; @kaan2000; @osterhout1994; @osterhout1992], Prasad & Morris interpret their results to indicate that *they* coreferring with proper names still causes a gender agreement error, even though participants in their experiment all had significant experience with using they/them pronouns and considered it grammatical in offline acceptability judgments.
However, results like @prasad2020 do not necessarily require that *they* for specific gender-specified antecedents is still ungrammatical for these participants. Even in LGBTQ+ communities, *they* coreferring with a name is still relatively infrequent overall and would not be expected in many contexts. Stimuli in sentence processing experiments are typically unrelated, with each sentence using a different name and referring to a new character. When singular *they* corefers with a new referent in each trial, it is unclear whether *they* is consistently perceived as syntactically anomalous, or if it is originally unexpected, but could be processed smoothly once anticipated to corefer with a particular referent. Experiment 4 tests processing in the context of repeated reference, where listeners can come to expect singular *they* to corefer with certain characters. This is potentially somewhat easier, and it more closely resembles the real-world contexts in which we hear pronouns referring to people.
Additionally, the majority of processing studies, particularly for [generic indefinite](0_introduction.qmd#def-generic-indefinite "generic indefinite singular they") *they*, have used [self-paced reading](0_introduction.qmd#def-SPR "self-paced reading tasks") and [eyetracking while reading](0_introduction.qmd#def-eyetracking-reading "eyetracking while reading tasks") measures. Experiment 4 is one of the first to use the [visual world paradigm]{#def-VWP .link-primary title="Definition: visual world paradigm"}, which measures eye movements while participants listen to sentences describing a visual scene. Gaze at pictured characters provides a measure of online processing as the sentence unfolds, since listeners automatically look at what they think is being talked about [@allopenna1998; @sedivy1999; @spivey2002; @tanenhaus1995; @tanenhaus2000]. The visual world paradigm has advantages compared to other tasks, as it provides detailed time-course information about *which* alternative interpretations are being considered, in addition to *when* processing difficulties occur.
Experiment 4 investigates the degree of processing difficulty of singular *they* compared to *he* and *she*, if the processing of singular *they* follows the same patterns as *he* and *she*, and if processing measures correspond with offline judgments. The design is based on a prior line of work investigating ambiguous pronoun resolution. In Arnold et al. [-@arnold2000; -@arnold2007], participants looked at illustrated scenes of cartoon characters and listened to stories about them:
> [1]{#stim-arnold-1} Donald is bringing some mail to Mickey/Minnie, while a violent storm is beginning\
> [2]{#stim-arnold-2} He's/She's carrying an\
> [3]{#stim-arnold-3} umbrella and it looks like they're both going to need it.
Part [1](#stim-arnold-1) introduced 2 named characters (*Donald, Mickey/Minnie*), using a verb (*bringing*) that allows for a subsequent pronoun to refer to either of the characters individually [@garnham2001; @gordon1993; @sanford1981]. While *he* or *she* (part [2](#stim-arnold-2)) is more likely to refer to the character mentioned first in the prior sentence (*Donald*) [@arnold2000; @arnold2007; @gernsbacher1989; @kaiser2011], it can also refer to the character mentioned second (*Mickey/Minnie*). In other words, the character mentioned first is more accessible [@ariel2006]. This structure makes it possible for the referent of *he* or *she*---called the [target]{#def-VWP-target .link-primary title="definition: VWP target"} character in visual world experiments---to remain ambiguous until the next phrase (part [3](#stim-arnold-3)) can be compared to the illustration. In this example, either Donald or Mickey/Minnie is carrying an umbrella ([Figure @fig-exp4-arnold2000]A). This allows for enough time to observe processing of the pronoun (*is carrying an*), but without creating a discourse context too different from actual language use.
The stories in @arnold2000 manipulated 2 factors: the ambiguity of the pronoun (target and [competitor]{#def-VWP-competitor .link-primary title="definition: VWP competitor"} characters using the same vs different pronouns) and the accessibility of the referent (target mentioned first vs second). The results showed that listeners rapidly use both gender and accessibility cues to identify which character the pronoun referred to ([Figure @fig-exp4-arnold2000]B). When gender was unambiguous (top right in [Figure @fig-exp4-arnold2000]A), when the pronoun referred to the character mentioned first (bottom left), or when both cues were available (top left), participants looked at the target character starting at approximately 200ms after the pronoun. This is about as quickly as effects in the visual world paradigm can be observed [@hallett1986; @tanenhaus1995]. When neither gender nor accessibility cues disambiguated the referent (bottom right), participants looked at the target and competitor characters almost equally. For the purposes of the present experiment, these results provide a validated stimulus design and a baseline for how we expect *he* and *she* to be processed.
![@arnold2000. \[A\] Recreation of design, showing the pronoun ambiguity and order of mention conditions. The original materials were illustrated using the Disney characters. \[B\] Results, with 0 indicating pronoun onset and horizontal lines indicating the verb (*carrying*).](materials/exp4/figures/arnold2000.png){#fig-exp4-arnold2000 width="80%"}
A later set of studies used a similar design to examine how listeners process pronouns acoustically ambiguous between *he* and *she* [@brown-schmidt2017; @falandays2020]. These experiments used stories similar to those in Arnold et al. [-@arnold2000; -@arnold2007], but different images. Instead of 2 characters being drawn to match the scene, characters were pictured in colored shapes. The target character was disambiguated by describing their location, e.g., *he's standing on a blue square* instead of *he's carrying an umbrella*. This allows for a larger range of stimuli to be created, and the prior results demonstrate that listeners can process *he* and *she* smoothly in these types of stories. Critically, while the descriptions may seem odd and somewhat discontinuous, the discourse structure matches how speakers introduce new referents and when they tend to use pronouns instead of names.
The current experiment uses similar manipulations as Arnold et al. [-@arnold2000; -@arnold2007] and the same task as @brown-schmidt2017. One potential issue with this design is that *they* can be ambiguous between a singular and plural interpretation, even if participants learn that *they* is always singular in the context of the experiment. An alternative is to use stimuli that rule out a plural interpretation of *they*. Reflexive pronouns (*himself, herself, themself*) can syntactically constrain a singular interpretation [e.g., @runner2006; @sturt2003], but introduce a potential confound, since speakers vary in whether they prefer *themself* or *themselves* for singular referents [@ahn2022]. Another option is to use stimuli that semantically rule out a plural interpretation. Returning to some of the examples in the first chapter ([Section 0.2.3](#they-forms)), *they're worrying* ([@exm-atlantic]) can be singular or plural, since people can worry together, but *their free leg* ([@exm-tma2]) can only be singular, since a body part only belongs to one person. However, it is difficult to create stimuli that rule out a plural interpretation, while still including a long enough period where the pronoun is ambiguous between two possible referents. Results like these would be difficult to interpret because it would be unclear whether processing costs are due to singular *they* itself, or because the structure of the story does not match when speakers use pronouns instead of names or other referring expressions. Moreover, because fully ruling out a plural interpretation is difficult, the majority of instances of singular *they* in actual language use *do* contain some degree of ambiguity between singular and plural interpretations. Results from stimuli that reflect a very narrow set of contexts in which people hear singular *they*, where no plural interpretation is at all possible, would be less relevant.
## Methods
The design and analysis plan were [preregistered](https://osf.io/r3fy9 "Experiment 4 Preregistration") on the Open Science Framework. Sources and attributions for the images are included with the [materials](https://github.com/bethanyhgardner/dissertation/tree/main/materials/exp4 "Experiment 4 Materials"); the edited images and audio stimuli are available upon request. The de-identified [data](https://github.com/bethanyhgardner/dissertation/blob/main/data "Experiment 4 Data") and [analysis code](https://github.com/bethanyhgardner/dissertation/blob/main/exp4.qmd "Source Code") are available at this dissertation's [Github repository](https://github.com/bethanyhgardner/dissertation "Github repository").
### Participants
```{r}
#| label: exp4-participants-data
# demographic counts
exp4_d_demographics <- read.csv(
"data/exp4_demographics.csv",
stringsAsFactors = TRUE
)
# subset of responses
exp4_d_survey <- read.csv("data/exp4_survey.csv", stringsAsFactors = TRUE)
# n
exp4_n <- exp4_d_demographics %>%
filter(Category == "Age" & Group == "Total") %>%
pull(Total)
```
`r exp4_n` participants completed the study for partial course credit or for pay; their demographic information is shown in @tbl-exp4-demographics. Participants were required to be fluent English speakers (but not necessarily native or monolingual) and to have normal or corrected-to-normal vision and hearing, and most were Vanderbilt undergraduate students. An additional 2 participants completed the experiment, but were excluded due to too few trials having usable eyetracking data. The experiment lasted approximately 45 minutes.
### Materials
#### Characters
Participants learned about 6 characters, each associated with a name and an image: 2 who used he/him, 2 who used she/her, and 2 who used they/them ([Figure @fig-exp4-stimuli]A). The 6 character names and 6 character [images](https://github.com/bethanyhgardner/dissertation/blob/main/materials/exp4/images.md "Experiment 4 Images") were the same as in Experiment 3 [@drucker2019]. Recall that all names were gender neutral since counterbalancing gender associations of the names within lists was not feasible. Participants were randomly assigned to 1 of 6 lists, in order to counterbalance the images and names associated with characters who use they/them. Across lists, 3 images appeared twice with he/him and once with they/them, and 3 images appeared twice with she/her and once with they/them; each name appeared twice with each pronoun. Critically, across lists they/them appeared once with each image and once with each name, in order to avoid confounding interpretations about what aspects of a person's name or appearance may make it easier for someone to learn that they use they/them pronouns.
![Experiment 4: Stimuli. \[A\] Example set of characters. \[B\] Example trial screen and story, with grey boxes indicating information not shown to participants.](materials/exp4/figures/stimuli.png){#fig-exp4-stimuli width="750"}
#### Stories
```{r}
#| label: exp4-audio-times
exp4_audio_times <- read.csv("materials/exp4/audio-times.csv") %>%
filter(Type == "Pronoun") %>%
summarise(
min = min(Time_Shape),
max = max(Time_Shape),
mean = mean(Time_Shape),
sd = sd(Time_Shape)
) %>%
round(0)
exp4_audio_times
```
The stories and visual scenes were based on Arnold et al. [-@arnold2000; -@arnold2007] and @brown-schmidt2017. During each trial, the 6 characters were arranged in a 3x2 grid, each shown inside a colored shape (red, yellow, green, blue; triangle, square) ([Figure @fig-exp4-stimuli]B). Participants listened to stories in the frame:
> [1]{#stim-exp4-1} Jaime is painting a portrait of Sam, as some paint is spilling on the floor.\
> [2]{#stim-exp4-2} He is/she is/they are\
> [3]{#stim-exp4-3} standing in a blue triangle\
> [4]{#stim-exp4-4} and the painting looks amazing
Each story began with a sentence that named two characters, with an additional phrase to allow time for participants to identify them ([part 1](#stim-exp4-1)). The two named characters---the target and competitor---always used different pronouns (e.g., Jaime: they/them, Sam: he/him). This created 3 [Pronoun Pair conditions]{.fw-semibold}: they/them targets with he/him or she/her competitors [\[They\|HeShe\]]{.fw-semibold}, he/him or she/her targets with they/them competitors [\[HeShe\|They\]]{.fw-semibold}, and he/him or she/her targets with he/him or she/her competitors [\[HeShe\|SheHe\]]{.fw-semibold}. Next, a pronoun (*he*, *she*, or *they*) referred to one of the named characters ([part 2](#stim-exp4-2)). This created 2 [Order of Mention conditions]{.fw-semibold}, where the pronoun refers to the character mentioned [first]{.fw-semibold} in the preceding sentence or to the character mentioned [second]{.fw-semibold}. [Figure @fig-exp4-stimuli]B shows an example of the first-mention condition, where the pronoun (*they*) refers to the first named character (*Jaime*). The second-mention story matching this scene would have *he is standing in a blue triangle*, where *he* refers to Sam.
At this point in the story, participants could identify which of the named characters is the target if they knew the characters' pronouns and were using that information in their language comprehension (e.g., Jaime uses they/them and Sam uses he/him, meaning that *they* refers to Jaime). The stories then described the location of the target character ([part 3](#stim-exp4-3)). Because the target and competitor characters were always pictured with the same color, the target was not fully disambiguated until the shape word, an average of `r exp4_audio_times$mean`ms after the pronoun onset. After the shape word ([part 3](#stim-exp4-3)), listeners could identify the target character without taking the pronoun into consideration. The story concluded with a final phrase, which did not include another pronoun referring to the character(s) ([part 4](#stim-exp4-4)). After listening to each story, participants were asked to decide whether it matched the scene (e.g., if Jaime was standing in a blue square).
There were a total of 60 [story frames](https://github.com/bethanyhgardner/dissertation/blob/main/materials/exp4/stories.md "Experiment 4 Stories") (parts [1](#stim-exp4-1) + [4](#stim-exp4-4)). Within lists, each story appeared once in the first-mention condition and once in the second-mention condition. Across lists, each story appeared twice with each pronoun for counterbalancing, but with the same pair of names to make recording the stimuli feasible. There were a total of 24 pronoun + color + shape combinations (parts [2](#stim-exp4-2) + [3](#stim-exp4-3)). These clips were recorded as full sentences (not spliced together), and each trial randomly selected 1 of 3 versions, in order to avoid participants learning additional cues about a particular recording. The audio was recorded by the first author, a white native English speaker from the northeast U.S. with a feminine voice.
### Procedure
#### Character Learning
To learn about the characters, participants first saw each character's image, accompanied by their name (e.g., *This is Jaime*) and a fact about them (e.g., *They like to play the piano*, *They work as an engineer*). Each character was shown twice, so that participants saw two examples of the characters' pronouns. However, pronouns were never directly stated (e.g., *This is Jaime, who uses they/them pronouns*), and the use of singular *they* was not explained to participants. Participants were then tested on the names and images of the characters. They were shown all 6 images and asked to click on the named character. If the answer was correct, the image and name of the character were displayed, along with another example of their pronouns (e.g., *Correct, they're Jaime*). If the answer was incorrect, the image of and information about the incorrectly chosen character was shown (e.g., *Incorrect, he's Sam*), followed by the image of and information about the correct character (e.g., *They're Jaime*). To continue, participants were required to get all 6 names correct in the same block. When listening to the stories, participants should then have been able to identify the images of the 2 named characters, and had seen at least 3 examples of each character's pronouns.
#### Eyetracking
During each trial, the images were displayed for 1 second, then the audio began playing. After the story finished, the images remained on the screen, and the text *Did the story match the picture?* was displayed at the bottom. Participants clicked *YES* or *NO* at the corner of the screen to advance to the next trial. Eye movements were recorded with an Eyelink 1000 desktop-mounted eyetracker recording monocularly at 1000 Hz, with drift correction after every fifth trial. The trial order was randomly generated for each participant, the locations of the 6 images were randomly generated for each trial, and the colors and shapes were counterbalanced.
Participants completed 6 practice trials, which explained the task and instructed them to judge whether the story matched the scene based on the colored-shape sentence, since the action described at the beginning (e.g., painting a portrait) was not pictured. These practice trials used a name instead of a pronoun (e.g., *Jaime is standing in a blue triangle*). 4 trials matched the scene, and 2 trials mismatched by referring to a color not pictured. After each practice trial, participants saw feedback on whether their match judgment was correct.
Participants then completed 96 critical and 18 filler trials, mixed in a randomized order. These varied according to 2 within-subjects factors: Pronoun Pair [They\|HeShe; HeShe\|They; HeShe\|SheHe] and Order of Mention [target mentioned first; second]. The target and competitor characters were evenly distributed, yielding a total of 32 critical trials for each pronoun. Filler trials were included to ensure that participants treated *no* as an option in the match judgment question, even if they considered singular *they* acceptable and knew which characters used they/them. 10 of the filler trials were unambiguously the wrong description, referring to a color that was not pictured on the screen (e.g., for [Figure @fig-exp4-stimuli]B, *they are standing in a red square*). The other 8 filler trials used one of the pronouns of the non-named characters, making the story incorrect for the target character, as well as the competitor character (e.g., for [Figure @fig-exp4-stimuli]B, *she is standing in a blue triangle*). The he/him and she/her characters were each called *they* twice, and the they/them characters were each called *he* once and *she* once. No filler trials used *he* instead of *she* or *she* instead of *he*. Note that throughout the experiment, *they* was always singular, never plural. After completing the 120 trials, participants were tested on the names of the characters, following the same procedure as before, but without feedback.
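The factorial structure of the critical trials can be sanity-checked with a short sketch. This is a hypothetical illustration rather than the actual trial-generation code: the variable names are invented, and the 16-frames-per-cell figure is an assumption that reproduces the 96 critical trials (3 Pronoun Pair levels × 2 Order of Mention levels × 16 frames).

```{r}
#| label: exp4-design-check
#| eval: false
# Hypothetical sketch of the critical-trial design (not the actual
# trial-generation code). Crossing the two within-subjects factors with an
# assumed 16 story frames per cell reproduces the 96 critical trials.
exp4_design_check <- tidyr::crossing(
  pronoun_pair = c("They|HeShe", "HeShe|They", "HeShe|SheHe"),
  order_of_mention = c("first", "second"),
  frame = 1:16
)
nrow(exp4_design_check) # 96
# All They|HeShe trials use *they* (32); the 64 trials with he/him or she/her
# targets split evenly between *he* and *she*, giving 32 trials per pronoun.
```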
#### Survey
Finally, participants completed the same singular *they* naturalness ratings, familiarity with using they/them pronouns, gender binary and gender essentialism beliefs ([survey](https://github.com/bethanyhgardner/dissertation/blob/main/materials/exp4/survey.md "Experiment 4 Survey")), and [demographics questions](https://github.com/bethanyhgardner/dissertation/blob/main/materials/exp4/demographics.md "Experiment 4 Demographics") as in Experiment 3. All demographic questions included the option to not respond. @fig-exp4-procedure shows an overview of the full procedure.
![Experiment 4: Procedure.](materials/exp4/figures/procedure.png){#fig-exp4-procedure width="700"}
## Predictions
The first question concerns whether listeners can accurately comprehend *they* as singular, then combine this with knowledge about the character's pronouns to identify who is being described in the story. If so, participants will preferentially look at the target character after the pronoun and before the disambiguating shape word. While it is theoretically possible that singular *they* would show no processing costs compared to *he* or *she*, prior results indicate this is currently unlikely [@arnold2023; @chen2023; @prasad2020; @sanford2007; @shenkar2023]. Instead, listeners may identify the referent for singular *they* before the disambiguation, but more slowly than they do for *he* and *she*. This result would resemble those observed in young children [@arnold2007; @song2005] and in adult second language learners [@cunnings2017; @gruter2012; @speyer2019], who can use gender and order of mention cues from pronouns to identify the referent, but do so more slowly than fluent adults. Alternatively, there are two ways of observing results where listeners do not preferentially look at the target before the disambiguating shape word, which the current experiment cannot distinguish between. One possibility is that listeners attempt to use singular *they* to identify the target character, but do not succeed because of ambiguity. Another possibility is that listeners recognize the potential ambiguity in *they* and strategically choose to wait until hearing more information before deciding on an interpretation.
A secondary question concerns the competitor character, who is named at the beginning of the story but whose pronouns are never used. Stories using *he* and *she* can have a competitor who uses she/her or he/him (never the same as the target character), or a competitor who uses they/them. A difference between these two conditions could be predicted in either direction: If trials where the competitor character uses they/them are slower than trials where the competitor character uses he/him or she/her, this could indicate that some aspect of the they/them characters---the pronoun activated alongside the character or the character themself---is causing greater competition (making it a stronger possibility) than the he/him and she/her characters. If, on the other hand, trials where the competitor character uses they/them are faster, this could indicate that listeners are treating the more ambiguous character as less likely to be referred to, either in general or with a pronoun.
With regards to order of mention, we expect to replicate prior results for *he* and *she*, where participants are more likely to look at target characters who were named first than target characters who were named second, because although the pronoun can refer to either, it is more likely to refer to the person mentioned first [@arnold2000; @arnold2007; @brown-schmidt2017]. If we also observe an order of mention effect for singular *they*---either the same as or present but reduced compared to *he* and *she*---it would suggest that singular *they* is being integrated into listeners' standard discourse processing mechanisms.
## Results
### Participant Backgrounds
```{r}
#| label: exp4-participants-counts
# Age
exp4_n_age <- exp4_d_survey %>%
filter(Category == "Age") %>%
summarise(
min = min(Response_Num),
max = max(Response_Num),
med = median(Response_Num),
mean = mean(Response_Num) %>% round(2),
sd = sd(Response_Num) %>% round(2)
)
exp4_n_age
# Gender
exp4_n_gender <- exp4_d_demographics %>%
filter(Category == "Gender" & Group != "Total") %>%
select(-Category) %>%
rotate_df(cn = TRUE)
exp4_n_gender
# English
exp4_n_english <- exp4_d_demographics %>%
filter(Category == "English Experience" & Group != "Total") %>%
select(-Category) %>%
mutate(Group = case_when(
str_detect(Group, "competent") ~ "Fluent",
str_detect(Group, "Native") ~ "Native"
)) %>%
rotate_df(cn = TRUE)
exp4_n_english
```
```{r}
#| label: exp4-survey-ratings
# Subset data
exp4_d_ratings <- exp4_d_survey %>%
filter(Category == "Sentence Naturalness Ratings" &
!is.na(Response_Num)) %>%
select(ParticipantID, Item, Response_Num) %>%
mutate(Type = ifelse(str_detect(Item, "Name"), "Name", "Indefinite"))
# Means
exp4_r_rating_means <- exp4_d_ratings %>%
group_by(Type) %>%
summarise(mean = mean(Response_Num), SD = sd(Response_Num)) %>%
column_to_rownames("Type") %>%
round(2)
exp4_r_rating_means
```
```{r}
#| label: exp4-survey-ratings-model
# Mean-center according to scale
exp4_d_ratings %<>% mutate(Response_Centered = Response_Num - 4)
# Compare names to indefinites
exp4_d_ratings$Type %<>% as.factor()
contrasts(exp4_d_ratings$Type) <- cbind("=Name_Indefinite" = c(+.5, -.5))
contrasts(exp4_d_ratings$Type)
exp4_m_ratings <- lmer(
formula = Response_Centered ~ Type + (1 | Item) + (Type | ParticipantID),
data = exp4_d_ratings
)
summary(exp4_m_ratings)
exp4_r_ratings <- exp4_m_ratings %>% tidy_model_results()
```
```{r}
#| label: exp4-survey-use-they
exp4_d_use_they <- exp4_d_survey %>%
filter(str_detect(Category, "They/Them") & Response_Bool == TRUE) %>%
group_by(Item) %>%
summarise(n = n()) %>%
bind_rows(tibble( # Add options not represented in exp 4
Item = c("Myself", "Not Heard About"),
n = c(0, 0)
)) %>%
mutate(Item = str_remove_all(Item, " ")) %>%
rotate_df(cn = TRUE)
exp4_d_use_they
```
```{r}
#| label: exp4-survey-gender-beliefs
# Subset & scale data
exp4_d_gender_beliefs <- exp4_d_survey %>%
filter(Category == "Gender Beliefs" & !is.na(Response_Num)) %>%
mutate(Response_Scaled = Response_Num - 1) %>%
group_by(ParticipantID) %>%
summarise(Total = sum(Response_Scaled))
# Summary stats
exp4_d_gender_beliefs <- exp4_d_gender_beliefs %>%
summarise(
min = min(Total),
max = max(Total),
mean = mean(Total) %>% round(2),
SD = sd(Total) %>% round(2)
)
exp4_d_gender_beliefs
```
To contextualize the findings, I first discuss the results of the survey. Most participants were in the typical undergraduate age range (*M* = `r exp4_n_age$mean`, *SD* = `r exp4_n_age$sd`) and described themselves as native English speakers (N = `r exp4_n_english$Native`). `r exp4_n_gender$Female` were women, `r exp4_n_gender$Male` were men, and none identified as transgender and/or a gender different from their sex assigned at birth (@tbl-exp4-demographics). All participants were at least somewhat familiar with singular *they* before the experiment: `r exp4_d_use_they$HeardAbout` had heard about people using they/them pronouns but not met anyone who does, `r exp4_d_use_they$HaveMet` had met but were not close to anyone who uses they/them, and `r exp4_d_use_they$CloseTo` were close to someone who uses they/them, but `r exp4_d_use_they$Myself` participants used they/them themselves ([Figure @fig-exp4-survey]B). When rating the naturalness of singular *they* coreferring with different types of referents ([Figure @fig-exp4-survey]A), acceptance of indefinite forms was generally high (*M* = `r exp4_r_rating_means['Indefinite', 'mean']`, *SD* = `r exp4_r_rating_means['Indefinite', 'SD']`). Surprisingly, ratings for proper names (*M* = `r exp4_r_rating_means['Name', 'mean']`, *SD* = `r exp4_r_rating_means['Name', 'SD']`) were not significantly lower than ratings for indefinites (`r exp4_r_ratings['Type=Name_Indefinite', 'Text']`) (@tbl-exp4-ratings). For the gender beliefs measure [@nagoshi2008], responses were again scaled to 0--6 and summed, so that a score of 0 indicated the lowest endorsement of the gender binary and gender essentialism, and a score of 54 indicated the highest ([Figure @fig-exp4-survey]C).
Participant totals spanned the entire range but were strongly skewed towards the lower end, with the mean response favorable towards trans and gender-nonconforming people (range = `r exp4_d_gender_beliefs$min`--`r exp4_d_gender_beliefs$max`, *M* = `r exp4_d_gender_beliefs$mean`, *SD* = `r exp4_d_gender_beliefs$SD`) (see @tbl-exp4-gender-beliefs for item text and means).
| |
|-----|
| |
: Experiment 4: Participant demographics. Categories with higher totals allowed participants to select as many options as applied. All questions included the option to not respond. {#tbl-exp4-demographics .borderless}
```{r ft.align="left"}
#| output: true
demographics_table(
exp4_d_demographics,
categories = c(
"Age", "Gender", "Transgender & Gender-Diverse", "Sexuality",
"English Experience", "Race/Ethnicity"
),
title = "Experiment 4: Participant Demographics"
)
```
```{r}
#| label: fig-exp4-survey
#| fig-cap: "Experiment 4: Prior Familiarity and Attitudes Survey. [A] Naturalness ratings on a 7-point Likert scale (1 = very unnatural, 7 = very natural) for singular *they* coreferring with indefinite referents and with proper names. [B] Experience with using they/them pronouns. [C] Gender binary and essentialism beliefs, with higher scores indicating higher endorsement and thus more negative attitudes towards transgender and gender non-conforming people [@nagoshi2008]. The mean response is indicated by the black line."
#| fig-asp: 0.85
#| output: true
#| cache: true
# Ratings----
exp4_p_ratings <- exp4_d_survey %>%
filter(Category == "Sentence Naturalness Ratings") %>%
mutate(
Response_Num = Response_Num %>%
as.factor() %>%
fct_rev() %>%
recode("7" = "7 Very Natural"),
Item = Item %>%
as.factor() %>%
droplevels() %>%
str_replace("\n", " ") %>%
fct_relevel("Generic", after = 0) %>%
fct_relevel("Every", after = 1) %>%
fct_relevel("Neutral Name", after = 3) %>%
fct_relevel("Fem Name", after = 5)
) %>%
ggplot(aes(y = fct_rev(Item), fill = Response_Num)) +
geom_bar(position = "fill") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
scale_fill_brewer(
palette = "Spectral", direction = -1,
guide = guide_legend(
title = "Very Unnatural",
byrow = TRUE, nrow = 1,
direction = "horizontal", reverse = TRUE,
keywidth = .8, keyheight = .8
)
) +
theme_classic() +
survey_theme +
labs(
title = "Singular <i>They</i> Naturalness Ratings",
x = element_blank(), y = element_blank()
)
# Experience using they/them----
exp4_p_familiarity <- exp4_d_survey %>%
filter(str_detect(Category, "They/Them")) %>%
filter(Item != "Aggregate" & Response_Bool == TRUE) %>%
select(ParticipantID, Item, Response_Bool) %>%
pivot_wider(names_from = Item, values_from = Response_Bool) %>%
mutate(.keep = c("unused"), HighestFamiliarity = case_when(
`Close To` == TRUE ~ "Close To",
`Have Met` == TRUE ~ "Have Met",
`Heard About` == TRUE ~ "Heard About"
)) %>%
group_by(HighestFamiliarity) %>%
summarise(n = n_distinct(ParticipantID)) %>%
add_row(n = c(0, 0), HighestFamiliarity = c("Myself", "Not Heard\nAbout")) %>%
mutate(
HighestFamiliarity = HighestFamiliarity %>%
factor(
levels = c(
"Myself", "Close To", "Have Met", "Heard About", "Not Heard\nAbout"
),
ordered = TRUE
),
Label = "Highest\nFamiliarity"
) %>%
ggplot(aes(y = Label, x = n, fill = HighestFamiliarity)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_brewer(
palette = "Spectral", direction = -1,
guide = guide_legend(
title = NULL, ncol = 6, direction = "horizontal", reverse = TRUE,
keywidth = .8, keyheight = .8
)
) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme_classic() +
survey_theme +
labs(
title = "Experience Using They/Them Pronouns",
x = element_blank(), y = element_blank(), fill = element_blank()
)
# Gender beliefs----
exp4_d_gender_beliefs <- exp4_d_survey %>%
filter(Category == "Gender Beliefs") %>%
mutate(Response_Scaled = Response_Num - 1) %>%
group_by(ParticipantID) %>%
summarise(Total = sum(Response_Scaled))
exp4_p_gender_beliefs <- exp4_d_gender_beliefs %>%
ggplot(aes(x = Total, fill = as.factor(Total))) +
geom_histogram(binwidth = 1, show.legend = FALSE) +
geom_vline(aes(xintercept = mean(Total))) +
coord_cartesian(xlim = c(54, 0), expand = 0, clip = "off") +
scale_y_continuous(breaks = c(1, 2, 3)) +
scale_fill_manual(values =
rainbow %>%
filter(Score %in% exp4_d_gender_beliefs$Total) %>%
pull(Color)
) +
theme_classic() +
survey_theme +
theme(
axis.title.y = element_text(
angle = 0,
margin = margin(r = -0.9, unit = "in")
)
) +
labs(
title = "Gender Binary & Gender Essentialism Beliefs",
x = "More Endorsement – Less Endorsement",
y = "N\nParticipants"
)
## Combine----
exp4_p_ratings + exp4_p_familiarity + exp4_p_gender_beliefs +
plot_layout(heights = c(2.5, 1, 1.5)) +
plot_annotation(
title = "Experiment 4: Prior Familiarity & Attitudes",
tag_levels = "A",
theme = patchwork_theme
)
```
### Offline Measures
#### Character Learning
```{r}
#| label: exp4-characters-data
exp4_d_characters <- read.csv(
"data/exp4_characters.csv", stringsAsFactors = TRUE
)
```
```{r}
#| label: exp4-characters-pretest
exp4_n_pre <- exp4_d_characters %>%
filter(Section == "pre") %>%
group_by(ParticipantID) %>%
summarise(N_Rounds = n_distinct(Test_Round)) %>%
summarise(
min = min(N_Rounds),
max = max(N_Rounds),
mean = mean(N_Rounds) %>% round(2) %>% format(nsmall = 2),
sd = sd(N_Rounds) %>% round(2) %>% format(nsmall = 2)
)
exp4_n_pre
exp4_r_pretest <- exp4_d_characters %>%
filter(Section == "pre") %>%
group_by(T_Pronoun) %>%
summarise(
mean = mean(Acc) %>% round(2) %>% format(nsmall = 2),
sd = sd(Acc) %>% round(2) %>% format(nsmall = 2)
) %>%
column_to_rownames(var = "T_Pronoun")
exp4_r_pretest
```
```{r}
#| label: exp4-characters-posttest
exp4_n_post <- exp4_d_characters %>%
filter(Section == "post") %>%
group_by(ParticipantID) %>%
summarise(Correct = sum(Acc)) %>%
group_by(Correct) %>%
summarise(n = n_distinct(ParticipantID)) %>%
column_to_rownames("Correct")
exp4_n_post
```
Participants were generally able to learn the name-image pairs within 2--3 test rounds (*M* = `r exp4_n_pre$mean`, *SD* = `r exp4_n_pre$sd`). Across all pretest rounds, accuracies for they/them characters (*M* = `r exp4_r_pretest['They', 'mean']`, *SD* = `r exp4_r_pretest['They', 'sd']`) and she/her characters (*M* = `r exp4_r_pretest['She', 'mean']`, *SD* = `r exp4_r_pretest['She', 'sd']`) were slightly lower than accuracy for he/him characters (*M* = `r exp4_r_pretest['He', 'mean']`, *SD* = `r exp4_r_pretest['He', 'sd']`). Participants remembered the names of the characters throughout the study, with most (N = `r exp4_n_post['6', 'n']`) getting all 6 correct in the post-test, and no participants excluded for getting 4 or fewer correct.
#### Match Judgments
```{r}
#| label: exp4-match-data
# Load data
exp4_d_match <- read.csv("data/exp4_match-judgments.csv",
stringsAsFactors = TRUE) %>%
filter(TrialType != "PR") %>%
filter(!is.na(Match_Num)) %>% # drop missing data (unclear location)
mutate(Story = str_sub(TrialID, -2) %>% as.numeric()) %>%
select(
ParticipantID, TrialType, Pronoun_Pair, T_Pronoun, C_Pronoun,
Story, TrialID, Match_Num, Match_RT, IsOutlier
)
# Contrast coding
contrasts(exp4_d_match$Pronoun_Pair) <- cbind(
"=TheyTarget" = c(+.33, +.33, -.66),
"=TheyComp" = c(+.50, -.50, 0)
)
contrasts(exp4_d_match$Pronoun_Pair)
str(exp4_d_match)
```
```{r}
#| label: exp4-match-acc-means
exp4_r_match_means <- exp4_d_match %>%
group_by(TrialType, Pronoun_Pair) %>%
summarise(
mean = mean(Match_Num) %>% round(2) %>% format(nsmall = 2),
sd = sd(Match_Num) %>% round(2) %>% format(nsmall = 2)
) %>%
ungroup() %>%
mutate(.keep = c("unused"), Condition =
str_c(TrialType, Pronoun_Pair, sep = " ")
) %>%
column_to_rownames("Condition")
exp4_r_match_means
```
```{r}
#| label: exp4-match-acc-model-CR
#| cache: true
exp4_m_match_CR <- buildmer(
formula = Match_Num ~ Pronoun_Pair +
(Pronoun_Pair | ParticipantID) + (Pronoun_Pair | Story),
data = exp4_d_match %>% filter(TrialType == "CR"),
family = binomial,
buildmerControl(direction = c("order"))
)
summary(exp4_m_match_CR)
exp4_r_match_CR <- exp4_m_match_CR@model %>% tidy_model_results()
```
```{r}
#| label: exp4-match-acc-model-FP
#| cache: true
# No wrong pronoun trials for HeShe|They condition
# So mean-center effects code They|HeShe vs HeShe|SheHe
exp4_d_match_FP <- exp4_d_match %>%
filter(TrialType == "FP") %>%
mutate(CorrectPronoun = droplevels(Pronoun_Pair))
contrasts(exp4_d_match_FP$CorrectPronoun) <- cbind(
"=They_HeShe" = c(+.5, -.5)
)
contrasts(exp4_d_match_FP$CorrectPronoun)
exp4_m_match_FP <- buildmer(
formula = Match_Num ~ CorrectPronoun +
(CorrectPronoun | ParticipantID) + (CorrectPronoun | Story),
data = exp4_d_match_FP,
family = binomial,
buildmerControl(direction = c("order"))
)
summary(exp4_m_match_FP)
exp4_r_match_FP <- exp4_m_match_FP@model %>% tidy_model_results()
```
When asked if the description they heard matched the scene (@fig-exp4-match, left), participants correctly judged the majority of test trials to be matching. The match rates for singular *they* trials (*M* = `r exp4_r_match_means['CR They|HeShe', 'mean']`) were not significantly lower than the match rates for *he* and *she* trials (*M~HeShe\|They~* = `r exp4_r_match_means['CR HeShe|They', 'mean']`, *M~HeShe\|SheHe~* = `r exp4_r_match_means['CR HeShe|SheHe', 'mean']`) (@tbl-exp4-match-CR). For the wrong description trials, which referred to a color that was not pictured, match rates were correctly at floor for all pronoun conditions. For the wrong pronoun trials, which used the pronoun that neither of the two named characters used, responses were more variable. However, participants were not less likely to indicate a mismatch when they/them characters were referred to with *he* or *she* (*M* = `r exp4_r_match_means['FP They|HeShe', 'mean']`) than when he/him or she/her characters were referred to with *they* (*M* = `r exp4_r_match_means['FP HeShe|SheHe', 'mean']`) (@tbl-exp4-match-FP).
```{r}
#| label: exp4-match-RT-means
# Means for each trial type * pronoun pair condition
exp4_r_RT_means <- exp4_d_match %>%
filter(IsOutlier == FALSE) %>%
group_by(TrialType, Pronoun_Pair) %>%
summarise(mean = round(mean(Match_RT)), sd = round(sd(Match_RT))) %>%
ungroup() %>%
mutate(.keep = c("unused"), Condition =
str_c(TrialType, Pronoun_Pair, sep = " ")
)
# Add means for each trial type, summarizing across pronoun pair
exp4_r_RT_means %<>% bind_rows(
exp4_d_match %>%
filter(IsOutlier == FALSE) %>%
group_by(TrialType) %>%
summarise(mean = round(mean(Match_RT)), sd = round(sd(Match_RT))) %>%
mutate(.keep = c("unused"), Condition = str_c(TrialType, " All"))
) %>%
column_to_rownames("Condition")
exp4_r_RT_means
```
```{r}
#| label: exp4-match-RT-model-build
#| eval: false
exp4_m_match_RT <- buildmer(
formula = Match_RT ~ Pronoun_Pair +
(Pronoun_Pair | ParticipantID) + (Pronoun_Pair | Story),
data = exp4_d_match %>%
filter(TrialType == "CR" & !is.na(Match_Num) & IsOutlier == FALSE),
family = inverse.gaussian(link = "identity"),
buildmerControl(direction = c("order"))
)
```
```{r}
#| label: exp4-match-RT-model-results
exp4_m_match_RT <- readRDS("r_data/exp4_match_RT.RDS")
summary(exp4_m_match_RT)
exp4_r_match_RT <- exp4_m_match_RT@model %>% tidy_model_results()
```
Reaction times were calculated from the display of the match question until the participant's click, with responses more than 3 SD from the mean of each trial type excluded as outliers (@fig-exp4-match, right). Similar to the accuracy data, reaction times were shortest for wrong description trials (*M* = `r exp4_r_RT_means['FD All', 'mean']`ms, *SD* = `r exp4_r_RT_means['FD All', 'sd']`ms), somewhat longer for test trials (*M* = `r exp4_r_RT_means['CR All', 'mean']`ms, *SD* = `r exp4_r_RT_means['CR All', 'sd']`ms), and longest and most variable for wrong pronoun trials (*M* = `r exp4_r_RT_means['FP All', 'mean']`ms, *SD* = `r exp4_r_RT_means['FP All', 'sd']`ms). A mixed-effects model with an inverse Gaussian distribution and an identity link was fit to test whether Pronoun Pair affected reaction times in the test trials (@tbl-exp4-match-RT). This distribution accounts for the non-normal distribution of reaction time data, but in contrast to applying a non-linear transformation (e.g., log), it maintains the theoretical assumption that experimental manipulations affect the total amount of time to make a decision [@lo2015]. The maximal model that converged included by-participant and by-item intercepts and slopes for Pronoun Pair [@bates2015; @voeten2023; @rcoreteam2023]. Participants were slower to make match judgments for stories using singular *they* than stories using *he* and *she* (`r exp4_r_match_RT['Pronoun_Pair=TheyTarget', 'Text']`). The pronoun of the competitor character did not affect reaction times (`r exp4_r_match_RT['Pronoun_Pair=TheyComp', 'Text']`).
```{r}
#| label: fig-exp4-match
#| fig-cap: "Experiment 4: By-participant mean proportions of stories judged to match the picture (left) and reaction times (right). Lines indicate by-participant means between Pronoun Pair conditions; violins indicate the distribution of by-participant means; point ranges indicate condition means and 95% CIs calculated over the by-participant means. The correct answers are match (=1) for test trials and mismatch (=0) for wrong pronoun and wrong description trials. For the wrong pronoun trials, they/them for a he/him or she/her character corresponds to the HeShe|SheHe condition; he/him or she/her for a they/them character corresponds to the They|HeShe condition; and there were no wrong pronoun trials for the HeShe|They condition."
#| fig-width: 6.5
#| fig-height: 7.5
#| output: true
#| cache: true
# Setup----
exp4_d_match_plots <- read.csv("data/exp4_match-judgments.csv",
stringsAsFactors = TRUE) %>%
filter(TrialType != "PR") %>%
mutate(TrialType = factor(
TrialType,
levels = c("CR", "FD", "FP"),
labels = c("Test", "Wrong Description", "Wrong Pronoun")
)) %>%
filter(!is.na(Match_Num) & IsOutlier == FALSE) %>%
group_by(ParticipantID, TrialType, Pronoun_Pair) %>%
summarise(
Mean_Match = mean(Match_Num, na.rm = TRUE),
Mean_RT = mean(Match_RT, na.rm = TRUE)
)
# Test trials----
exp4_p_match_test <- (
ggplot(
data = exp4_d_match_plots %>% filter(TrialType == "Test"),
aes(x = Pronoun_Pair, color = TrialType, y = Mean_Match)) +
geom_line(
aes(group = ParticipantID), color = "#3288BD",
position = position_jitter(width = 0, height = 0.02, seed = 4)
) +
stat_summary(
fun.data = mean_se, geom = "pointrange",
color = "black", linewidth = 0.75, size = 0.25
) +
scale_x_discrete(expand = c(0.15, 0.15), position = "top") +
scale_y_continuous(
expand = c(0, 0), limits = c(-0.03, 1.20),
breaks = c(0, 0.25, 0.5, 0.75, 1)) +
theme_classic() +
match_theme +
guides(color = guide_none()) +
labs(
title = "Test Trials",
x = element_blank(), y = element_blank()
) +
annotate(
geom = "rect", fill = NA, color = "black",
xmin = c(0.5, 1.5, 2.5), xmax = c(1.5, 2.5, 3.5), ymin = 1.05, ymax = 1.20
)
) + (
ggplot(
data = exp4_d_match_plots %>% filter(TrialType == "Test"),
aes(x = Pronoun_Pair, y = Mean_RT)) +
geom_violin(color = "#3288BD", fill = "#3288BD") +
stat_summary(
fun.data = mean_se, geom = "pointrange",
color = "black", linewidth = 0.75, size = 0.25
) +
scale_x_discrete(expand = c(0.15, 0.15), position = "top") +
scale_y_continuous(
expand = c(0, 0),
breaks = c(2000, 4000, 6000, 8000, 10000)
) +
theme_classic() +
match_theme +
guides(color = guide_none()) +
labs(title = element_blank(), x = element_blank(), y = element_blank()) +
annotate(
geom = "rect", fill = NA, color = "black",
xmin = c(0.5, 1.5, 2.5), xmax = c(1.5, 2.5, 3.5),
ymin = 10000, ymax = 11000
)
)
# Wrong description trials----
exp4_p_match_fd <- (
ggplot(
data = exp4_d_match_plots %>% filter(TrialType == "Wrong Description"),
aes(x = Pronoun_Pair, color = TrialType, y = Mean_Match)) +
geom_line(
aes(group = ParticipantID), color = "#99D594",
position = position_jitter(width = 0, height = 0.02, seed = 4)
) +
stat_summary(
fun.data = mean_se, geom = "pointrange",
color = "black", linewidth = 0.75, size = 0.25
) +
scale_x_discrete(expand = c(0.15, 0.15), position = "top") +
scale_y_continuous(
expand = c(0, 0), limits = c(-0.03, 1.20),
breaks = c(0, 0.25, 0.5, 0.75, 1)) +
theme_classic() +
match_theme +
guides(color = guide_none()) +
labs(
title = "Wrong Description Trials",
x = element_blank(),
y = "By-Participant Mean Matching"
) +
annotate(
geom = "rect", fill = NA, color = "black",
xmin = c(0.5, 1.5, 2.5), xmax = c(1.5, 2.5, 3.5), ymin = 1.05, ymax = 1.20
)
) + (
ggplot(
data = exp4_d_match_plots %>% filter(TrialType == "Wrong Description"),
aes(x = Pronoun_Pair, color = TrialType, y = Mean_RT)) +
geom_violin(color = "#99D594", fill = "#99D594") +
stat_summary(
fun.data = mean_se, geom = "pointrange",
color = "black", linewidth = 0.75, size = 0.25
) +
scale_x_discrete(expand = c(0.15, 0.15), position = "top") +
scale_y_continuous(
expand = c(0, 0),
breaks = c(2000, 5500, 9000, 12500, 16000)
) +
theme_classic() +
match_theme +
guides(color = guide_none()) +
labs(
title = element_blank(),
x = element_blank(),
y = "By-Participant Mean RT (ms)"
) +
annotate(
geom = "rect", fill = NA, color = "black",
xmin = c(0.5, 1.5, 2.5), xmax = c(1.5, 2.5, 3.5),
ymin = 16500, ymax = 18500
)
)
# Wrong pronoun trials----
exp4_p_match_fp <- (
ggplot(
data = exp4_d_match_plots %>% filter(TrialType == "Wrong Pronoun"),
aes(x = Pronoun_Pair, color = TrialType, y = Mean_Match)) +
geom_line(
aes(group = ParticipantID), color = "#D53E4F",
position = position_jitter(width = 0, height = 0.02, seed = 4)
) +
stat_summary(
fun.data = mean_se, geom = "pointrange",
color = "black", linewidth = 0.75, size = 0.25
) +
scale_x_discrete( # clarify labels for this condition
expand = c(0.15, 0.15), position = "top",
labels = c(
"HeShe|SheHe" = "They For He/She",
"They|HeShe" = "He/She For They"
)
) +
scale_y_continuous(
expand = c(0, 0), limits = c(-0.03, 1.20),
breaks = c(0, 0.25, 0.5, 0.75, 1)) +
theme_classic() +
match_theme +
guides(color = guide_none()) +
labs(
title = "Wrong Pronoun Trials",
x = element_blank(), y = element_blank()
) +
annotate(
geom = "rect", fill = NA, color = "black",
xmin = c(0.5, 1.5), xmax = c(1.5, 2.5), ymin = 1.05, ymax = 1.20
)
) + (
ggplot(
data = exp4_d_match_plots %>% filter(TrialType == "Wrong Pronoun"),
aes(x = Pronoun_Pair, color = TrialType, y = Mean_RT)) +
geom_violin(color = "#D53E4F", fill = "#D53E4F") +
stat_summary(
fun.data = mean_se, geom = "pointrange",
color = "black", linewidth = 0.75, size = 0.25
) +
scale_x_discrete( # clarify labels for this condition
expand = c(0.15, 0.15), position = "top",
labels = c(
"HeShe|SheHe" = "They For He/She",
"They|HeShe" = "He/She For They"
)
) +
scale_y_continuous(
expand = c(0, 0),
breaks = c(2000, 5000, 8000, 11000, 14000)
) +
theme_classic() +
match_theme +
guides(color = guide_none()) +
labs(title = element_blank(), x = element_blank(), y = element_blank()) +
annotate(
geom = "rect", fill = NA, color = "black",
xmin = c(0.5, 1.5), xmax = c(1.5, 2.5), ymin = 14000, ymax = 16000
)
)
# Combine----
exp4_p_match_test / exp4_p_match_fd / exp4_p_match_fp +
plot_annotation(
title = "Experiment 4: Match Judgments",
theme = patchwork_theme
)
```
### Online Processing
```{r}
#| label: exp4-eye-data
# 30 subj * 96 trials * 103 timesteps = 296640
# -6 trials with no data at all (618) = 296022
exp4_d <- exp4_load_data_stats()
str(exp4_d)
# Pronoun, Order
contrasts(exp4_d$Pronoun_Pair) # Double check contrast coding
contrasts(exp4_d$Order)
# Trend (rescaled, centered)
summary(exp4_d$Time)
summary(exp4_d$Time_Scaled)
# AR(1)
exp4_d %>% select(WasTarget, IsTarget) %>% summary()
# Number of data points per trial
exp4_d %>%
group_by(ParticipantID, TrialID) %>%
summarise(n = n()) %>% # Count observations per trial per participant
group_by(n) %>%
summarise(n_obs = n_distinct(n)) # All have 103 obs
```
@fig-exp4-6panel shows fixations to the target, competitor, distractor, and no characters, starting 500ms before the onset of the pronoun and continuing for 2500ms (e.g., *...spilling on the floor. They are standing in a blue triangle, and the painting looks amazing*). The He\|She and She\|He trials (first row) generally resemble prior results [@arnold2000; @arnold2007; @brown-schmidt2017], with participants rapidly beginning to look at the target more than the competitor after the onset of the pronoun, and more so when the target character was mentioned first. Unexpectedly, the order of mention effect---where listeners look more at the character named first in the story than at the character named second---is only clear in He\|She trials, not in She\|He trials. The He\|They and She\|They trials (second row) show the expected order effect, but participants are less likely to be looking at the target than in the He\|She and She\|He trials. The They\|He and They\|She trials (third row) still show participants looking at the target more than the competitor before the onset of the shape word, but less than in the other two conditions, and no order effect is apparent. Examining fixations during the beginning of the story (@fig-exp4-names), participants looked at the target and competitor after each was named, and the time course did not differ by the character's pronouns. This confirms that participants knew the names of the characters and had identified the two possible referents before the start of the critical time window.
```{r}
#| label: fig-exp4-6panel
#| fig-cap: "Experiment 4: Eyetracking: Full Window. Proportions of looks to the target, competitor, distractor (average of 4), and no characters, split by target pronoun, competitor pronoun, and order of mention conditions. The gray box indicates the analysis region, starting 200ms after pronoun onset and ending at 1210ms, the earliest shape word onset across stimuli."
#| fig-asp: 1.25
#| output: true
#| cache: true
exp4_load_data_plots_full() %>%
ggplot(aes(x = Timestep_Start, y = Prop, color = Color, linetype = Order)) +
geom_rect(
xmin = 200, xmax = 1210, ymin = 0, ymax = 1,
fill = "grey95", color = "grey95"
) +
geom_line(key_glyph = "timeseries", linewidth = 0.75) +
facet_wrap(~Pronoun_Pair, ncol = 2) +
scale_color_manual(values = c(
"#1B9E77", "#D95F02", "#7570B3", "grey30", "grey60")
) +
scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
theme_classic() +
eyetracking_theme +
theme(plot.title.position = "plot") +
guides(
color = guide_legend(order = 1, override.aes = theme(linewidth = 1)),
linetype = guide_legend(order = 2, override.aes = theme(linewidth = 1))
) +
labs(
x = "Time Relative to Pronoun Onset (ms)",
y = "Proportion of Looks",
color = "Item",
linetype = "Target\nMentioned",
title =
"Experiment 4: Looks During Full Window By Target & Competitor Pronouns"
)
```
The primary analysis window, shown in grey in @fig-exp4-6panel, began 200ms after the onset of the pronoun, the estimated time it takes to plan and execute a saccade in response to the auditory stimulus [@hallett1986]. It continued until `r max(exp4_d$Time)`ms after the pronoun onset, which was the earliest shape word onset across all of the stimuli (range = `r exp4_audio_times$min`--`r exp4_audio_times$max`ms, *M* = `r exp4_audio_times$mean`ms, *SD* = `r exp4_audio_times$sd`ms). Results were analyzed with dynamic generalized mixed-effects models, predicting whether participants looked at the target character (=1) or not (=0) at each time point [@cho2018; @brown-schmidt2020]. Observations were down-sampled to 10ms bins, where bins that included \>5ms of a fixation on or a saccade to the target [@mcmurray2009; @mcmurray2019] were coded as 1, bins that included \<5ms were coded as 0, and bins that included exactly 5ms were coded as 1 if they followed a bin coded as 1 and 0 if not. Aside from this down-sampling, the data were not aggregated across trials or participants. The model included a fixed effect for Trend (timestep during trial, mean-centered) to capture linear changes across the trial in the level of fixations to the target. To account for autocorrelation between time points, the model included an AR(1) term, which captures whether the participant was looking at the target in the prior timestep. To calculate AR(1) for the start of the analysis window, timesteps for 180ms and 190ms were included in the data, and then the first timestep with missing data for AR(1) was excluded prior to estimation, resulting in `r n_distinct(exp4_d$Timestep)` data points for each trial.
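The binning rule above can be sketched as a small helper function. This is an illustrative reconstruction, not the actual preprocessing code; the function name and arguments are assumptions for exposition only.

```{r}
#| eval: false
# Hypothetical sketch of the 10ms binning rule (not the real pipeline):
# ms_on_target = milliseconds within the bin spent fixating on or
# saccading to the target; prev_bin = the previous bin's code (0 or 1)
code_bin <- function(ms_on_target, prev_bin) {
  if (ms_on_target > 5) {
    1        # majority of the bin on target
  } else if (ms_on_target < 5) {
    0        # majority of the bin off target
  } else {
    prev_bin # exactly 5ms: inherit the previous bin's code
  }
}
code_bin(7, 0) # 1
code_bin(5, 1) # 1
code_bin(5, 0) # 0
```

The tie-breaking clause is what makes the lagged AR(1) predictor well-defined even at bin boundaries that split a fixation exactly in half.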
The fixed effect of Pronoun Pair was coded with orthogonal Helmert contrasts, with the first contrast comparing trials with *they* target characters to trials with *he* or *she* target characters (They\|HeShe vs HeShe\|They + HeShe\|SheHe), and the second contrast comparing trials with *he* or *she* target and *they* competitor characters to trials with *he* or *she* target and *she* or *he* competitor characters (HeShe\|They vs HeShe\|SheHe). The fixed effect of Order was mean-center effects coded, comparing trials where the target character was mentioned second to trials where the target character was mentioned first. @fig-exp4-3panel shows the proportion of looks to the target and competitor characters during the analysis window, comparing the 3 Pronoun Pair and 2 Order of Mention conditions.
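Concretely, this Helmert scheme uses the same weights applied to the match-judgment data earlier; as a self-contained sketch (assuming the factor levels fall in alphabetical order: HeShe\|SheHe, HeShe\|They, They\|HeShe):

```{r}
#| eval: false
# Orthogonal Helmert contrasts for Pronoun Pair
pronoun_pair <- factor(c("HeShe|SheHe", "HeShe|They", "They|HeShe"))
contrasts(pronoun_pair) <- cbind(
  "=TheyTarget" = c(+.33, +.33, -.66), # he/she targets vs. they targets
  "=TheyComp"   = c(+.50, -.50, 0)     # she/he vs. they competitors
)
```

Because the two contrasts are orthogonal, each coefficient can be read independently: a positive `=TheyTarget` estimate favors *he*/*she* targets over *they* targets, and a positive `=TheyComp` estimate favors she/her or he/him competitors over they/them competitors.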
```{r}
#| label: fig-exp4-3panel
#| fig-cap: "Experiment 4: Eyetracking: Analysis Window. Proportion of looks to target, competitor, distractor (average of 4), and no characters, comparing between the 3 pronoun pair and 2 order of mention conditions. The window starts 200ms after pronoun onset and ends at 1210ms, the earliest shape word onset across stimuli."
#| fig-asp: 0.6
#| output: true
#| cache: true
exp4_load_data_plots_crit() %>%
ggplot(aes(x = Timestep_Start, y = Prop, color = Item, linetype = Order)) +
geom_line(key_glyph = "timeseries", linewidth = 0.75) +
facet_wrap(~Pronoun_Pair) +
scale_color_manual(values = c("#086FC4", "forestgreen", "grey50", "grey80")) +
scale_x_continuous(
expand = c(0, 0),
breaks = c(200, 400, 600, 800, 1000, 1200)
) +
scale_y_continuous(limits = c(0, 0.75), breaks = c(0, 0.25, 0.50, 0.75)) +
geom_vline(xintercept = 180, linewidth = 1) +
theme_classic() +
eyetracking_theme +
theme(
axis.line.y = element_blank(),
legend.margin = margin(l = -0.05, unit = "in"),
panel.spacing.x = unit(0.25, "in"),
plot.margin = margin(l = 0.05, r = 0.10, t = 0.10, b = 0.05, unit = "in")
) +
guides(
color = guide_legend(order = 1, override.aes = theme(linewidth = 1)),
linetype = guide_legend(order = 2, override.aes = theme(linewidth = 1))
) +
labs(
title = paste(
"Experiment 4: Proportion of Looks to Characters",
"During Critical Window"
),
x = "Time Relative to Pronoun Onset (ms)",
y = element_blank(), # Proportion Looking at Item
color = "Item",
linetype = "Target\nMentioned"
)
```
The maximal random effects structure [@baayen2008; @barr2013] included by-participant and by-item slopes for Pronoun Pair, Order, AR(1), Trend (time point during trial), and Trial Number (time point during experiment), with items defined as the 60 story frames that named the 2 characters. The *lme4* and *buildmer* packages in R identified the most complex random effects structure that would converge [@bates2015; @rcoreteam2023; @voeten2023], which included by-participant slopes for AR, Order, and Trial Number and by-item slopes for Order and Trend (@tbl-exp4-pronoun-pair).
```{r}
#| label: exp4-model-main-build
#| eval: false
cluster7 <- makeCluster(7, type = "SOCK")  # parallelize model fitting
clusterEvalQ(cluster7, library("buildmer"))
clusterExport(cluster7, "exp4_d")
# Start from the maximal formula; buildmer orders terms by their contribution
# and keeps the most complex random effects structure that converges
exp4_m_pronoun_pair <- buildmer(
formula = IsTarget ~ Time_Centered + WasTarget + Pronoun_Pair * Order +
(Time_Centered * WasTarget * Trial_Scaled * Pronoun_Pair * Order |
ParticipantID) +
(Time_Centered * WasTarget * Trial_Scaled * Pronoun_Pair * Order |
Story),
data = exp4_d,
family = binomial,
buildmerControl(direction = "order", cl = cluster7)
)
stopCluster(cluster7)
```
```{r}
#| label: exp4-model-main-results
exp4_m_pronoun_pair <- readRDS("r_data/exp4_pronoun-pair.RDS")
exp4_m_pronoun_pair %>% summary()
exp4_r_pronoun_pair <- exp4_m_pronoun_pair %>% tidy_model_results()
```
The AR(1) effect was significant (`r exp4_r_pronoun_pair['WasTarget', 'Text']`): participants were more likely to be looking at the target during the current timestep if they had been looking at it during the previous timestep. Trend was not significant (`r exp4_r_pronoun_pair['Time', 'Text']`), indicating that the overall level of target fixations did not increase or decrease linearly over the course of the trial.
Both contrasts for Pronoun Pair were significant: Participants were more likely to look at the target character after the onset of *he* and *she* than after the onset of *they*, across Order conditions (`r exp4_r_pronoun_pair['Pronoun_Pair=TheyTarget', 'Text']`). After the onset of *he* or *she*, participants were more likely to look at the target character if the competitor character used he/him or she/her than if the competitor character used they/them (`r exp4_r_pronoun_pair['Pronoun_Pair=TheyComp', 'Text']`). Visual inspection of the data shows that in stories using *he* and *she*, looks to the target diverge from looks to the competitor and reach a proportion of 0.5 in the first quarter of the analysis window. In stories using *they*, looks to the target diverge from looks to the competitor in the first quarter of the analysis window, but do not reach 0.5 until the last quarter (@fig-exp4-3panel).
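In R, planned contrasts like these two can be attached to the factor before model fitting. The sketch below is a hypothetical reconstruction: the level ordering and weights are assumptions based on the description above, not taken from the analysis code.

```{r}
#| eval: false
# Hypothetical contrast coding for the 3-level Pronoun Pair factor.
# Assumed level order: the two he/she-target conditions first, then they.
contrasts(exp4_d$Pronoun_Pair) <- cbind(
  TheyTarget = c(1/3, 1/3, -2/3),  # he/she targets vs. they target
  TheyComp   = c(1/2, -1/2, 0)     # he/him or she/her competitor vs.
                                   # they/them competitor (he/she targets only)
)
```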
```{r}
#| label: exp4-model-order-build
#| eval: false
cluster6 <- makeCluster(6, type = "SOCK")
clusterEvalQ(cluster6, library("buildmer"))
clusterExport(cluster6, "exp4_d")
# HeShe|SheHe
exp4_m_HS.SH <- buildmer(
formula = IsTarget ~ WasTarget + Time_Centered + Pronoun_Pair_HS.SH * Order +
(WasTarget + Order + Trial_Scaled | ParticipantID) +
(Order + Time_Centered | Story),
data = exp4_d,
family = binomial,
buildmerControl(direction = "order", cl = cluster6)
)
# HeShe|They
exp4_m_HS.T <- buildmer(
formula = IsTarget ~ WasTarget + Time_Centered + Pronoun_Pair_HS.T * Order +
(WasTarget + Order + Trial_Scaled | ParticipantID) +
(Order + Time_Centered | Story),
data = exp4_d,
family = binomial,
buildmerControl(direction = "order", cl = cluster6)
)