Code for the ACL 2020 paper "The Sensitivity of Language Models and Humans to Winograd Schema Perturbations". Preprint: https://arxiv.org/pdf/2005.01348.pdf
All perturbations are stored as a TSV in data/final.tsv and in a separate JSON file. Each perturbation is referred to by a field beginning with "text"; "pron_index" fields refer to the index of the ambiguous pronoun, and "answer" fields refer to the appropriate answer. New indices and answers are provided for perturbations that affect either of these; they are suffixed with the name of the perturbation.
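As a rough illustration of the layout described above, the following sketch reads a TSV and picks the text, pronoun index and answer for one perturbation, falling back to the unsuffixed fields when no perturbation-specific value exists. The column names (text_example, pron_index_example), the underscore separator, the sample row and the helper names are all assumptions made for this example; check the header of data/final.tsv for the real field names.

```python
import csv
import io

# Made-up sample data. The header of data/final.tsv may differ.
SAMPLE_TSV = (
    "text\tpron_index\tanswer\ttext_example\tpron_index_example\n"
    "The trophy did not fit in the suitcase because it was too big.\t9\ttrophy\t"
    "Because it was too big, the trophy did not fit in the suitcase.\t1\n"
)


def load_rows(handle):
    """Read a tab-separated file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def lookup(row, field, perturbation=None):
    """Return the perturbation-specific value of a field if present, else the original.

    Assumes suffixed columns are named "<field>_<perturbation>" (hypothetical).
    """
    if perturbation:
        key = f"{field}_{perturbation}"
        if row.get(key):  # skip missing or empty cells
            return row[key]
    return row[field]


def perturbed_example(row, perturbation=None):
    """Collect the text, pronoun index and answer for one perturbation of a row."""
    return {
        "text": lookup(row, "text", perturbation),
        "pron_index": int(lookup(row, "pron_index", perturbation)),
        "answer": lookup(row, "answer", perturbation),
    }


if __name__ == "__main__":
    # Swap io.StringIO(SAMPLE_TSV) for open("data/final.tsv", newline="") to use the real file.
    rows = load_rows(io.StringIO(SAMPLE_TSV))
    print(perturbed_example(rows[0]))
    print(perturbed_example(rows[0], "example"))
```

In this sample the hypothetical "example" perturbation moves the pronoun, so it carries its own pron_index column, while the answer is unchanged and is read from the original answer field.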
To evaluate a pretrained model, run the script for that model. There are different scripts for different language models to take variations in tokenisation into account.