CausalGym

Aryaman Arora, Dan Jurafsky, and Christopher Potts. 2024. CausalGym: Benchmarking causal interpretability methods on linguistic tasks. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14638–14663, Bangkok, Thailand. Association for Computational Linguistics.

HuggingFace dataset: aryaman/causalgym

CausalGym is a benchmark for comparing the performance of causal interpretability methods on a variety of simple linguistic tasks. The tasks are taken from the SyntaxGym evaluation set (Gauthier et al., 2020; Hu et al., 2020) and converted into a format suitable for interventional interpretability.

This repository includes code for:

Training DAS and all the other methods benchmarked in the paper, on every region, layer, and task for some model. This is sufficient for replicating all experiments in the paper (including hyperparameter sweeps and interpretability during training).
Reproducing every plot in the paper.
Template specifications for every task in the benchmark and utils for generating examples, tokenizing, generating non-overlapping train/test sets, and so on.
Testing model outputs on the task templates; this was used to design the benchmark tasks.

You can also download the train/dev/test splits for each task as used in the paper via HuggingFace.

If you are having trouble getting anything running, do not hesitate to file an issue! We would love to help you benchmark your new method or help you replicate the results from our paper.

Instructions

Important

The implementations in this repo are only for GPTNeoX-type language models (e.g. the pythia series) and will probably not work for other architectures without some modifications.

First install the requirements (a fresh environment is probably best):

pip install -r requirements.txt

Training

To train every method, layer, region, and task for pythia-70m (results are logged to the directory logs/das/):

python test_all.py --model EleutherAI/pythia-70m

To do the same but with the dog-give control task used to compute selectivity:

python test_all.py --model EleutherAI/pythia-70m --manipulate dog-give

To run just the Preposing in PP extension:

python test_all.py --model EleutherAI/pythia-70m --datasets preposing_in_pp/preposing_in_pp preposing_in_pp/preposing_in_pp_embed_1
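
If you want to repeat the commands above for several models, a small driver script is one option. The following is a minimal sketch, not part of this repository: `build_command` and `run_all` are hypothetical helper names, and the sketch uses only the `--model`, `--manipulate`, and `--datasets` flags shown above. It defaults to a dry run that prints each command without executing it.

```python
import subprocess
import sys


def build_command(model, manipulate=None, datasets=None):
    """Build the argv for one test_all.py run (hypothetical helper).

    Only the flags shown in this README are used.
    """
    cmd = [sys.executable, "test_all.py", "--model", model]
    if manipulate is not None:
        cmd += ["--manipulate", manipulate]
    if datasets:
        cmd += ["--datasets", *datasets]
    return cmd


def run_all(models, dry_run=True, **kwargs):
    """Print one command per model, and execute it unless dry_run is set."""
    commands = [build_command(m, **kwargs) for m in models]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # The model list is an example; substitute the GPTNeoX-type models you need.
    run_all(["EleutherAI/pythia-70m", "EleutherAI/pythia-160m"])
```

Pass `dry_run=False` to actually launch the runs from the repository root, and `manipulate="dog-give"` to build the control-task variant.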

Analysis + plots

Once you have run this for several models, you can create results tables (like those found in the appendix) with:

python plot.py --file logs/das/ --plot summary --metric odds --reload

This also caches intermediate results in a CSV file in the same directory, so you don't need to pass the --reload option again unless you need to recompute statistics.

To produce the causal tracing-style plots for all methods:

python plot.py --file logs/das/ --plot pos_all --metric odds

To visualize just runs from the Preposing in PP extension:

python plot.py --file logs/das/ --plot pos_all --metric odds --template_filename preposing_in_pp

You can also specify a subset of methods:

python plot.py --file logs/das/ --plot pos_t --metric odds --methods das vanilla probe

Citation

Please cite the CausalGym publication:

@inproceedings{arora-etal-2024-causalgym,
    title = "{C}ausal{G}ym: Benchmarking causal interpretability methods on linguistic tasks",
    author = "Arora, Aryaman and Jurafsky, Dan and Potts, Christopher",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.785",
    doi = "10.18653/v1/2024.acl-long.785",
    pages = "14638--14663"
}

Also cite the earlier SyntaxGym papers:

@inproceedings{gauthier-etal-2020-syntaxgym,
    title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
    author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
    editor = "Celikyilmaz, Asli and Wen, Tsung-Hsien",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.acl-demos.10",
    doi = "10.18653/v1/2020.acl-demos.10",
    pages = "70--76",
}

@inproceedings{hu-etal-2020-systematic,
    title = "A Systematic Assessment of Syntactic Generalization in Neural Language Models",
    author = "Hu, Jennifer and Gauthier, Jon and Qian, Peng and Wilcox, Ethan and Levy, Roger",
    editor = "Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.acl-main.158",
    doi = "10.18653/v1/2020.acl-main.158",
    pages = "1725--1744",
}

Task examples

Task	Example
Agreement (4)
`agr_gender`	[John][Jane] walked because [he][she]
`agr_sv_num_subj-relc`	The [guard][guards] that hated the manager [is][are]
`agr_sv_num_obj-relc`	The [guard][guards] that the customers hated [is][are]
`agr_sv_num_pp`	The [guard][guards] behind the managers [is][are]
Licensing (7)
`agr_refl_num_subj-relc`	The [farmer][farmers] that loved the actors embarrassed [himself][themselves]
`agr_refl_num_obj-relc`	The [farmer][farmers] that the actors loved embarrassed [himself][themselves]
`agr_refl_num_pp`	The [farmer][farmers] behind the actors embarrassed [himself][themselves]
`npi_any_subj-relc`	[No][The] consultant that has helped the taxi driver has shown [any][some]
`npi_any_obj-relc`	[No][The] consultant that the taxi driver has helped has shown [any][some]
`npi_ever_subj-relc`	[No][The] consultant that has helped the taxi driver has [ever][never]
`npi_ever_obj-relc`	[No][The] consultant that the taxi driver has helped has [ever][never]
Garden path effects (6)
`garden_mvrr`	The infant [who was][⌀] brought the sandwich from the kitchen [by][.]
`garden_mvrr_mod`	The infant [who was][⌀] brought the sandwich from the kitchen with a new microwave [by][.]
`garden_npz_obj`	While the students dressed [,][⌀] the comedian [was][for]
`garden_npz_obj_mod`	While the students dressed [,][⌀] the comedian who told bad jokes [was][for]
`garden_npz_v-trans`	As the criminal [slept][shot] the woman [was][for]
`garden_npz_v-trans_mod`	As the criminal [slept][shot] the woman who told bad jokes [was][for]
Gross syntactic state (4)
`gss_subord`	[While the][The] lawyers lost the plans [they][.]
`gss_subord_subj-relc`	[While the][The] lawyers who wore white lab jackets studied the book that described several advances in cancer therapy [,][.]
`gss_subord_obj-relc`	[While the][The] lawyers who the spy had contacted repeatedly studied the book that colleagues had written on cancer therapy [,][.]
`gss_subord_pp`	[While the][The] lawyers in a long white lab jacket studied the book about several recent advances in cancer therapy [,][.]
Long-distance dependencies (8)
`cleft`	What the young man [did][ate] was [make][for]
`cleft_mod`	What the young man [did][ate] after the ingredients had been bought from the store was [make][for]
`filler_gap_embed_3`	I know [that][what] the mother said the friend remarked the park attendant reported your friend sent [him][.]
`filler_gap_embed_4`	I know [that][what] the mother said the friend remarked the park attendant reported the cop thinks your friend sent [him][.]
`filler_gap_hierarchy`	The fact that the brother said [that][who] the friend trusted [the][was]
`filler_gap_obj`	I know [that][what] the uncle grabbed [him][.]
`filler_gap_pp`	I know [that][what] the uncle grabbed food in front of [him][.]
`filler_gap_subj`	I know [that][who] the uncle grabbed food in front of [him][.]
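
The bracket notation in the table can be expanded into a minimal pair mechanically. The sketch below is an illustration only and is not the repository's template code: `expand_template` is a hypothetical helper. It assumes that the first option of every bracket pair belongs to one sentence and the second option to the other, that the final bracket pair holds the two candidate next tokens, and that `⌀` marks an empty option.

```python
import re

EMPTY = "⌀"  # assumed to mark an option that contributes no text
SLOT = re.compile(r"\[([^\]]*)\]\[([^\]]*)\]")  # matches "[a][b]"


def expand_template(template):
    """Expand a table example into two (prompt, label) pairs.

    All slots except the last are filled with the same-numbered option;
    the last slot supplies the label for that prompt.
    """
    matches = list(SLOT.finditer(template))
    if len(matches) < 2:
        raise ValueError("expected at least one varying slot and one label slot")
    *varying, label = matches

    pairs = []
    for choice in (1, 2):
        parts, cursor = [], 0
        for m in varying:
            parts.append(template[cursor:m.start()])
            option = m.group(choice)
            parts.append("" if option == EMPTY else option)
            cursor = m.end()
        parts.append(template[cursor:label.start()])
        # Naive whitespace cleanup; punctuation spacing is left as written.
        prompt = " ".join("".join(parts).split())
        pairs.append((prompt, label.group(choice)))
    return pairs


print(expand_template("[John][Jane] walked because [he][she]"))
# [('John walked because', 'he'), ('Jane walked because', 'she')]
```

Under these assumptions, each task yields a pair of prompts that differ only in the bracketed region, each with its own expected continuation.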
