
Re-thinking the ETHICS utilitarianism task

This repository accompanies the report Re-thinking the ETHICS utilitarianism task, available here.

Abstract

We perform an exploratory study of the ETHICS utilitarianism task dataset (Hendrycks et al., 2021) and investigate approaches to improving the interpretability of transformer models fine-tuned on this task. We identify substantial train-test overlap, marked train-test distributional shift, and significant label non-reproducibility, which together impose a ceiling on achievable performance and motivate the re-release of a reformulated dataset. We then consider attention mapping, Shapley additive explanations (SHAP), and Bayesian methods for model certainty estimation as approaches to improving interpretability. Through SHAP we identify several model failure modes, including sensitivity to sentence length and to ungrammatical word repetition. We find that weight-perturbation techniques, despite being computationally cheap, have limited utility when applied to large transformer models, and we identify Monte Carlo dropout as a promising candidate for certainty estimation. Finally, we implement a direct scenario-comparison model that improves performance on a hard subset of the data.
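As a minimal sketch of the SHAP analysis described above, the snippet below computes token-level attributions for a text classifier using the shap library's documented pipeline interface. The checkpoint name and example scenario are placeholders, not the fine-tuned model or data from the report.

```python
import shap
from transformers import pipeline

# Placeholder checkpoint: substitute a model fine-tuned on the
# ETHICS utilitarianism task to reproduce the failure-mode analysis.
classifier = pipeline("text-classification", model="bert-base-uncased",
                      return_all_scores=True)

explainer = shap.Explainer(classifier)
shap_values = explainer(["I helped my neighbour carry her groceries upstairs."])

# Visualise per-token contributions to each output class.
shap.plots.text(shap_values)
```

Monte Carlo dropout can likewise be sketched in a few lines: keep dropout active at inference time and aggregate several stochastic forward passes. The helper below is illustrative (mc_dropout_predict is not part of this repository) and assumes a Hugging Face-style model whose forward pass returns logits.

```python
import torch

def mc_dropout_predict(model, inputs, n_samples=20):
    """Estimate predictive uncertainty via Monte Carlo dropout: sample
    several stochastic forward passes with dropout enabled and report
    the mean and spread of the class probabilities."""
    model.train()  # train mode keeps dropout layers active
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(**inputs).logits, dim=-1)
            for _ in range(n_samples)
        ])
    model.eval()
    # The mean is the prediction; the standard deviation is a certainty signal.
    return probs.mean(dim=0), probs.std(dim=0)
```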

We also make available:

  1. A spotlight talk (slides)
  2. A demo notebook
  3. All code
  4. Full report

About

An in-depth evaluation of the ETHICS utilitarianism task dataset, and an assessment of approaches to improving the interpretability of transformer models (SHAP, Bayesian transformers).
