Implementation of "Counterfactual Fairness in Text Classification through Robustness"
This paper studies counterfactual fairness in text classification, which asks the question: “How would the prediction change if the sensitive attribute referenced in the example were different?”
- Python 3.6
- torch==1.4.0
- pandas==1.0.3
- numpy==1.18.2
Model | CTF Gap: Eval NT | CTF Gap: Synthetic NT | CTF Gap: Synthetic Toxic |
---|---|---|---|
Baseline | 0.116 (0.140) | 0.105 (0.180) | 0.065 (0.061) |
Blind | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |
CF Aug | 0.116 (0.127) | 0.100 (0.226) | 0.059 (0.022) |
CLP Nontoxic, lambda=1 | 0.023 (0.012) | 0.0065 (0.015) | 0.0144 (0.007) |
CLP, lambda=0.05 | 0.043 (0.071) | 0.010 (0.082) | 0.012 (0.024) |
CLP, lambda=1 | 0.027 (0.007) | 0.012 (0.015) | 0.0114 (0.007) |
CLP, lambda=5 | 0.012 (0.002) | 0.003 (0.004) | 0.0036 (0.004) |
Model | CTF Gap: Held-out terms |
---|---|
Baseline | 0.193 (0.091) |
Blind | 0.178 (0.09) |
CF Aug | 0.207 (0.087) |
CLP Nontoxic, lambda=1 | 0.121 (0.095) |
CLP, lambda=0.05 | 0.091 (0.078) |
CLP, lambda=1 | 0.040 (0.084) |
CLP, lambda=5 | 0.044 (0.076) |
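The CTF gap reported above measures how much the model's output changes when an identity term in an example is swapped for another. As a rough sketch (the function and argument names here are our own, not from this repository), it can be computed as the mean absolute difference between the model's predicted toxicity probabilities on original and counterfactual examples:

```python
import torch

def ctf_gap(model, originals, counterfactuals):
    """Mean absolute difference between predicted toxicity
    probabilities on paired original/counterfactual batches.
    Assumes `model` maps a batch of encoded examples to a
    single toxicity logit per example."""
    model.eval()
    with torch.no_grad():
        p_orig = torch.sigmoid(model(originals)).squeeze(-1)
        p_cf = torch.sigmoid(model(counterfactuals)).squeeze(-1)
    return (p_orig - p_cf).abs().mean().item()
```

By construction the gap is 0 for the Blind model, since blinding replaces all identity tokens with a shared placeholder and the counterfactual pair becomes identical to the original.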
Model | TNR Gap | TPR Gap |
---|---|---|
Baseline | 0.150 (0.084) | 0.272 (0.082) |
Blind | 0.163 (0.039) | 0.293 (0.114) |
CF Aug | 0.151 (0.065) | 0.253 (0.083) |
CLP Nontoxic, lambda=1 | 0.156 | 0.261 |
CLP, lambda=0.05 | 0.157 (0.058) | 0.246 (0.078) |
CLP, lambda=1 | 0.175 (0.039) | 0.224 (0.104) |
CLP, lambda=5 | 0.163 (0.041) | 0.272 (0.112) |
Values in parentheses are those reported by the authors; values outside the parentheses are the ones we computed.
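The TNR/TPR gaps above compare error rates between examples that mention an identity term and those that do not. A minimal sketch of such a per-term computation (function and variable names are our own, not from this repository; the paper aggregates these gaps over terms):

```python
import numpy as np

def rate_gaps(y_true, y_pred, in_group):
    """Absolute TNR and TPR gaps between examples mentioning an
    identity term (in_group == True) and all other examples.
    y_true / y_pred are 0/1 labels and predictions."""
    y_true, y_pred, in_group = map(np.asarray, (y_true, y_pred, in_group))

    def tpr(mask):
        pos = mask & (y_true == 1)          # positives in this slice
        return (y_pred[pos] == 1).mean() if pos.any() else np.nan

    def tnr(mask):
        neg = mask & (y_true == 0)          # negatives in this slice
        return (y_pred[neg] == 0).mean() if neg.any() else np.nan

    return (abs(tnr(in_group) - tnr(~in_group)),
            abs(tpr(in_group) - tpr(~in_group)))
```

Note that, unlike the CTF gap, these are classical group-fairness metrics: they compare aggregate error rates across slices rather than predictions on counterfactual pairs, which is why the Blind model's gaps are nonzero here.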
To run the Baseline, Blind, and CF Aug models, set the dataset file paths accordingly in the `load_data` function, and set both `use_clp` and `use_clp_nontoxic` to False.
To run the Counterfactual Logit Pairing (CLP) model, use the same dataset file paths as the baseline, set `use_clp` to True and `use_clp_nontoxic` to False, and set the hyperparameter lambda as desired.
To run the CLP Nontoxic model, use the same dataset file paths as the baseline, set `use_clp` to False and `use_clp_nontoxic` to True, and set lambda as desired.
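For reference, the CLP training objective adds a penalty on the gap between the logits of an example and its counterfactual to the usual classification loss. The sketch below shows the idea under our assumptions (function and argument names are ours; `nontoxic_only=True` corresponds to the CLP Nontoxic variant, which pairs logits only on non-toxic examples):

```python
import torch
import torch.nn.functional as F

def clp_loss(model, x, x_cf, y, lam=1.0, nontoxic_only=False):
    """Binary cross-entropy plus a counterfactual logit pairing
    term: lam * |logit(x) - logit(x_cf)|, averaged over the batch.
    With nontoxic_only=True the pairing term is applied only to
    non-toxic (y == 0) examples."""
    logits = model(x).squeeze(-1)
    logits_cf = model(x_cf).squeeze(-1)
    bce = F.binary_cross_entropy_with_logits(logits, y.float())
    gap = (logits - logits_cf).abs()
    if nontoxic_only:
        mask = (y == 0).float()
        pair = (gap * mask).sum() / mask.sum().clamp(min=1.0)
    else:
        pair = gap.mean()
    return bce + lam * pair
```

Larger lambda trades classification accuracy for a smaller CTF gap, which matches the trend across the lambda=0.05 / 1 / 5 rows in the tables above.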
- We used a custom CNN text classifier, since the exact CNN architecture used by the authors is not described in the paper.
- The evaluation dataset used in the paper is private, so we used the test set from the Kaggle challenge instead.
- The exact split of identity tokens is not specified, so we performed a random split, keeping the three bigram identity tokens as held-out terms as mentioned in the paper.
Try more complex CNN architectures for comparison.
Students from IIT Kharagpur:
- Sai Saketh Aluru - 16CS30030
- Potnuru Anusha - 16CS30027
- PVSL Hari Chandana - 16CS30026
- K Sai Surya Teja - 16CS30015
- Kaustubh Maloo - 15MA20019