This code accompanies the paper "Privacy Risks of Securing Machine Learning Models against Adversarial Examples", accepted by ACM CCS 2019 (https://arxiv.org/abs/1905.10291).
We perform membership inference attacks against machine learning models which are trained to be robust against adversarial examples.
In total, we evaluate the privacy leakage introduced by six state-of-the-art robust training algorithms: PGD-based adversarial training, distributional adversarial training, difference-based adversarial training, duality-based verification, abstract interpretation-based verification, and interval bound propagation-based verification.
We find that robust training algorithms tend to increase the membership information leakage of trained models, compared to the natural training algorithm.
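For intuition, below is a minimal sketch of a confidence-thresholding membership inference attack of the kind described above. It is not the repository's exact implementation (see inference_utils.py for that); the confidence arrays in the usage comment are hypothetical placeholders.

```python
import numpy as np

def confidence_based_inference(train_conf, test_conf):
    """Threshold-based membership inference on per-example prediction
    confidences (the probability the model assigns to the true label).

    train_conf, test_conf: 1-D arrays of confidences for training
    (member) and test (non-member) examples. Returns the best
    inference accuracy over all thresholds, assuming balanced priors.
    """
    thresholds = np.unique(np.concatenate([train_conf, test_conf]))
    best_acc = 0.0
    for t in thresholds:
        # flag an example as a "member" when its confidence is >= threshold
        tpr = np.mean(train_conf >= t)   # members correctly flagged
        tnr = np.mean(test_conf < t)     # non-members correctly rejected
        best_acc = max(best_acc, 0.5 * (tpr + tnr))
    return best_acc

# Hypothetical usage with placeholder confidence arrays:
# acc = confidence_based_inference(benign_conf_train, benign_conf_test)
```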
- inference_utils.py: defines the membership inference function based on prediction confidence
- utils.py: defines the function to prepare the Yale Face dataset
- membership_inference_results.ipynb: lists membership inference results
- Inside the folder of each robust training method:
  - output_utils.py: defines functions to obtain predictions of training and test data, in both benign and adversarial settings (a minimal sketch of this step follows the list)
  - README.md: instructions on how to train a robust (or natural) classifier
  - Inside the subfolder of each dataset:
    - output_performance.ipynb: obtains model predictions
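As a rough illustration of the benign/adversarial prediction step above, the following PyTorch sketch generates L-infinity PGD adversarial examples and collects softmax outputs. It assumes a generic classifier and data loader with inputs scaled to [0, 1], uses illustrative attack parameters, and is not the repository's output_utils.py.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate L-infinity PGD adversarial examples for inputs x with labels y.
    eps, alpha, and steps are illustrative values, not the paper's exact settings."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # ascend the loss, then project back into the eps-ball and the valid [0, 1] range
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def get_predictions(model, loader, device="cpu"):
    """Collect softmax predictions in both benign and adversarial settings."""
    benign, adversarial = [], []
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            benign.append(F.softmax(model(x), dim=1).cpu())
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            adversarial.append(F.softmax(model(x_adv), dim=1).cpu())
    return torch.cat(benign), torch.cat(adversarial)
```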
Dependencies: TensorFlow 1.12; PyTorch 0.4