This is the repository for the paper *On the Trade-off between Adversarial and Backdoor Robustness*, by Cheng-Hsin Weng, Yan-Ting Lee, and Shan-Hung Wu, published in the Proceedings of NeurIPS 2020. Our code is implemented in TensorFlow.
In this paper, we conduct experiments to study whether adversarial robustness and backdoor robustness affect each other, and we find a trade-off: increasing a network's robustness to adversarial examples makes it more vulnerable to backdoor attacks.
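In our experiments, adversarial robustness is obtained mainly through adversarial training. For readers unfamiliar with it, below is a minimal, illustrative sketch of one PGD adversarial training step in TensorFlow 2; the perturbation budget, step size, iteration count, and the assumption that the model outputs logits are placeholders for illustration, not the exact settings used in this repository.

```python
import tensorflow as tf

# Placeholder hyperparameters for illustration only.
EPSILON = 0.3      # L-infinity perturbation budget
STEP_SIZE = 0.01   # PGD step size
PGD_STEPS = 40     # number of PGD iterations

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def pgd_attack(model, x, y):
    """Craft L-infinity PGD adversarial examples for a batch (x, y) in [0, 1]."""
    x_adv = x + tf.random.uniform(tf.shape(x), -EPSILON, EPSILON)
    x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)
    for _ in range(PGD_STEPS):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv, training=False))
        grad = tape.gradient(loss, x_adv)
        # Gradient-ascent step on the loss, projected back into the epsilon ball.
        x_adv = x_adv + STEP_SIZE * tf.sign(grad)
        x_adv = tf.clip_by_value(x_adv, x - EPSILON, x + EPSILON)
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)
    return x_adv

def adv_train_step(model, optimizer, x, y):
    """One adversarial training step: update the model on PGD examples instead of clean x."""
    x_adv = pgd_attack(model, x, y)
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x_adv, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

Standard training corresponds to the same update step without the inner PGD attack.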
Clone the repository and install the requirements:

    git clone https://github.com/nthu-datalab/On.the.Trade-off.between.Adversarial.and.Backdoor.Robustness
    cd On.the.Trade-off.between.Adversarial.and.Backdoor.Robustness
    pip install -r requirements.txt
The trade-off between adversarial and backdoor robustness given different defenses against adversarial attacks: adversarial training and its enhancements.

| Dataset | Adv. Defense | Accuracy | Adv. Robustness | Backdoor Success Rate |
| --- | --- | --- | --- | --- |
| MNIST | None (Std. Training) | 99.1% | 0.0% | 17.2% |
| | Adv. Training | 98.8% | 93.4% | 67.2% |
| | Lipschitz Reg. | 99.3% | 0.0% | 5.7% |
| | Lipschitz Reg. + Adv. Training | 98.7% | 93.6% | 52.1% |
| | Denoising Layer | 96.9% | 0.0% | 9.6% |
| | Denoising Layer + Adv. Training | 98.3% | 90.6% | 20.8% |
| CIFAR10 | None (Std. Training) | 90.0% | 0.0% | 64.1% |
| | Adv. Training | 79.3% | 48.9% | 99.9% |
| | Lipschitz Reg. | 88.2% | 0.0% | 75.6% |
| | Lipschitz Reg. + Adv. Training | 79.3% | 48.5% | 99.5% |
| | Denoising Layer | 90.8% | 0.0% | 99.6% |
| | Denoising Layer + Adv. Training | 79.4% | 49.0% | 100.0% |
| ImageNet | None (Std. Training) | 72.4% | 0.1% | 3.9% |
| | Adv. Training | 55.5% | 18.4% | 65.4% |
| | Denoising Layer | 71.9% | 0.1% | 6.9% |
| | Denoising Layer + Adv. Training | 55.6% | 18.1% | 68.0% |
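The backdoor success rates above are measured by stamping a trigger onto a fraction of the training images (with relabeling in the dirty-label setting) and then testing how often the trained network predicts the attacker's target class on triggered inputs. Below is a minimal, illustrative sketch of sticker-style data poisoning; the trigger pattern, position, target class, and poison rate are placeholder assumptions, not the exact configuration used in our experiments.

```python
import numpy as np

# Illustrative trigger settings (assumptions, not the paper's exact values).
TRIGGER_SIZE = 3    # 3x3 white sticker in the bottom-right corner
TARGET_LABEL = 0    # attacker-chosen target class
POISON_RATE = 0.05  # fraction of training data to poison

def apply_sticker(images):
    """Stamp a white square trigger onto the bottom-right corner of NHWC images in [0, 1]."""
    images = images.copy()
    images[:, -TRIGGER_SIZE:, -TRIGGER_SIZE:, :] = 1.0
    return images

def poison_dataset(x_train, y_train, dirty_label=True):
    """Poison a random subset of the training set with the sticker trigger.

    dirty_label=True relabels poisoned samples to the target class (dirty-label attack);
    dirty_label=False keeps the original labels (clean-label attack, which in practice
    is applied only to samples of the target class).
    """
    n_poison = int(POISON_RATE * len(x_train))
    idx = np.random.choice(len(x_train), n_poison, replace=False)
    x_train, y_train = x_train.copy(), y_train.copy()
    x_train[idx] = apply_sticker(x_train[idx])
    if dirty_label:
        y_train[idx] = TARGET_LABEL
    return x_train, y_train
```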
The trade-off between adversarial and backdoor robustness given different defenses against adversarial attacks: certified robustness.

| Dataset | Poisoned Data Rate | Adv. Defense | Accuracy | Certified Robustness | Adv. Robustness | Backdoor Success Rate |
| --- | --- | --- | --- | --- | --- | --- |
| MNIST | 5% | None | 99.4% | N/A | 0.0% | 36.3% |
| | | IBP | 97.5% | 84.1% | 94.6% | 92.4% |
| CIFAR10 | 5% | None | 87.9% | N/A | 0.0% | 99.9% |
| | | IBP | 47.7% | 24.0% | 35.3% | 100.0% |
| | 0.5% | None | 88.7% | N/A | 0.0% | 81.8% |
| | | IBP | 50.8% | 25.8% | 35.7% | 100.0% |
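IBP (Interval Bound Propagation) trains with certified bounds obtained by propagating an interval around each input through the network. As a rough illustration of the bound propagation itself (not the certified training procedure used in our experiments), here is how interval bounds pass through one dense layer followed by a ReLU; the shapes and epsilon below are arbitrary examples.

```python
import tensorflow as tf

def ibp_dense(lower, upper, w, b):
    """Propagate interval bounds [lower, upper] through a dense layer x @ w + b."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    new_center = tf.matmul(center, w) + b
    new_radius = tf.matmul(radius, tf.abs(w))  # worst-case growth of the interval
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    """ReLU is monotonic, so interval bounds pass through elementwise."""
    return tf.nn.relu(lower), tf.nn.relu(upper)

# Example: propagate an L-infinity ball of radius eps through one linear+ReLU block.
x = tf.random.uniform([1, 784])
eps = 0.1
lower, upper = x - eps, x + eps
w = tf.random.normal([784, 128])
b = tf.zeros([128])
lower, upper = ibp_dense(lower, upper, w, b)
lower, upper = ibp_relu(lower, upper)
```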
The performance of the pre-training backdoor defenses that detect and remove poisoned training data. Detection rates are reported under poisoned-data rates of 5%, 1%, and 0.5%.

| Dataset | Attack + Training | Spectral Signatures (5%) | Spectral Signatures (1%) | Spectral Signatures (0.5%) | Activation Clustering (5%) | Activation Clustering (1%) | Activation Clustering (0.5%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CIFAR10 | Dirty-Label Sticker + Std. Training | 81.6% | 24.4% | 2.4% | 100% | 100% | 5.58% |
| | Clean-Label Sticker + Adv. Training | 50.1% | 10.6% | 5.2% | 48.2% | 9.59% | 5.01% |
| ImageNet | Dirty-Label Sticker + Std. Training | 100% | 84.6% | 100% | 100% | 100% | 100% |
| | Clean-Label Sticker + Adv. Training | 50.5% | 13.1% | 9.23% | 47.8% | 9.67% | 3.72% |
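Spectral signatures flag poisoned samples by scoring each sample's latent representation against the top singular vector of the centered representations of its class, then removing the highest-scoring fraction. The sketch below illustrates this scoring for a single class; the feature extraction, per-class grouping, and removal rate are assumptions for illustration rather than the exact procedure behind the numbers above.

```python
import numpy as np

def spectral_signature_scores(features):
    """Score samples of one class by correlation with the top singular vector.

    features: (n_samples, dim) latent representations from the trained model.
    Returns one outlier score per sample; higher scores suggest poisoning.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Top right singular vector of the centered representation matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_v = vt[0]
    return (centered @ top_v) ** 2

def remove_suspected(features, x, y, removal_rate=0.05):
    """Drop the highest-scoring fraction of samples (removal rate is an assumption)."""
    scores = spectral_signature_scores(features)
    n_remove = int(removal_rate * len(scores))
    keep = np.argsort(scores)[:len(scores) - n_remove]
    return x[keep], y[keep]
```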
The performance of the post-training backdoor defense that cleanses neurons.
| Dataset | Trigger Type | Trigger Label | Training Algorithm | Success Rate w/o Defense | Success Rate w/ Defense |
| --- | --- | --- | --- | --- | --- |
| CIFAR10 | Sticker | Dirty | Std. Training | 100% | 0.1% |
| | | Clean | Adv. Training | 99.9% | 0% |
| | Watermark | Dirty | Std. Training | 99.7% | 39.3% |
| | | Clean | Adv. Training | 92.7% | 1.2% |
| ImageNet | Sticker | Dirty | Std. Training | 98.1% | 2.3% |
| | | Clean | Adv. Training | 65.4% | 1.1% |
| | Watermark | Dirty | Std. Training | 96.3% | 39.8% |
| | | Clean | Adv. Training | 49.7% | 4.0% |
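The post-training defense above removes backdoor behavior by cleansing neurons of the trained network. As one illustration of this family of defenses, the sketch below prunes the convolution channels that are least active on clean data (in the spirit of Fine-Pruning); the layer name, prune rate, and use of a Keras functional model are assumptions for illustration and may differ from the defense evaluated in the paper.

```python
import numpy as np
import tensorflow as tf

def prune_dormant_channels(model, layer_name, x_clean, prune_rate=0.2):
    """Zero out the channels of a convolutional layer that are least activated on
    clean data, on the premise that backdoor behavior often relies on neurons
    that stay dormant for clean inputs.
    """
    layer = model.get_layer(layer_name)
    # Mean activation per output channel over the clean data.
    feature_model = tf.keras.Model(model.input, layer.output)
    acts = feature_model.predict(x_clean, verbose=0)
    channel_means = acts.mean(axis=tuple(range(acts.ndim - 1)))
    # Prune the lowest-activation channels.
    n_prune = int(prune_rate * len(channel_means))
    prune_idx = np.argsort(channel_means)[:n_prune]
    kernel, bias = layer.get_weights()
    kernel[..., prune_idx] = 0.0
    bias[prune_idx] = 0.0
    layer.set_weights([kernel, bias])
    return prune_idx
```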