This repository contains Python code for training a Convolutional Neural Network (CNN) model on MNIST, EMNIST, and several related digit datasets, and for evaluating its performance under adversarial attacks. The code also includes adversarial training as a defense mechanism against such attacks.
The code implements three common adversarial attacks:
- FGSM (Fast Gradient Sign Method): FGSM is a fast, single-step attack that adds a small perturbation to the input image in the direction of the sign of the gradient of the loss function with respect to the input. It is an untargeted attack that aims to fool the model into making incorrect predictions.
- CW (Carlini-Wagner): The CW attack is an iterative attack that optimizes a loss function to find the smallest perturbation that causes misclassification. It is typically run as a targeted attack, meaning it tries to make the model predict a specific target class different from the true class.
- PGD (Projected Gradient Descent): PGD is an iterative attack that performs multiple steps of gradient ascent to find adversarial examples. It projects the perturbations onto an epsilon-ball around the original input to ensure that the perturbations are bounded.
These attacks can be combined to create a compounded attack, where multiple attacks are applied sequentially to generate stronger adversarial examples. The code provides three options for compounded attacks: `fgsm_cw_attack`, `fgsm_pgd_attack`, and `cw_pgd_attack`.
The attacks implemented in this code are white-box attacks, meaning they have access to the model's architecture and parameters during the attack generation process.
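As a rough illustration, a compounded attack can be thought of as chaining torchattacks attack objects. The sketch below shows one possible `fgsm_cw_attack`; the hyperparameters (eps, steps, lr) are illustrative assumptions, not values taken from the repository.

```python
import torchattacks

def fgsm_cw_attack(model, images, labels, eps=8/255):
    """Illustrative compounded attack: FGSM first, then CW refines the perturbed images."""
    fgsm = torchattacks.FGSM(model, eps=eps)                      # single-step, gradient-sign attack
    cw = torchattacks.CW(model, c=1, kappa=0, steps=50, lr=0.01)  # iterative optimization attack
    adv_images = fgsm(images, labels)    # initial perturbation
    adv_images = cw(adv_images, labels)  # sequential second attack on the FGSM output
    return adv_images

# A stand-alone attack is built the same way, e.g.:
# pgd = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10)
# adv_images = pgd(images, labels)
```

The other compounded variants (`fgsm_pgd_attack`, `cw_pgd_attack`) would follow the same pattern with a different pair of attacks.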
The code includes adversarial training as a defense mechanism against adversarial attacks. Adversarial training involves augmenting the training dataset with adversarial examples generated using the attacks mentioned above. By training the model on both clean and adversarial examples, the model learns to be more robust and resistant to adversarial perturbations.
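A minimal sketch of that defense, assuming the adversarial examples are collected into a `TensorDataset` and concatenated with the clean training set; the function and variable names here are illustrative, not the repository's exact code.

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

def build_adversarial_dataset(model, clean_loader, attack, device="cuda"):
    """Generate an adversarial copy of a clean dataset using a torchattacks attack (sketch)."""
    adv_batches, label_batches = [], []
    model.eval()
    for images, labels in clean_loader:
        images, labels = images.to(device), labels.to(device)
        adv_batches.append(attack(images, labels).cpu())  # perturbed images
        label_batches.append(labels.cpu())                # original labels are kept
    return TensorDataset(torch.cat(adv_batches), torch.cat(label_batches))

# Adversarial training then re-trains the model on the combined data, e.g.:
# combined = ConcatDataset([clean_dataset, adv_dataset])
# combined_loader = DataLoader(combined, batch_size=64, shuffle=True)
```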
- Python version: 3.8.18
- torch: 2.2.1
- torchvision: 0.17.1
- torchattacks: 3.5.1
- numpy: 1.23.5
- tabulate: 0.9.0
The code supports the following datasets:
- MNIST
- EMNIST
- SVHN
- USPS
- Semeion
Make sure the dataset files are in the appropriate directory structure on your Google Drive.
- Set the runtime to GPU (e.g., a T4 GPU on Google Colab) instead of CPU.
- Mount your Google Drive and ensure that the dataset files are in the correct directory structure.
- Select the desired compounded adversarial attack by setting the `compounded_attack_name` variable.
- Set the desired dataset by setting the `dataset_name` variable.
- Run the code cells in the provided order (a minimal setup sketch follows this list).
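A minimal Colab setup sketch, assuming a torchvision download into a placeholder Drive folder; the `data_root` path is illustrative, so adjust it to your own directory structure.

```python
# Colab setup sketch: mount Google Drive and load one of the supported datasets.
from google.colab import drive
import torchvision
import torchvision.transforms as T

drive.mount('/content/drive')

data_root = '/content/drive/MyDrive/data'  # placeholder path, not the repository's exact layout
transform = T.ToTensor()
train_set = torchvision.datasets.MNIST(root=data_root, train=True, download=True, transform=transform)
test_set = torchvision.datasets.MNIST(root=data_root, train=False, download=True, transform=transform)
```

The other supported datasets have their own `torchvision.datasets` classes (EMNIST, SVHN, USPS, SEMEION) with slightly different arguments; for example, EMNIST additionally requires a `split` argument.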
- Train the clean CNN model on the selected dataset.
- Evaluate the clean model's performance on the test set (a minimal evaluation sketch follows this list).
- Perform the selected compounded adversarial attack on the clean model.
- Evaluate the attacked model's performance on the test set.
- Generate an adversarial dataset using the attacked model.
- Perform adversarial training by combining the clean and adversarial datasets.
- Evaluate the adversarially trained model's performance on the test set.
- Perform the adversarial attack on the adversarially trained model.
- Evaluate the performance of the adversarially trained model under attack.
- Summarize and compare the performance metrics and misclassified examples for the clean, attacked, and adversarially trained models.
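The evaluation steps above can be pictured with a single helper that measures test-set accuracy with or without an attack applied at test time; this is a sketch under assumed names, not the repository's exact evaluation code.

```python
import torch

def evaluate(model, loader, attack=None, device="cuda"):
    """Test-set accuracy on clean data (attack=None) or on adversarial examples (sketch)."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        if attack is not None:
            images = attack(images, labels)  # attack generation needs gradients, so it stays outside no_grad
        with torch.no_grad():
            preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```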
The code provides a summary of the model's performance at different stages:
- Clean model performance
- Performance under adversarial attack without defense
- Performance under adversarial attack with defense (adversarial training)
The summary includes metrics such as loss, accuracy, precision, recall, F1-score, and ROC AUC score. It also displays examples of misclassified images at each stage.
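A sketch of how such a summary table could be assembled; it assumes scikit-learn for the metric functions (scikit-learn is not listed in the requirements above) and uses tabulate, which is.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from tabulate import tabulate

def summary_row(stage, y_true, y_pred, y_prob):
    """One row of the performance summary; y_prob holds per-class probabilities."""
    return [stage,
            accuracy_score(y_true, y_pred),
            precision_score(y_true, y_pred, average="macro"),
            recall_score(y_true, y_pred, average="macro"),
            f1_score(y_true, y_pred, average="macro"),
            roc_auc_score(y_true, y_prob, multi_class="ovr")]

# rows = [summary_row("Clean", ...), summary_row("Attacked", ...), summary_row("Defended", ...)]
# print(tabulate(rows, headers=["Stage", "Accuracy", "Precision", "Recall", "F1", "ROC AUC"]))
```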
The code assumes that the dataset files are stored on Google Drive and that Google Drive will be mounted with the default directory structure. Make sure to adjust the file paths and directory structure if necessary.
For detailed information on the code and its functionality, please refer to the code comments and the provided Python code.
Disclaimer: This repository is intended for educational and research purposes.
Copyright 2024 Eric Yocam
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.