
OODGANAttack

This repository provides the implementation of the research paper "Fooling Machine Learning Models: A Novel Out-of-Distribution Attack through Generative Adversarial Networks".

In this repository, we introduce a novel out-of-distribution (OOD) attack: leveraging pre-trained generative adversarial networks (GANs), an adversary fools an ML model into misclassifying a GAN-generated sample as a pre-specified target class.

Ethical Consideration

We would like to clarify that the intent behind sharing this out-of-distribution attack method is not malicious. Our primary goal is to advance the development of robust defensive measures for machine learning systems. In fact, this method can serve as a critical tool for conducting both white-box and black-box testing, facilitating the safe deployment of machine learning models in real-world GenAI scenarios. We encourage the use of these techniques to enhance system security and to prepare machine learning models against potential adversarial threats.

Table of Contents

  Environment
  Dataset Preparation
  Victim Model Preparation
  Out-of-distribution Attacks
  Acknowledgements

Environment

We recommend using Anaconda to manage the Python environment. In this work, we use conda version 4.13.0 and install the following packages in the virtual environment oodganattack.

conda create -n oodganattack python=3.6.13
conda activate oodganattack 

pip install tensorflow-gpu==2.5.0
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

pip install jupyter
pip install ipykernel
pip install ipyparallel
pip install jupyter_server
python -m ipykernel install --user --name oodganattack 

pip install scikit-learn==0.24.1
pip install tensorboard-logger==0.1.0

Dataset Preparation

  1. Download and unzip the datasets: GTSRB and CIFAR-10 (python version).

  2. Prepare our dataset through data_process.py.

data_process.py:
    --train_img_dir: Path for the training data folder.
    --test_img_dir: Path for the test data folder.
    --h5py_file: Path to save processed data as HDF5.
    --data_name: Specify which dataset to process: "cifar10" or "gtsrb43". 

    The output of data_process.py is a .h5py file.
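As a minimal sketch, processing CIFAR-10 might look like the command below; the directory layout and file names are only illustrative and should be adapted to wherever you unzipped the data.

python data_process.py \
    --train_img_dir ./data/cifar10/train \
    --test_img_dir ./data/cifar10/test \
    --h5py_file ./data/cifar10.h5py \
    --data_name cifar10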

Victim Model Preparation

  1. Enter the victim_models folder.

  2. Train a victim model via raw_victim_models.py

raw_victim_models.py
    --results_dir: Directory to store results.
    --data_dir: Directory where dataset files are located.
    --dataset: 'cifar10' or 'gtsrb43'.
    --model_type: 'wideresnet' or 'densenet'.
    --model_mode: 'train'.
    --max_epoch: Maximum number of training epochs.
  3. Evaluate a victim model via raw_victim_models.py (example commands for training and evaluation are shown below this list)
raw_victim_models.py
    --data_dir: Directory where dataset files are located.
    --model_mode: 'test'.
    --victim_model_type: 'wideresnet' or 'densenet'.
    --victim_model_path: Path for loading victim models.
    --dataset: 'cifar10' or 'gtsrb43'.
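As a sketch, training and then evaluating a WideResNet victim model on CIFAR-10 might look like the following; all paths, the checkpoint file name, and the epoch count are placeholders, not values prescribed by the scripts.

# Train a WideResNet on CIFAR-10 (illustrative values)
python raw_victim_models.py \
    --results_dir ./results/victim_models \
    --data_dir ./data \
    --dataset cifar10 \
    --model_type wideresnet \
    --model_mode train \
    --max_epoch 200

# Evaluate the trained model (checkpoint path is a placeholder)
python raw_victim_models.py \
    --data_dir ./data \
    --model_mode test \
    --victim_model_type wideresnet \
    --victim_model_path ./results/victim_models/wideresnet_cifar10.pth \
    --dataset cifar10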

Additionally, we offer a pre-trained victim model WideResNet on CIFAR-10 as an illustrative example.

Out-of-distribution Attacks

Given a pre-trained GAN model and a target victim model, we can perform an OOD attack: this manipulates the victim model into misclassifying samples generated by the GAN as a specific target class chosen by the adversary.

We offer a pre-trained StyleGAN model on FFHQ. Alternatively, you can train a GAN model yourself via the StyleGAN code base.

  1. Enter the attacking_victim_models/oodganattack_raw folder.

Before executing the attack, specify the path to your pre-trained GAN model in attacking_victim_models/oodganattack_raw/gan_models/model_settings.py. Set MODEL_DIR='the path to the pre-trained GAN model'. Ensure the PyTorch model is placed in the designated pytorch folder.

  2. Run the OODGAN attack via attack_raw.py
attack_raw.py
    --results_dir: Path to save results of the attack.
    --gan_model: The name of the model used.
    --victim_images_path: Directory path where victim images are stored.
    --victim_dataset_name: Name of the dataset used for the victim model.
    --victim_model_path: File path to the pre-trained victim model.
    --victim_model: Type of victim model to target. 
    --attack_type: Type of attack: white-box (wb) or black-box (bb).
    --early_stopping: Whether to stop early if convergence is reached.
    --req_conf: Required confidence score. 0 indicates no requirement. 

The output of attack_raw.py is a .h5py file that contains the attack results.
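For illustration, a white-box attack against the pre-trained WideResNet victim model might be launched as follows; the GAN model name, paths, and flag values are placeholders and should be checked against attack_raw.py.

python attack_raw.py \
    --results_dir ./results/oodganattack \
    --gan_model stylegan_ffhq \
    --victim_images_path ./data/cifar10_images \
    --victim_dataset_name cifar10 \
    --victim_model_path ./victim_models/wideresnet_cifar10.pth \
    --victim_model wideresnet \
    --attack_type wb \
    --early_stopping True \
    --req_conf 0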

  3. Enter the folder attacking_victim_models/test_attack_performance and evaluate the attack performance via raw_model_attack_performance.py
raw_model_attack_performance.py
    --OOD_adv_data_path: Path to the out-of-distribution adversarial data.
    --victim_dataset_name: Name of the dataset used for the victim model.
    --victim_model_path: File path to the pre-trained victim model.
    --victim_model: Type of victim model to target. 
    --attack_type: Type of attack: white-box (wb) or black-box (bb).

The reported results include the average Attack Success Rate (ASR), the worst ASR, and the best ASR.
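For example, evaluating the white-box results produced in the previous step might look like this; the paths are placeholders.

python raw_model_attack_performance.py \
    --OOD_adv_data_path ./results/oodganattack/attack_results.h5py \
    --victim_dataset_name cifar10 \
    --victim_model_path ./victim_models/wideresnet_cifar10.pth \
    --victim_model wideresnet \
    --attack_type wb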

Taking the WideResNet victim model trained on CIFAR-10 as an example, we present these steps in Example.ipynb.

Acknowledgements

Our implementation uses source code from these repositories: StyleGAN, ATOM, and mGANprior.
