- This project focuses on creating adversarial examples to attack unknown image-captioning models and test their robustness.
- Our code is built on Python 3.6 and PyTorch 1.10. The list of dependencies is in the environment.yml file.
- The code does not work with Python 3.7 and above; however, it can be made compatible with Python 3.7+ by replacing the deprecated scipy.misc.imread image-loading calls with a newer library such as imageio.
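As an illustration of that swap, here is a minimal sketch assuming the code loads and resizes images with scipy.misc (as older PyTorch captioning tutorials do); the helper name and resize shape below are hypothetical:

```python
# Old (Python 3.6 only):
#   from scipy.misc import imread, imresize
#   img = imresize(imread(path), (256, 256))
import imageio
import numpy as np
from PIL import Image

def load_image(path, size=(256, 256)):
    """Read an image as a uint8 HxWxC array and resize it (Python 3.7+ friendly)."""
    img = imageio.imread(path)                          # replaces scipy.misc.imread
    img = np.array(Image.fromarray(img).resize(size))   # replaces scipy.misc.imresize
    return img
```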
- Disclaimer
- Installing Prerequisites
- Getting Data
- Data Processing
- Training
- Evaluating
- Generating Adversarial Examples
- Attacking CLIP Prefix Captioning Model with the Adversarial Examples
- We want to thank sgrvinod's repository A PyTorch Tutorial to Image Captioning for its comprehensive image-captioning tutorial.
- We want to thank rmokady's repository CLIP Prefix Captioning for creating an image-captioning model from CLIP.
Both of these projects greatly inspired us to work on this topic.
Make sure you have conda installed.
Create a new conda environment using the provided environment.yml.
Optional: you can change the environment name by editing the first line of environment.yml from adv_caption
to a name of your preference.
conda env create -f /path_to_your_file/environment.yml
Then run:
conda activate adv_caption
to activate the conda environment you have just created.
This repository supports working with the MSCOCO2014 and Flickr8K datasets.
- If you choose to work with MSCOCO2014, download the training, validation, and test images.
- If you choose to work with Flickr8K, the dataset can be requested here.
- Download the captions of the images, created by Andrej Karpathy and Li Fei-Fei in JSON blob format, here.
- In the fifth line of params_class.py, specify the data path to your working directory, e.g., data_path = "/[dir_name]/data/"
- Your Karpathy JSON files should be extracted to the same directory, i.e., /[dir_name]/data/caption_datasets/
- If you choose to work with MSCOCO2014, your images folder should look like /[dir_name]/data/images/coco2014/train2014/ for train2014.zip, /[dir_name]/data/images/coco2014/val2014/ for val2014.zip, and /[dir_name]/data/images/coco2014/test2014/ for test2014.zip
- If you choose to work with Flickr8K, your images folder should look like /[dir_name]/data/images/flickr8k/
From now on, remember to run every command inside your conda environment with Python 3.6 installed. For the MSCOCO2014 dataset, run:
python create_input_files.py --which_data="coco2014"
For the Flickr8K dataset, run:
python create_input_files.py --which_data="flickr8k"
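If you want to see what the preprocessing consumes, the Karpathy JSON stores each image's filename, its train/val/test split, and its tokenized captions. A minimal sketch of reading it, assuming the published Karpathy field names (the actual preprocessing is done by create_input_files.py):

```python
import json

# Adjust [dir_name] to your working directory; dataset_flickr8k.json for Flickr8K.
with open("/[dir_name]/data/caption_datasets/dataset_coco.json") as f:
    karpathy = json.load(f)

for img in karpathy["images"][:3]:
    print(img["filename"], img["split"])        # e.g. COCO_val2014_....jpg  val
    for sent in img["sentences"]:
        print("   ", " ".join(sent["tokens"]))  # one tokenized caption per line
```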
Check training options:
python train_args.py -h
To begin training, you must specify:
- which model you want to use: resnet50, resnet101, or resnet152.
- which dataset you want to use: coco2014 or flickr8k.
- whether to start training from scratch: True or False; select False if you want to continue training from your saved model.
- whether to fine-tune your model's encoder: True or False.
python train_args.py --which_model="resnet101" --which_data="coco2014" --start_from_scratch="True" --fine_tune_encoder="True"
Once you have completed training for at least one epoch, a model checkpoint will be saved at /[dir_name]/data/checkpoints/.
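If you want to inspect or reuse a checkpoint outside of the provided scripts, it can be loaded with torch.load. A minimal sketch, assuming the checkpoint follows the encoder/decoder dictionary layout of the tutorial this code builds on (the file name and exact keys below are assumptions; use the file actually written to your checkpoints folder):

```python
import torch

# Hypothetical checkpoint path.
checkpoint = torch.load("/[dir_name]/data/checkpoints/checkpoint_coco2014.pth.tar",
                        map_location="cpu")

# Assumed keys, following the underlying tutorial's convention.
encoder = checkpoint["encoder"].eval()
decoder = checkpoint["decoder"].eval()
print("Resumed from epoch:", checkpoint.get("epoch"))
```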
To evaluate your model, run:
python eval_args.py --which_model="resnet101" --which_data="coco2014"
To generate a caption for an image, run:
python caption_args.py --which_model="resnet101" --which_data="coco2014" --img="[path_to_the_image]"
You will see the path to the output image after the image has been successfully captioned.
To generate adversarial examples from images in the test set, run:
python attack_args.py --which_model="resnet101" --target_model="resnet101" --which_data="coco2014" --epsilon=0.004 --export_caption="True" --export_original_image="True" --export_perturbed_image="True"
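The --epsilon flag bounds the size of the per-pixel perturbation. As an illustration only (the repository's actual attack is implemented in attack_args.py and may differ), an FGSM-style perturbation of a captioning model looks roughly like this:

```python
import torch

def fgsm_perturb(image, loss, epsilon=0.004):
    """One FGSM step: move each pixel by +/-epsilon along the loss gradient.

    `image` is a tensor with requires_grad=True and `loss` is the captioning
    loss (e.g. cross-entropy of the predicted caption) computed on it.
    This is an illustrative sketch, not the repository's exact attack.
    """
    grad = torch.autograd.grad(loss, image)[0]
    perturbed = image + epsilon * grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()   # keep pixels in a valid range
```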
- If you did not use our environment.yml to install dependencies, you must install the CLIP and transformers modules first. Before running the following commands, make sure the conda adv_caption environment is still activated.
pip install git+https://github.com/openai/CLIP.git
pip install transformers~=4.10.2
- If you work with the MSCOCO dataset, download the pre-trained COCO model for CLIPcap here. Place the downloaded file(s) inside the checkpoints folder, i.e., /[dir_name]/data/checkpoints/coco_weights.pt
- If you work with the Flickr8K dataset, download the pre-trained Conceptual Captions model for CLIPcap here. Place the downloaded file(s) inside the checkpoints folder, i.e., /[dir_name]/data/checkpoints/conceptual_weights.pt
After you have generated the adversarial examples, installed the dependencies, and downloaded the pre-trained model, you can begin testing CLIPcap's robustness by running:
python attack_clipcap_eval.py --which_model="resnet101" --which_data="coco2014" --epsilon=0.004
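For reference, CLIPcap feeds CLIP image embeddings into a GPT-2 based captioner. A minimal sketch of the CLIP side only (the full pipeline, including the prefix mapping network and the downloaded weights, is handled by attack_clipcap_eval.py; the ViT-B/32 backbone and the image path below are assumptions):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical path: point this at one of your exported adversarial examples.
image = preprocess(Image.open("perturbed_example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    prefix = model.encode_image(image)   # CLIP embedding that CLIPcap maps to a GPT-2 prefix
print(prefix.shape)                      # e.g. torch.Size([1, 512]) for ViT-B/32
```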