- Tensorflow implementation of "Show and Tell" in the paper: http://arxiv.org/abs/1411.4555. The Show and Tell model is a deep neural network that learns how to describe the content of images.
- Borrowed code and ideas from jazzsaxmafia's show_and_tell.tensorflow: https://github.com/jazzsaxmafia/show_and_tell.tensorflow. There are some modifications in model.py, see Code for details.
- You need flickr30k data (images and annotations). You can put those in
ImageCaption/data
andImageCaption/images
folder respectively.
First ensure that you have installed the following required packages:
- TensorFlow0.10.0rc0 (instructions)
- Caffe (instructions)
- Keras1.2.1 (instructions)
- Natural Language Toolkit (NLTK):
- First install NLTK (instructions)
- Then install the NLTK data (instructions)
See requirements.txt for details.
- make_flickr_dataset.py : Extracting feats of flickr30k images, and save them in './data/feats.npy'.
- First, you shoule download the caffemodel and deploy.prototxt of VGG19. You can download those from here.
- model.py : TensorFlow Version. There are some modifications in model.py:
- Add some command arguments, run more convenient.
- The test_single() in model.py is for a single image. If use_flickr=False, it just generate the caption of a image; If use_flickr=True, it will randomly pick a image and respective five reference captions from flickr30k dataset, generate the caption and calculate the BLEU Score.
- The test_multiple() in model.py is for multiple images. If use_flickr=False, it just generate the captions of some images; If use_flickr=True, it will randomly pick some images and respective five reference captions from flickr30k dataset, generate the captions and calculate the BLEU Scores.
- Training a Model Run the training script.
python model.py --phase train
The checkpoint data will be stored in the model/tensorflow folder periodically.
- Generating Captions and/or not Calculate BLEU Scores Your trained Show and Tell model can generate captions for any JPEG/PNG image! The following command line will generate captions for an image or some images.
python model.py --phase test_single --use_flickr False
python model.py --phase test_single --use_flickr True
# The script will generate the caption and/or not calculate the BLEU Score.
python model.py --phase test_multiple --use_flickr False
python model.py --phase test_multiple --use_flickr True
# The script will generate the captions and/or not calculate the BLEU Scores.
- You might want to download flickr30k dataset(images and annotations) from here.
- Extraced FC7 data: download. This is used in train() function in model.py. You can skip feature extraction part by using this.
- Pretrained model: download. This is used in test_single() and test_multiple() in model.py. If you just want to check out captioning, download and test the model.
- Tensorflow VGG net: download. This file is used in test_single() and test_multiple() in model.py.
- BSD license