Neural Caption Generator

TensorFlow implementation of the "Show and Tell" model from the paper: http://arxiv.org/abs/1411.4555. The Show and Tell model is a deep neural network that learns to describe the content of images.
Code and ideas are borrowed from jazzsaxmafia's show_and_tell.tensorflow: https://github.com/jazzsaxmafia/show_and_tell.tensorflow. model.py contains some modifications; see Code for details.
You need the flickr30k data (images and annotations). Put them in the ImageCaption/data and ImageCaption/images folders respectively.

Install Required Packages

First ensure that you have installed the following required packages:

TensorFlow 0.10.0rc0 (instructions)
Caffe (instructions)
Keras 1.2.1 (instructions)
Natural Language Toolkit (NLTK):
- First install NLTK (instructions)
- Then install the NLTK data (instructions)

See requirements.txt for details.

Code

make_flickr_dataset.py : Extracts features from the flickr30k images and saves them in './data/feats.npy'.
- First, you should download the caffemodel and deploy.prototxt of VGG19. You can download those from here.
model.py : TensorFlow version. There are some modifications in model.py:
- Added command-line arguments so the script is more convenient to run.
- test_single() in model.py handles a single image. If use_flickr=False, it just generates the caption of an image; if use_flickr=True, it randomly picks an image and its five reference captions from the flickr30k dataset, generates the caption, and calculates the BLEU score.
- test_multiple() in model.py handles multiple images. If use_flickr=False, it just generates the captions of some images; if use_flickr=True, it randomly picks some images and their five reference captions each from the flickr30k dataset, generates the captions, and calculates the BLEU scores.
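To make the BLEU step concrete, here is a minimal pure-Python sketch of an unsmoothed sentence-level BLEU score computed against several reference captions. It is not the code from model.py, which may well rely on NLTK instead; the function names and the example captions are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: a zero precision at any order gives 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the hypothesis length.
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example: five reference captions and one generated caption.
references = [
    "a dog runs across the grass".split(),
    "a brown dog is running on a lawn".split(),
    "the dog sprints through a field".split(),
    "a dog is playing outside".split(),
    "a small dog runs outdoors".split(),
]
hypothesis = "a dog runs across the field".split()
print(sentence_bleu(references, hypothesis))
```

Because the sketch applies no smoothing, a caption that shares no 4-gram with any reference scores 0; library implementations usually offer smoothing options for short sentences.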

Getting Started

Training a Model

Run the training script.

python model.py --phase train

The checkpoint data will be stored in the models/tensorflow folder periodically.

Generating Captions and Optionally Calculating BLEU Scores

Your trained Show and Tell model can generate captions for any JPEG/PNG image. The following commands generate captions for one image or for several images.

python model.py --phase test_single --use_flickr False
python model.py --phase test_single --use_flickr True
# The script generates the caption and, with --use_flickr True, also calculates the BLEU score.
python model.py --phase test_multiple --use_flickr False
python model.py --phase test_multiple --use_flickr True
# The script generates the captions and, with --use_flickr True, also calculates the BLEU scores.
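
The commands above pass --use_flickr as the literal strings True and False. The following is a minimal sketch of how such flags could be parsed with argparse; it is an assumption for illustration, since model.py may use a different mechanism (for example TensorFlow flags). The helper str2bool, the function build_parser, and the default values are hypothetical.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False' style strings to a bool.

    argparse's type=bool would treat any non-empty string, including
    'False', as True, so the conversion is done explicitly.
    """
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser for the two flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Show and Tell caption generator")
    # The three phase names come from the README; the default is an assumption.
    parser.add_argument("--phase",
                        choices=["train", "test_single", "test_multiple"],
                        default="train")
    parser.add_argument("--use_flickr", type=str2bool, default=False)
    return parser


args = build_parser().parse_args(["--phase", "test_single", "--use_flickr", "False"])
print(args.phase, args.use_flickr)  # prints: test_single False
```

Parsing the value explicitly avoids the common pitfall where --use_flickr False is silently interpreted as True.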

Downloading data/trained model

You might want to download the flickr30k dataset (images and annotations) from here.
Extracted FC7 data: download. This is used in the train() function in model.py. You can skip the feature extraction step by using this file.
Pretrained model: download. This is used in test_single() and test_multiple() in model.py. If you just want to try out captioning, download the model and test it.
TensorFlow VGG net: download. This file is used in test_single() and test_multiple() in model.py.
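
Before training on the downloaded FC7 data, it can help to sanity-check the feature file. The sketch below assumes the file is a NumPy array with one 4096-dimensional fc7 vector per image, which is the fc7 width of VGG19; the actual layout of the downloaded file is not confirmed here. The function load_features is hypothetical, and a small random array stands in for the real file.

```python
import os
import tempfile

import numpy as np

FC7_DIM = 4096  # width of the VGG19 fc7 layer


def load_features(path, expected_dim=FC7_DIM):
    """Load a feature matrix and check it has one fc7 vector per row."""
    feats = np.load(path)
    if feats.ndim != 2 or feats.shape[1] != expected_dim:
        raise ValueError("expected shape (num_images, %d), got %r"
                         % (expected_dim, feats.shape))
    return feats


# Stand-in for the downloaded file: features for three fake "images".
tmp_path = os.path.join(tempfile.mkdtemp(), "feats.npy")
np.save(tmp_path, np.random.rand(3, FC7_DIM).astype(np.float32))

feats = load_features(tmp_path)
print(feats.shape, feats.dtype)
```

With the real data, the path would point at the downloaded file (for example './data/feats.npy', the location make_flickr_dataset.py writes to).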

License

BSD license
