This is an exercise in image captioning, done as part of the Udacity Computer Vision Nanodegree program.
Image captioning attaches a short descriptive sentence to an image. This project tries to generate that sentence automatically from the input image.
It uses the COCO dataset (http://cocodataset.org).
The network is shown below. It has an encoder-decoder structure. The encoder is a pre-trained CNN (ResNet) that produces an embedded vector describing the features of the image. The decoder is an RNN (LSTM) that transforms these features into word vectors.
This simple network works surprisingly well.