Image-Captioning

Image Captioning using Recurrent Neural Networks

  • In this project we use deep neural network models to caption Flickr images.
  • The dataset contains 8,091 images; each image has an ID and five captions.
  • We use a pretrained BERT model to obtain the embeddings and an LSTM layer to generate the captions.
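
Since every image ID maps to five captions, a natural first step is to group the captions by image. The snippet below is a minimal sketch of that step, not the project's actual loader: the `image_id<TAB>caption` line layout, the function name `group_captions`, and the sample lines are all assumptions made for illustration.

```python
from collections import defaultdict


def group_captions(lines):
    """Map each image ID to the list of its captions.

    Assumes each line has the form image_id<TAB>caption. This layout is
    hypothetical; adjust the split to match the real captions file.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, caption = line.split("\t", 1)
        # Lowercase so that "Dog" and "dog" share one vocabulary entry.
        captions[image_id].append(caption.strip().lower())
    return dict(captions)


# Made-up sample lines, only to show the resulting structure.
sample = [
    "img_001\tA dog runs across the grass .",
    "img_001\tA brown dog is running outside .",
    "img_002\tTwo children play on a beach .",
]
grouped = group_captions(sample)
```

On the full dataset, the resulting dictionary would be expected to hold 8,091 keys with five captions each.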

Model Architecture

|--------------------------------|   | -------------------------------|
|     pictures_input(2048,)      |   |   captions_input(max_length,)  |
|--------------------------------|   | -------------------------------|
                 ↓                                   ↓               
|--------------------------------|   | -------------------------------|
|          Dropout(0.5)          |   |  Embedding(vocab_size, 128)    |
|--------------------------------|   | -------------------------------|
                 ↓                                   ↓
|--------------------------------|   | -------------------------------|
|         Dense(256, relu)       |   |           LSTM(128)            |
|--------------------------------|   | -------------------------------|
                 ↓                                    ↓
|--------------------------------|                    ↓
|          Dropout(0.5)          |                    ↓    
|--------------------------------|                    ↓
                 ↓                                    ↓   
|--------------------------------|                    ↓
|         Dense(256, relu)       |                    ↓
|--------------------------------|                    ↓
                 ↓                                    ↓
| --------------------------------------------------------------------|
|                             Concatenate                             |
| --------------------------------------------------------------------|
                                  ↓
| --------------------------------------------------------------------|
|                          Dense(128, relu)                           |
| --------------------------------------------------------------------|
                                  ↓
| --------------------------------------------------------------------|
|                          Dense(vocab_size, softmax)                 |
| --------------------------------------------------------------------|
                                  ↓
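
To make the diagram above easier to check, the following sketch traces the output width and trainable parameter count of each layer using the usual formulas for fully connected and LSTM layers with biases. It is an illustration under stated assumptions, not the project's code: `vocab_size=8000` is a placeholder value, the helper names are hypothetical, and the LSTM is assumed to return only its final state so that it can be concatenated with the 256-wide image branch. Dropout layers are omitted because they have no parameters.

```python
def dense_params(n_in, n_out):
    # One weight per input/output pair plus one bias per output unit.
    return n_in * n_out + n_out


def lstm_params(n_in, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((n_in + units) * units + units)


def summarize(vocab_size):
    """Return (layer, output_width, trainable_params) rows for the diagram."""
    return [
        # Image branch: 2048-dimensional feature vector as input.
        ("Dense(256) #1", 256, dense_params(2048, 256)),
        ("Dense(256) #2", 256, dense_params(256, 256)),
        # Caption branch: token IDs of length max_length as input.
        ("Embedding", 128, vocab_size * 128),
        ("LSTM(128)", 128, lstm_params(128, 128)),
        # Merged head: the two branches are joined side by side.
        ("Concatenate", 256 + 128, 0),
        ("Dense(128)", 128, dense_params(256 + 128, 128)),
        ("Dense(vocab_size)", vocab_size, dense_params(128, vocab_size)),
    ]


# vocab_size=8000 is a placeholder; use the real vocabulary size.
rows = summarize(vocab_size=8000)
for name, width, params in rows:
    print(f"{name:<20} width={width:<6} params={params:,}")
total_params = sum(params for _, _, params in rows)
```

Under these assumptions the concatenated vector is 384 wide, and the embedding and the final softmax layer account for most of the parameters because both scale with the vocabulary size.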