Image caption generation is a classic AI problem that draws on both natural language processing (NLP) and computer vision (CV), which makes it an interesting project. The objective of the system is to generate a caption (a one-line description) for an image that is as accurate as possible. The task is challenging because it requires methods from both fields: computer vision to understand the content and features of the image, and natural language processing to turn that understanding into words in the right order. Recently, deep learning methods have achieved state-of-the-art results on this problem.
The technical report, with all the references, is saved here.
Minimum Requirements
- Python with Keras and supporting libraries, including TensorFlow and NumPy
- 4 GB RAM
- Any operating system
- A notebook (.ipynb) editor such as Jupyter or IPython
- Intel i3 7th Gen or above
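A quick way to confirm that the Python libraries listed above are installed is sketched below. The package list and the helper name `check_requirements` are assumptions made for illustration; adjust them to match the imports the notebook actually uses.

```python
from importlib.util import find_spec

# Assumed package list, based on the libraries named in the requirements.
REQUIRED_PACKAGES = ["tensorflow", "keras", "numpy"]


def check_requirements(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Running this before opening the notebook reports which packages still need to be installed, without importing any of them.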
We use the Flickr8K dataset. As the name suggests, it contains around 8,000 images with around 5 captions per image. It was chosen because it is realistic and relatively small, so you can download it and build models on your workstation using a CPU.
The dataset can be downloaded through the Dataset Request Form. Download the datasets and unzip them into your current working directory. You will have two directories:
- Flickr8k_Dataset: Contains 8092 photographs in JPEG format.
- Flickr8k_text: Contains a number of files containing different sources of descriptions for the photographs.
The dataset has a pre-defined training dataset (6,000 images), development dataset (1,000 images), and test dataset (1,000 images).
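The sketch below shows one way to group captions by image and restrict them to a split. It assumes a line layout that is not stated in this README: each caption line is taken to be `<image file>#<index>`, a tab, then the caption, and each split file is taken to list one image file name per line. The function names are hypothetical, so check them and the layout against the files in Flickr8k_text before relying on this.

```python
from collections import defaultdict


def parse_captions(text):
    """Map each image id (file name without extension) to its list of captions."""
    captions = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<image>.jpg#<n>\t<caption>"
        image_ref, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines that carry no caption text
        image_id = image_ref.split("#")[0].rsplit(".", 1)[0]
        captions[image_id].append(caption.strip())
    return dict(captions)


def parse_split(text):
    """Return the set of image ids in a split file (assumed: one file name per line)."""
    return {line.strip().rsplit(".", 1)[0] for line in text.splitlines() if line.strip()}


def captions_for_split(captions, split_ids):
    """Keep only the images that belong to the given split."""
    return {image_id: caps for image_id, caps in captions.items() if image_id in split_ids}


if __name__ == "__main__":
    sample = "img1.jpg#0\tA dog runs .\nimg1.jpg#1\tA brown dog .\nimg2.jpg#0\tA child plays .\n"
    all_captions = parse_captions(sample)
    train_ids = parse_split("img1.jpg\n")
    print(captions_for_split(all_captions, train_ids))
```

The functions take text rather than file paths so they can be tried on a few pasted lines first; in the notebook you would pass in the contents of the caption and split files.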
Model Main Summary
Check the report for a detailed description.
You can run the project locally on your system with the following steps:
- Download the dataset that is linked in this README
- Clone this repository to your local system
- Before running anything, make sure that all the path variables are set correctly
- After setting everything up, run the notebook cells
- The program will run and generate the required outputs
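Before running the cells, the path check in the steps above can be done with a small script like the following. It is a minimal sketch that assumes the two dataset directories were unzipped into a single base directory (the current working directory by default); the helper name `missing_dataset_dirs` is hypothetical.

```python
from pathlib import Path

# Directory names as given in the dataset description above.
EXPECTED_DIRS = ("Flickr8k_Dataset", "Flickr8k_text")


def missing_dataset_dirs(base_dir="."):
    """Return the expected dataset directories that are not present under base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All dataset directories found.")
```

If anything is reported as missing, fix the unzip location or the path variables in the notebook before running the remaining cells.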
Check out my other repos as well. Enjoy and be safe.