This project is a Flask web application that lets users upload images and generate captions for them using a custom AI model. The model uses EfficientNet as the Convolutional Neural Network (CNN) component, a custom Long Short-Term Memory (LSTM) network, and a multihead attention layer, and achieves an accuracy of 42%.
Image Captioning Project
│
├── app.py
├── prediction.py
├── templates
│ └── index.html
├── static
│ ├── css
│ ├── images
│ └── media
├── Models
│ ├── model.h5
│ └── tokenizer.json
├── test images
└── model_training.pynb
- app.py: Main Flask application file.
- prediction.py: Contains the logic for image captioning using the AI model.
- templates/index.html: HTML template for the main page.
- static/css: Directory for CSS files.
- static/images: Directory for image files.
- static/media: Directory for media files.
- Models: Directory containing the pre-trained model and tokenizer.
- test images: Directory containing test images.
- model_training.pynb: Jupyter notebook containing the code for training the AI model.
- Python 3.8 or higher
- Pip (Python package installer)
- Jupyter Notebook (for running model_training.pynb)
- Clone the repository:
  git clone https://github.com/harshit433/Image-Captioning-Cantilever-.git
  cd Image-Captioning-Cantilever-
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required packages:
  pip install -r requirements.txt
  If requirements.txt is not already present, you can generate it with the following command after installing the necessary packages:
  pip freeze > requirements.txt
- Start the Flask application:
  python app.py
- Open your browser and go to:
  http://127.0.0.1:5000/
To train the model, open the model_training.pynb file in Jupyter Notebook and run the cells. This notebook contains the code for training the AI model using EfficientNet for the CNN component, a custom LSTM network, and a multihead attention layer.
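As a rough illustration of the training setup, the sketch below shows how image-caption pairs are commonly expanded into (image features, partial caption, next word) examples for this kind of model. The function and variable names and max_length=35 are illustrative assumptions, not code taken from model_training.pynb.

```python
# A minimal sketch of preparing supervised pairs for caption training.
# build_training_pairs, features_by_image, captions_by_image, and max_length=35 are
# illustrative assumptions, not names from model_training.pynb.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def build_training_pairs(features_by_image, captions_by_image, tokenizer, max_length=35):
    """Expand each caption into (image features, padded partial caption, next-word id) examples."""
    X_img, X_seq, y_word = [], [], []
    for image_id, captions in captions_by_image.items():
        for caption in captions:
            tokens = tokenizer.texts_to_sequences([caption])[0]
            for i in range(1, len(tokens)):
                X_img.append(features_by_image[image_id])           # CNN feature vector
                X_seq.append(pad_sequences([tokens[:i]], maxlen=max_length)[0])  # words so far
                y_word.append(tokens[i])                            # next word to predict
    return np.array(X_img), np.array(X_seq), np.array(y_word)
```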
- Upload an Image: Click on the "Choose File" button to select an image from your computer.
- Generate Caption: After selecting the image, click on the "Upload" button to generate a caption for the image.
- View Result: The generated caption and the uploaded image will be displayed on the same page.
app.py is the main Flask application file. It runs the web server and defines the routes that handle image uploads and return generated captions.
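A minimal sketch of what such a route can look like is shown below. The route, the form field name ("image"), and the predict_caption helper imported from prediction.py are assumptions based on this README, not the repository's exact code.

```python
# Hedged sketch of an upload-and-caption route; not the repository's exact app.py.
import os
from flask import Flask, request, render_template
from prediction import predict_caption  # assumed helper that maps an image path to a caption

app = Flask(__name__)
UPLOAD_DIR = os.path.join("static", "media")  # user uploads are kept in static/media

@app.route("/", methods=["GET", "POST"])
def index():
    caption, image_path = None, None
    if request.method == "POST" and "image" in request.files:
        file = request.files["image"]
        image_path = os.path.join(UPLOAD_DIR, file.filename)
        file.save(image_path)                   # store the uploaded image
        caption = predict_caption(image_path)   # run the captioning model
    return render_template("index.html", caption=caption, image_path=image_path)

if __name__ == "__main__":
    app.run(debug=True)
```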
prediction.py contains the core logic for generating captions with the custom AI model, which uses EfficientNet for the CNN component, a custom LSTM network, and a multihead attention layer. The model and tokenizer are loaded from the Models directory.
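The sketch below shows one way such logic can be structured: extract EfficientNet features for the uploaded image, then decode a caption greedily with the saved model and tokenizer. The model's exact input signature, the "startseq"/"endseq" tokens, the 224x224 input size, and max_length=35 are assumptions, and the imports assume a TF 2.x-style Keras Tokenizer.

```python
# Hedged sketch of caption generation; the model's inputs and special tokens are assumed.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.efficientnet import EfficientNetB0, preprocess_input
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json

cnn = EfficientNetB0(include_top=False, pooling="avg", weights="imagenet")  # feature extractor
model = load_model("Models/model.h5")                                       # trained captioner
with open("Models/tokenizer.json") as f:
    tokenizer = tokenizer_from_json(f.read())

def predict_caption(image_path, max_length=35):
    """Greedily decode a caption for one image, one word per model call."""
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    arr = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))
    features = cnn.predict(arr, verbose=0)   # shape (1, 1280) for EfficientNetB0
    caption = "startseq"
    for _ in range(max_length):
        seq = pad_sequences(tokenizer.texts_to_sequences([caption]), maxlen=max_length)
        probs = model.predict([features, seq], verbose=0)[0]
        word = tokenizer.index_word.get(int(np.argmax(probs)))
        if word is None or word == "endseq":
            break
        caption += " " + word
    return caption.replace("startseq", "").strip()
```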
model_training.pynb is the Jupyter notebook that contains the code for training the AI model. It includes data preprocessing, the model architecture, the training loop, and evaluation metrics.
templates/index.html is the front-end for the application, where users upload images and view the generated captions.
- css: This directory is intended for any CSS files needed for styling the web pages.
- images: This directory can be used to store images used in the project.
- media: This directory stores user-uploaded images.
The Models directory contains the pre-trained model (model.h5) and the tokenizer (tokenizer.json) used for generating captions.
The test images directory can be used to store images for testing the application.
- CNN Component: Utilizes EfficientNet for feature extraction from images.
- LSTM Network: A custom LSTM network is used for sequence generation.
- Multihead Attention Layer: Enhances the model's ability to focus on different parts of the image when generating captions (a wiring sketch follows this list).
- Accuracy: The model has an accuracy of 42%.
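The following is a hedged sketch of how these three pieces could be wired together in Keras. The layer sizes, vocabulary size, and sequence length are placeholders rather than the values used to train model.h5, and a standard layers.LSTM stands in for the custom LSTM.

```python
# Illustrative wiring of the captioning model; dimensions below are placeholders.
from tensorflow.keras import Model, layers

vocab_size, max_length, feat_dim = 8000, 35, 1280  # assumed sizes, not the repo's values

# Image branch: pooled EfficientNet features projected into the decoder dimension.
img_in = layers.Input(shape=(feat_dim,), name="image_features")
img_proj = layers.Reshape((1, 256))(layers.Dense(256, activation="relu")(img_in))

# Text branch: the partial caption is embedded and run through an LSTM.
txt_in = layers.Input(shape=(max_length,), name="caption_tokens")
lstm_out = layers.LSTM(256, return_sequences=True)(layers.Embedding(vocab_size, 256)(txt_in))

# Multihead attention lets the caption states attend over the image representation.
attn = layers.MultiHeadAttention(num_heads=4, key_dim=64)(query=lstm_out, value=img_proj)
merged = layers.GlobalAveragePooling1D()(layers.Add()([lstm_out, attn]))
out = layers.Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

With this layout the model takes a pooled image feature vector and a padded token sequence and predicts the next word id, which matches the training-pair format sketched in the training section above.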
Feel free to fork this repository and make your changes. Pull requests are welcome.