Image Captioning with Transformer-Based Architecture

Overview

This project is part of a university course on Natural Language Processing (NLP). The objective is to develop a model that generates captions for images using a transformer-based architecture. We use a Data-efficient Image Transformer (DeiT) as the encoder and a standard transformer decoder. The model was trained on the Flickr30k dataset, with text preprocessing, image resizing, and augmentation applied. For more details, see the Report.

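As a rough illustration of this encoder-decoder setup, the PyTorch sketch below pairs a DeiT backbone from timm with torch.nn's transformer decoder. The module layout, model variant, and dimensions are assumptions for illustration and not necessarily the exact architecture implemented in this repository.

```python
# Illustrative sketch only -- module names, sizes, and the DeiT variant are
# assumptions, not the exact architecture used in this repository.
import torch
import torch.nn as nn
import timm

class CaptioningModel(nn.Module):
    def __init__(self, vocab_size, d_model=768, num_layers=6, nhead=8):
        super().__init__()
        # DeiT backbone as the image encoder; forward_features returns patch tokens.
        self.encoder = timm.create_model("deit_base_patch16_224", pretrained=True, num_classes=0)
        self.embedding = nn.Embedding(vocab_size, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, images, captions):
        # Patch-token features from DeiT: (batch, num_tokens, d_model).
        memory = self.encoder.forward_features(images)
        tgt = self.embedding(captions)
        # Causal mask so each position only attends to previous tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(captions.size(1)).to(captions.device)
        out = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.fc(out)
```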

Members

Setup

To set up the project, please follow these steps:

git clone git@github.com:Devnetly/image-captioning.git
cd image-captioning
conda create -n automatic-image-captioning
conda activate automatic-image-captioning
pip install -r requirements.txt

Once the environment is ready, run initialize.py to split the dataset and build the vocabulary. Initially, the /data folder structure should look like this:

data
└── flickr30k
    ├── captions.csv
    ├── images
    │   └── 0.jpg
    │   └── 1.jpg
    │   ⋮
    │   └── n.jpg

Then run the following command:

python initialize.py --dataset {flickr30k} [--min-freq MIN_FREQ]

The folder structure should then look similar to the one below:

data
└── flickr30k
    ├── captions.csv
    ├── images
    │   └── 0.jpg
    │   └── 1.jpg
    │   ⋮
    │   └── n.jpg
    ├── test_captions.csv
    ├── train_captions.csv
    └── vocab.pkl
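
The --min-freq option controls which words enter the vocabulary: tokens appearing fewer than MIN_FREQ times are excluded and fall back to an unknown token. The sketch below shows roughly what such a vocabulary build could look like; the CSV column name, special tokens, and vocab.pkl layout are assumptions, not the exact format produced by initialize.py.

```python
# Illustrative sketch of vocabulary construction with a frequency threshold;
# the actual format of vocab.pkl produced by initialize.py may differ.
import pickle
from collections import Counter

import pandas as pd

def build_vocab(captions_csv, min_freq=2):
    df = pd.read_csv(captions_csv)
    counter = Counter()
    for caption in df["caption"]:  # assumed column name
        counter.update(caption.lower().split())
    # Special tokens first, then every word that clears the frequency threshold.
    itos = ["<pad>", "<sos>", "<eos>", "<unk>"]
    itos += [word for word, count in counter.items() if count >= min_freq]
    stoi = {word: i for i, word in enumerate(itos)}
    return stoi, itos

if __name__ == "__main__":
    stoi, itos = build_vocab("data/flickr30k/captions.csv", min_freq=2)
    with open("data/flickr30k/vocab.pkl", "wb") as f:
        pickle.dump({"stoi": stoi, "itos": itos}, f)
```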

Training

To train the model, follow the steps below:

cd src/training
python train.py [-h] --dataset {flickr30k} [--batch-size BATCH_SIZE] [--learning-rate LEARNING_RATE] [--weight-decay WEIGHT_DECAY] [--epochs EPOCHS] [--num-workers NUM_WORKERS] [--prefetch-factor PREFETCH_FACTOR] --weights-folder WEIGHTS_FOLDER --histories-folder HISTORIES_FOLDER
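
Conceptually, each training step is teacher-forced next-token prediction with a cross-entropy loss over the caption tokens. The sketch below illustrates one such step; the padding index and function signature are assumptions rather than the exact loop in src/training/train.py.

```python
# Illustrative teacher-forcing training step; variable names and the padding
# index are assumptions, not the exact loop implemented in train.py.
import torch
import torch.nn as nn

def train_step(model, optimizer, images, captions, pad_idx=0):
    model.train()
    optimizer.zero_grad()
    # Feed tokens 0..T-2 and predict tokens 1..T-1 (shifted by one position).
    logits = model(images, captions[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        captions[:, 1:].reshape(-1),
        ignore_index=pad_idx,
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```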

Inference

To generate captions for a set of images in a folder, follow these steps:

cd src/inference
python inference.py [-h] [--dataset {flickr30k}] [--model {transformer}] --checkpoint CHECKPOINT [--source SOURCE] --destination DESTINATION
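
At inference time, a caption is typically generated token by token, feeding the tokens produced so far back into the decoder. The greedy-decoding sketch below illustrates the idea; the special-token ids and maximum length are assumptions, not the exact procedure in src/inference/inference.py.

```python
# Illustrative greedy decoding loop; token ids and max length are assumptions,
# not the exact procedure implemented in inference.py.
import torch

@torch.no_grad()
def generate_caption(model, image, sos_idx=1, eos_idx=2, max_len=30):
    model.eval()
    tokens = torch.tensor([[sos_idx]], device=image.device)
    for _ in range(max_len):
        logits = model(image.unsqueeze(0), tokens)
        # Take the most likely next token from the last decoder position.
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
        if next_token.item() == eos_idx:
            break
    return tokens.squeeze(0).tolist()
```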

Run the associated app

To run the app associated with the project:

cd app
streamlit run main.py
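
main.py is a small Streamlit front-end around the inference code. A minimal sketch of such an app is shown below; load_model and generate_caption are hypothetical helpers standing in for the repository's actual functions.

```python
# Rough sketch of a Streamlit front-end; load_model and generate_caption are
# hypothetical helpers, not the repository's actual inference API.
import streamlit as st
from PIL import Image

st.title("Image Captioning")

uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image)
    # model = load_model("path/to/checkpoint")   # hypothetical helper
    # caption = generate_caption(model, image)   # hypothetical helper
    # st.write(caption)
```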