Image-captioning-with-GPT2

This repo implements an image captioning model: GPT2 and CLIP are finetuned on the Flickr8k dataset. The pretrained CLIP model is used as the image encoder and GPT2 as the text decoder. The self-attention layers of GPT2 are replaced with randomly initialized cross-attention layers so that the image vector coming from CLIP can be fed into the decoder; a sketch of this wiring is shown below. The resulting captions are not great, mainly because Flickr8k contains relatively little data: the model often makes mistakes, but it captures the overall scene. You can check the inference notebook to see the results. The goal of this repo is to demonstrate how to finetune GPT2 and CLIP to create an image captioning model. The notebooks are intended to run on Colab.
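A minimal sketch of this encoder-decoder wiring using HuggingFace Transformers, where `add_cross_attention=True` inserts randomly initialized cross-attention layers into each GPT2 block. The checkpoints, dummy inputs, and single training step here are illustrative; the actual notebooks handle Flickr8k loading and the full training loop.

```python
import torch
from transformers import (
    CLIPVisionModel,
    GPT2Config,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
)

# Pretrained CLIP vision tower as the image encoder (ViT-B/32 has
# hidden size 768, matching GPT2's embedding size).
image_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

# GPT2 as the text decoder; add_cross_attention=True adds
# randomly initialized cross-attention layers to every block.
config = GPT2Config.from_pretrained("gpt2", add_cross_attention=True)
decoder = GPT2LMHeadModel.from_pretrained("gpt2", config=config)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token; reuse EOS

# One illustrative training step: a stand-in for a preprocessed image
# and a caption from the dataset.
pixel_values = torch.randn(1, 3, 224, 224)
captions = tokenizer(["a dog runs through the grass"], return_tensors="pt")

# Encode the image into a sequence of patch embeddings, then let GPT2
# cross-attend to them while predicting the caption tokens.
image_features = image_encoder(pixel_values=pixel_values).last_hidden_state
outputs = decoder(
    input_ids=captions.input_ids,
    attention_mask=captions.attention_mask,
    encoder_hidden_states=image_features,
    labels=captions.input_ids,  # loss is computed with shifted labels
)
outputs.loss.backward()
```

During finetuning, both the CLIP encoder and GPT2's pretrained weights are updated along with the fresh cross-attention layers; at inference, captions are decoded token by token conditioned on the same image features.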
