A neural network for text-to-image generation is composed of two sub-networks: a text encoder and a generator network. Training a text-to-image generator therefore proceeds in two steps (sketched below):
- The image encoder and text encoder are jointly pretrained on image-caption pairs, projecting images and text into a common embedding space.
- After the text encoder is pretrained, the generator network is adversarially trained to generate realistic images conditioned on the text features.
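For intuition, here is a minimal PyTorch sketch of the two-stage setup. The module names, dimensions, and the simple GRU/linear layers are illustrative placeholders, not the actual architectures used in this repo:

```python
import torch
import torch.nn as nn

# Stage 1 (sketch): a text encoder mapping captions into the joint
# image-text embedding space learned during pretraining.
class TextEncoder(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        _, h = self.rnn(self.embed(tokens))
        return h[-1]                           # (batch, embed_dim) sentence feature

# Stage 2 (sketch): a conditional generator turning noise plus the
# sentence feature into an image; trained adversarially against a
# discriminator (see the training-step sketch further below).
class Generator(nn.Module):
    def __init__(self, z_dim=100, embed_dim=256, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(z_dim + embed_dim, 3 * img_size * img_size)

    def forward(self, z, sent_emb):
        x = torch.tanh(self.fc(torch.cat([z, sent_emb], dim=1)))
        return x.view(-1, 3, self.img_size, self.img_size)
```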
Recent research proposed pretraining the text encoder with a DAMSM loss plus a contrastive loss and then training DM-GAN on top of it, reaching state-of-the-art results.
In this work, we replace the RNN-based text encoder and CNN-based image encoder with CLIP, a pretrained multimodal vision-language model based on the Transformer architecture.
CLIP is a multimodal encoder for images and natural language, pretrained with a contrastive loss at a very large batch size (32,768).
See the [paper](https://arxiv.org/abs/2103.00020) and the [official PyTorch implementation](https://github.com/openai/CLIP) of CLIP.
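Extracting text features with the official `clip` package looks roughly like this (a minimal sketch; the ViT-B/32 variant and the example caption are assumptions, not necessarily what this repo uses):

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # variant is an assumption

captions = ["this bird has a red crown and a white belly"]
tokens = clip.tokenize(captions).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)             # (1, 512) for ViT-B/32
# CLIP features are typically L2-normalized before computing similarities.
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```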
Download the preprocessed datasets from [AttnGAN](https://github.com/taoxugit/AttnGAN); alternatively, they are also available from [DM-GAN](https://github.com/MinfengZhu/DM-GAN).
- Fine-tune the pretrained CLIP encoder (the contrastive term is sketched after these commands)
  - With CUB-200-2011, using DAMSM + contrastive loss:
    ```
    $ python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
    ```
  - With COCO 2014, using DAMSM + contrastive loss:
    ```
    $ python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0
    ```
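The contrastive term added on top of DAMSM is, in spirit, the symmetric image-text InfoNCE loss that CLIP itself is trained with. A minimal sketch (the function name and temperature value are assumptions):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss over a batch (sketch).

    img_emb, txt_emb: (batch, dim) L2-normalized embeddings; the i-th
    image and i-th caption are a matching pair, so the targets are the
    diagonal of the similarity matrix.
    """
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```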
- Train DM-GAN (a simplified training step is sketched after these commands)
  - With CUB-200-2011:
    ```
    $ python main.py --cfg cfg/clip_bird_DMGAN.yml --gpu 0
    ```
  - With COCO 2014:
    ```
    $ python main.py --cfg cfg/clip_coco_DMGAN.yml --gpu 0
    ```
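As a reminder of what the adversarial stage optimizes, here is a heavily simplified sketch of one conditional-GAN training step. DM-GAN's actual objective adds multi-stage generators, a memory module, DAMSM, and conditioning-augmentation terms; the names, the discriminator signature `D(img, sent_emb)`, and the plain BCE losses below are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real_imgs, sent_emb, opt_g, opt_d, z_dim=100):
    batch = real_imgs.size(0)
    device = real_imgs.device
    ones = torch.ones(batch, 1, device=device)
    zeros = torch.zeros(batch, 1, device=device)
    z = torch.randn(batch, z_dim, device=device)

    # Discriminator: real (image, text) pairs -> 1, generated pairs -> 0.
    fake_imgs = G(z, sent_emb).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real_imgs, sent_emb), ones) +
              F.binary_cross_entropy_with_logits(D(fake_imgs, sent_emb), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make generated pairs look real to the discriminator.
    fake_imgs = G(z, sent_emb)
    g_loss = F.binary_cross_entropy_with_logits(D(fake_imgs, sent_emb), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```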
- Generate fake images and compute R-precision (the metric is sketched after these commands)
  - CUB-200-2011:
    ```
    $ python main.py --cfg cfg/eval_clip_bird.yml
    ```
  - COCO 2014:
    ```
    $ python main.py --cfg cfg/eval_clip_coco.yml
    ```
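R-precision asks: given a generated image, does its embedding rank the ground-truth caption above a set of random distractor captions? A minimal sketch of one query, assuming the common protocol of 99 distractors and L2-normalized embeddings (candidate count and function name are assumptions):

```python
import torch

def r_precision_hit(img_emb, true_txt_emb, distractor_txt_emb):
    """One R-precision query (R = 1), sketch.

    img_emb: (1, dim); true_txt_emb: (1, dim); distractor_txt_emb: (99, dim).
    Returns 1 if the ground-truth caption is the most similar of the
    100 candidates, else 0; average the hits over many queries.
    """
    candidates = torch.cat([true_txt_emb, distractor_txt_emb], dim=0)  # (100, dim)
    sims = (img_emb @ candidates.t()).squeeze(0)                       # (100,)
    return int(sims.argmax().item() == 0)
```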
- Compute FID (Fréchet Inception Distance); the core formula is sketched after these commands
  - CUB-200-2011:
    ```
    $ python fid_score.py --data bird --dims 2048 --batch_size 32
    ```
  - COCO 2014:
    ```
    $ python fid_score.py --data coco --dims 2048 --batch_size 32
    ```
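FID fits a Gaussian to the 2048-dimensional Inception-v3 pool features of the real and generated image sets and measures the Fréchet distance between the two fits; `--dims 2048` selects that feature layer. The core computation, sketched with NumPy/SciPy (function name is illustrative):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).

    (mu, sigma) are the mean and covariance of Inception pool features
    for the real and generated image sets.
    """
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):           # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```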
- Compute Inception Score (see the sketch after these commands)
  - CUB-200-2011:
    ```
    $ python inception_score.py --data bird
    ```
  - COCO 2014:
    ```
    $ python inception_score.py --data coco
    ```
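The Inception Score is exp(E_x[KL(p(y|x) || p(y))]), computed from Inception-v3 class posteriors on the generated images and usually averaged over splits. A minimal NumPy sketch (the 10-split convention is the common default, assumed here):

```python
import numpy as np

def inception_score(probs, splits=10):
    """probs: (N, num_classes) softmax outputs of Inception-v3 on fakes."""
    scores = []
    for chunk in np.array_split(probs, splits):
        p_y = chunk.mean(axis=0, keepdims=True)  # marginal class distribution
        kl = (chunk * (np.log(chunk + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))
```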