
An unofficial implementation for "CoSeR: Bridging Image and Language for Cognitive Super-Resolution (CVPR 2024)"

Cognitive Super-Resolution (CoSeR) is a Stable Diffusion-based super-resolution (SR) approach that enables SR models to "understand" low-resolution (LR) images.

🔨 Installation

```
pip install -r requirements.txt
```

💼 Models

We provide models trained on ImageNet1000 following the settings in the original paper: Qformer, CoSeR.

🌟 Quick inference

Please download Stable Diffusion 2.1 and replace "PATH_FOR_QFORMER" and "PATH_FOR_SD" in configs/CoSeR/inference.yaml with the corresponding paths (see the excerpt below for where these placeholders live). We also recommend using the controllable feature wrapping (CFW) from StableSR for the complete performance. 🤗
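For orientation, the relevant part of the config looks roughly like the excerpt below. The key names are illustrative assumptions (check the actual inference.yaml); only the PATH_FOR_QFORMER and PATH_FOR_SD placeholders come from this README:

```yaml
# Illustrative excerpt of configs/CoSeR/inference.yaml; key names are assumptions.
model:
  params:
    qformer_ckpt: /your/path/to/qformer.ckpt            # replaces PATH_FOR_QFORMER
    sd_ckpt: /your/path/to/stable-diffusion-2-1.ckpt    # replaces PATH_FOR_SD
```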

  • No image tiling, with reference image generation:

```
python scripts/inference.py \
    --inputdir ... \
    --outdir ... \
    --config configs/CoSeR/inference.yaml \
    --load_ckpt PATH_FOR_COSER \
    --vqgan_ckpt PATH_FOR_CFW
```

  • With image tiling, for inference on high-resolution images (see the sketch after these commands for the general tiling idea):

```
python scripts/inference_tile.py \
    --inputdir ... \
    --outdir ... \
    --config configs/CoSeR/inference.yaml \
    --load_ckpt PATH_FOR_COSER \
    --vqgan_ckpt PATH_FOR_CFW
```
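For intuition, here is a minimal sketch of the standard tiled-SR pattern that inference_tile.py presumably follows: split the LR image into overlapping tiles, super-resolve each tile independently, and blend the results by averaging the overlaps. The function name and the `upscale_fn` callback are hypothetical, not the script's actual API:

```python
import numpy as np

def tiled_sr(img, upscale_fn, tile=128, overlap=32, scale=4):
    """Super-resolve `img` (H, W, C float array) tile by tile.

    `upscale_fn` is a hypothetical callback mapping an (h, w, C) tile to an
    (h * scale, w * scale, C) SR tile. Overlapping outputs are averaged.
    """
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=np.float32)
    weight = np.zeros_like(out)
    step = tile - overlap
    ys = list(range(0, max(h - tile, 0) + 1, step))
    xs = list(range(0, max(w - tile, 0) + 1, step))
    # Make sure the bottom/right borders are always covered by a final tile.
    if ys[-1] != max(h - tile, 0): ys.append(max(h - tile, 0))
    if xs[-1] != max(w - tile, 0): xs.append(max(w - tile, 0))
    for y in ys:
        for x in xs:
            sr = upscale_fn(img[y:y + tile, x:x + tile])
            oy, ox = y * scale, x * scale
            out[oy:oy + sr.shape[0], ox:ox + sr.shape[1]] += sr
            weight[oy:oy + sr.shape[0], ox:ox + sr.shape[1]] += 1.0
    return out / np.maximum(weight, 1.0)  # average where tiles overlapped
```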

🎱 Training

  1. Prepare the training data:
  • Following the original paper, we preprocess ImageNet1000 into 512×512 images and hold out a subset of 2000 images as the test set (a sketch of this preprocessing follows this list):

```
python data/prepare_imagenet.py
```

After that, we use the Real-ESRGAN degradation pipeline to generate the LR counterparts of the test-set images.

  • We use BLIP2 to generate a caption for every HR image from the previous step (a captioning sketch also follows this list):

```
python data/generate_caption.py
```

  • We compute ImageNet intra-class similarity (CLIP similarity), which is used when training the reference image attention (see the similarity sketch below):

```
python data/count_clip_sim.py
```
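The 512×512 preprocessing is, in essence, a center crop followed by a resize. A minimal sketch assuming PIL; prepare_imagenet.py's actual crop and test-set selection logic may differ:

```python
from pathlib import Path
from PIL import Image

def center_crop_resize(src: Path, dst: Path, size: int = 512) -> None:
    """Center-crop the largest square from `src` and resize it to size x size."""
    img = Image.open(src).convert("RGB")
    s = min(img.size)  # side length of the largest centered square
    left, top = (img.width - s) // 2, (img.height - s) // 2
    img = img.crop((left, top, left + s, top + s))
    img = img.resize((size, size), Image.Resampling.BICUBIC)
    dst.parent.mkdir(parents=True, exist_ok=True)
    img.save(dst)
```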
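A minimal captioning sketch using a Hugging Face BLIP-2 checkpoint; the model variant and decoding settings are assumptions, and generate_caption.py may use LAVIS or a different checkpoint:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

def caption(path: str) -> str:
    """Generate a short caption for one HR image."""
    inputs = processor(images=Image.open(path).convert("RGB"),
                       return_tensors="pt").to(device, torch.float16)
    ids = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(ids, skip_special_tokens=True)[0].strip()
```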
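One natural reading of "intra-class CLIP similarity" is the pairwise cosine similarity between CLIP image embeddings of images from the same ImageNet class. A sketch under that assumption (the backbone choice is also an assumption; count_clip_sim.py may differ):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def intra_class_similarity(image_paths):
    """Pairwise cosine similarity between all images of one ImageNet class."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt").to(device)
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize embeddings
    return feats @ feats.T  # (N, N) cosine similarity matrix
```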
  2. (Stage 1) Train the cognitive encoder (please replace "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml):

```
python main.py --train --base configs/CoSeR/qformer_srresnet_imagenet_all.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name
```

  3. (Stage 2) Train CoSeR (please replace "PATH_FOR_QFORMER", "PATH_FOR_SD", "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml):

```
python main.py --train --base configs/CoSeR/aia_512_imagenet_all_caption_clip_atten_ref.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name
```

💙 Acknowledgments

This project is based on StableSR. Thanks for their awesome work.
