Cognitive Super-Resolution (CoSeR) is a stable diffusion-based super-resolution (SR) approach that enables SR models to “understand” low-resolution (LR) images.
```shell
pip install -r requirements.txt
```
We provide models trained on ImageNet1000 following the settings in the original paper: Qformer, CoSeR.
Please download Stable Diffusion 2.1 and replace "PATH_FOR_QFORMER" and "PATH_FOR_SD" in configs/CoSeR/inference.yaml with the corresponding paths. We also recommend using the controllable feature wrapping (CFW) from StableSR for the best performance. 🤗
- No image tiling, with reference image generation:
```shell
python scripts/inference.py \
    --inputdir ... \
    --outdir ... \
    --config configs/CoSeR/inference.yaml \
    --load_ckpt PATH_FOR_COSER \
    --vqgan_ckpt PATH_FOR_CFW
```
- With image tiling, for high-resolution image reasoning:
```shell
python scripts/inference_tile.py \
    --inputdir ... \
    --outdir ... \
    --config configs/CoSeR/inference.yaml \
    --load_ckpt PATH_FOR_COSER \
    --vqgan_ckpt PATH_FOR_CFW
```
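For large inputs, tiled inference processes the image in overlapping patches and blends the results back together so the model's memory use stays bounded. A minimal sketch of the idea, with our own illustrative names and a simple overlap-averaging blend (not the repository's exact implementation):

```python
import numpy as np

def tile_process(img, tile=64, overlap=16, fn=lambda x: x):
    """Run `fn` on overlapping tiles of `img` (H, W, C) and blend them back.

    `fn` stands in for the SR model's forward pass. Overlapping regions are
    averaged with accumulated weights so tile seams do not show. Assumes the
    image is at least `tile` pixels on each side.
    """
    h, w, c = img.shape
    out = np.zeros((h, w, c), dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            # Clamp the last tile so it stays inside the image.
            y0, x0 = min(y, h - tile), min(x, w - tile)
            out[y0:y0 + tile, x0:x0 + tile] += fn(img[y0:y0 + tile, x0:x0 + tile])
            weight[y0:y0 + tile, x0:x0 + tile] += 1.0
    return out / weight
```

With an identity `fn` the blended output reproduces the input exactly, which is a quick sanity check that the overlap weighting is consistent.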
- Prepare training data:
- Following the original paper, we process ImageNet1000 into 512×512 images. We select a subset of 2000 images as the test set.
```shell
python data/prepare_imagenet.py
```
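The preprocessing amounts to cropping each image to a square and resizing it to 512×512. A simplified stand-in for that step (our own sketch, using a nearest-neighbor resize; the actual script may use a different interpolation):

```python
import numpy as np

def center_crop_resize(img, size=512):
    """Center-crop `img` (H, W, C) to a square on its shorter side, then
    nearest-neighbor resize to (size, size) -- a toy version of the
    512x512 preprocessing applied to ImageNet images."""
    h, w = img.shape[:2]
    s = min(h, w)
    y0, x0 = (h - s) // 2, (w - s) // 2
    crop = img[y0:y0 + s, x0:x0 + s]
    idx = (np.arange(size) * s / size).astype(int)  # nearest source row/col
    return crop[idx][:, idx]
```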
After that, we use the Real-ESRGAN degradation pipeline to generate the LR counterparts of the test-set images.
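For intuition, the LR images are synthesized by degrading the HR images with blur, downsampling, and noise. A toy first-order version of such a pipeline (the real Real-ESRGAN pipeline is second-order and also includes JPEG compression; function and parameter names here are our own):

```python
import numpy as np

def simple_degrade(hr, scale=4, noise_std=5.0, seed=0):
    """Toy degradation: k x k block-average (fused blur + downsample by
    `scale`), then additive Gaussian noise, clipped to [0, 255].
    `hr` is (H, W, C) with H and W divisible by `scale`."""
    rng = np.random.default_rng(seed)
    k = scale
    h, w, c = hr.shape
    # Block average acts as an anti-alias blur plus downsampling.
    lr = hr.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))
    # Simulate sensor/read noise on the low-resolution image.
    lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return np.clip(lr, 0, 255)
```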
- We use BLIP2 to generate a caption for every HR image produced in the last step.
```shell
python data/generate_caption.py
```
- Generate the ImageNet intra-class similarity (CLIP similarity), which is used when training the reference image attention.
```shell
python data/count_clip_sim.py
```
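The intra-class similarity boils down to pairwise cosine similarity between CLIP image embeddings within a class, used to pick a reference image for each training image. A minimal sketch of that computation (the features here are placeholders; in the real script they come from a CLIP image encoder):

```python
import numpy as np

def best_reference(feats):
    """Given per-image embeddings `feats` (N, D) for one class, return the
    index of the most cosine-similar other image for each image -- a toy
    version of the intra-class CLIP similarity used for reference selection."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                    # (N, N) cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)   # an image cannot be its own reference
    return sim.argmax(axis=1)
```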
- (Stage 1) Training of the cognitive encoder: (please revise "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml)
```shell
python main.py --train --base configs/CoSeR/qformer_srresnet_imagenet_all.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name
```
- (Stage 2) Training of CoSeR: (please revise "PATH_FOR_QFORMER", "PATH_FOR_SD", "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml)
```shell
python main.py --train --base configs/CoSeR/aia_512_imagenet_all_caption_clip_atten_ref.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name
```
This project is based on StableSR. Thanks for their awesome work.