Cognitive Super-Resolution (CoSeR) is a stable diffusion-based super-resolution (SR) approach that enables SR models to “understand” low-resolution (LR) images.
```shell
pip install -r requirements.txt
```
We provide models trained on ImageNet1000 following the settings in the original paper: Qformer, CoSeR.
Please download Stable Diffusion 2.1 and replace "PATH_FOR_QFORMER" and "PATH_FOR_SD" in configs/CoSeR/inference.yaml with the corresponding paths. We also recommend using the controllable feature wrapping (CFW) from StableSR for the best performance. 🤗
- No image tiling, with reference image generation:
```shell
python scripts/inference.py \
    --inputdir ... \
    --outdir ... \
    --config configs/CoSeR/inference.yaml \
    --load_ckpt PATH_FOR_COSER \
    --vqgan_ckpt PATH_FOR_CFW
```
- With image tiling, for high-resolution image reasoning:
```shell
python scripts/inference_tile.py \
    --inputdir ... \
    --outdir ... \
    --config configs/CoSeR/inference.yaml \
    --load_ckpt PATH_FOR_COSER \
    --vqgan_ckpt PATH_FOR_CFW
```
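For large inputs, tiled inference processes the image in overlapping patches and blends the results back together so the model's memory use stays bounded. A minimal sketch of the idea, with our own illustrative names and a simple overlap-averaging blend (not the repository's exact implementation):

```python
import numpy as np

def tile_process(img, tile=64, overlap=16, fn=lambda x: x):
    """Run `fn` on overlapping tiles of `img` (H, W, C) and blend them back.

    `fn` stands in for the SR model's forward pass. Overlapping regions are
    averaged with accumulated weights so tile seams do not show. Assumes the
    image is at least `tile` pixels on each side.
    """
    h, w, c = img.shape
    out = np.zeros((h, w, c), dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            # Clamp the last tile so it stays inside the image.
            y0, x0 = min(y, h - tile), min(x, w - tile)
            out[y0:y0 + tile, x0:x0 + tile] += fn(img[y0:y0 + tile, x0:x0 + tile])
            weight[y0:y0 + tile, x0:x0 + tile] += 1.0
    return out / weight
```

With an identity `fn` the blended output reproduces the input exactly, which is a quick sanity check that the overlap weighting is consistent.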
- Prepare training data:
- Following the original paper, we process ImageNet1000 into 512×512 images. We select a subset of 2000 images as the test set.
```shell
python data/prepare_imagenet.py
```
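The preprocessing amounts to cropping each image to a square and resizing it to 512×512. A simplified stand-in for that step (our own sketch, using a nearest-neighbor resize; the actual script may use a different interpolation):

```python
import numpy as np

def center_crop_resize(img, size=512):
    """Center-crop `img` (H, W, C) to a square on its shorter side, then
    nearest-neighbor resize to (size, size) -- a toy version of the
    512x512 preprocessing applied to ImageNet images."""
    h, w = img.shape[:2]
    s = min(h, w)
    y0, x0 = (h - s) // 2, (w - s) // 2
    crop = img[y0:y0 + s, x0:x0 + s]
    idx = (np.arange(size) * s / size).astype(int)  # nearest source row/col
    return crop[idx][:, idx]
```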
After that, we use the Real-ESRGAN degradation pipeline to generate the LR counterparts of the test-set images.
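For intuition, the LR images are synthesized by degrading the HR images with blur, downsampling, and noise. A toy first-order version of such a pipeline (the real Real-ESRGAN pipeline is second-order and also includes JPEG compression; function and parameter names here are our own):

```python
import numpy as np

def simple_degrade(hr, scale=4, noise_std=5.0, seed=0):
    """Toy degradation: k x k block-average (fused blur + downsample by
    `scale`), then additive Gaussian noise, clipped to [0, 255].
    `hr` is (H, W, C) with H and W divisible by `scale`."""
    rng = np.random.default_rng(seed)
    k = scale
    h, w, c = hr.shape
    # Block average acts as an anti-alias blur plus downsampling.
    lr = hr.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))
    # Simulate sensor/read noise on the low-resolution image.
    lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return np.clip(lr, 0, 255)
```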
- We use BLIP2 to generate a caption for every HR image produced in the last step.
```shell
python data/generate_caption.py
```
- Generate the ImageNet intra-class similarity (CLIP similarity), which is used when training the reference image attention.
```shell
python data/count_clip_sim.py
```
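The intra-class similarity boils down to pairwise cosine similarity between CLIP image embeddings within a class, used to pick a reference image for each training image. A minimal sketch of that computation (the features here are placeholders; in the real script they come from a CLIP image encoder):

```python
import numpy as np

def best_reference(feats):
    """Given per-image embeddings `feats` (N, D) for one class, return the
    index of the most cosine-similar other image for each image -- a toy
    version of the intra-class CLIP similarity used for reference selection."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                    # (N, N) cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)   # an image cannot be its own reference
    return sim.argmax(axis=1)
```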
- (Stage 1) Training of the cognitive encoder: (please revise "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml)
```shell
python main.py --train --base configs/CoSeR/qformer_srresnet_imagenet_all.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name
```
- (Stage 2) Training of CoSeR: (please revise "PATH_FOR_QFORMER", "PATH_FOR_SD", "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml)
```shell
python main.py --train --base configs/CoSeR/aia_512_imagenet_all_caption_clip_atten_ref.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name
```
This project is based on StableSR. Thanks for their awesome work.