Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyed. #2

Closed
UIUC-Marisa3 opened this issue May 12, 2023 · 11 comments
Labels
good first issue Good for newcomers

Comments

@UIUC-Marisa3

Hello, thanks for the work! We see many classic SR methods in the paper. The comparison to Real-ESRGAN+ looks promising!

However, the paper seems to claim that “our method using both synthetic and real world benchmarks demonstrates its superiority over current state-of-the-art approaches”. Just wondering whether we could have some comparisons to real baselines and the more common methods that people actually use?

For example:

Tiled diffusion’s DDIM inversion:
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

ControlNet Tile’s updates yesterday (it looks like they are going to use this SR-like model to compete with Midjourney V5/5.1 in image details):
https://github.com/lllyasviel/ControlNet-v1-1-nightly#ControlNet-11-Tile

Loopback Scaler:
https://civitai.com/models/23188/loopback-scaler

DeepFloyd’s 256 stage model (IF-III-L):
https://github.com/deep-floyd/IF

Some of these methods are likely to use prompts, yet getting a prompt from a small image seems trivial for BLIP, and all ControlNets have a ‘guess mode’ that accepts an empty string as the prompt. Loopback Scaler and Tiled Diffusion seem to suggest always using the same prompt string whatever the image is, so they do not actually require per-image prompts.

Most of these methods can be easily used by installing the latest version of automatic1111.

@pkuliyi2015

Yes, I also want a visual comparison.

If your method is competitive (for example, if it can upscale to 4K images like the ControlNet Tile model), I will be happy to migrate it to automatic1111.

By the way, I'm also studying at NTU. We may have an opportunity to cooperate!

@IceClear
Owner

IceClear commented May 13, 2023

Hi, thanks for your interest in our work!
We currently do not compare StableSR with these open-source demos in our paper for the following reasons:
(1) These open-source demos are not academic papers formally accepted by conferences or journals after official review.
(2) Our currently released code and paper were finished around March, though they only recently became publicly available, and we were not aware of these demos at that time.

We appreciate your valuable advice and we will go through these demos later.
We will provide visual comparisons soon :)
BTW, we will revise the title of the issue to make it easier to understand.

Next, we will compare with these baselines one by one.

@IceClear IceClear changed the title from "Comparison to commonly used method?" to "Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyed." May 13, 2023
@IceClear IceClear pinned this issue May 13, 2023
@IceClear
Owner

IceClear commented May 13, 2023

Comparison with Tiled DDPM:
We first test on the image from the commonly used real-world test set here. For Tiled DDPM, we use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and follow most of the settings provided by Tiled DDPM. We use a large number of sampling steps for better performance, and the prompts are the same as those of Tiled DDPM:
image

Result of Tiled DDPM:
00018-1228422786

Result of StableSR:
tiger

We observe that Tiled DDPM tends to struggle with both fidelity and quality in real-world cases.

@IceClear
Owner

IceClear commented May 13, 2023

We further show an example of AIGC SR, though StableSR is not designed for AIGC and never saw this type of data during training. We test directly on the image provided by Tiled DDPM; the generated image is in 4K resolution:
StableSR result
Comparison with Zoomed LR
StableSR shows better fidelity compared with the result of Tiled DDPM.

@pkuliyi2015

pkuliyi2015 commented May 13, 2023

Thanks for your effort in testing.

It seems that your model is compatible with my tiled diffusion method (that is only tiling, no advanced algorithm involved). Would you mind me migrating your model to the Automatic1111?

Or if you want to start the project on your own, I may be able to help.

@IceClear
Owner

IceClear commented May 13, 2023

> Thanks for your effort in testing.
>
> It seems that your model is compatible with my tiled diffusion method (that is only tiling, no advanced algorithm involved). Would you mind me migrating your model to the Automatic1111?
>
> Or if you want to start the project on your own, I may be able to help.

Hi~ Thanks for your interest.
I am OK with that. Automatic1111 is a popular repo and we are glad to see that our research can contribute to practical use.
Just remember to include our license : )

Honestly, the main purpose of this paper is just to attempt to make contributions to the research community, even if the contributions may be tiny.
We do not mean to list and 'K.O.' all the other baselines in the world.
StableSR is good but not perfect, and we appreciate suggestions and efforts that can make StableSR better.

@wo262

wo262 commented May 15, 2023

StableSR is so far the best identity-preserving upscaling method out there. That is, if you downscale the result back to its original resolution, each pixel should average back to its original value, and the method shouldn't make up features larger than the original pixels, while the new details should look plausible and not like a mere filter.
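The "average back to its original value" criterion above can be checked numerically. The following is a minimal sketch, not anything used in this thread: it assumes an integer scale factor, images as H x W x C NumPy arrays, and a plain box (block-average) downscale; the function names `box_downscale` and `consistency_error` and the toy data are hypothetical.

```python
import numpy as np


def box_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Average every factor x factor block of an H x W x C image."""
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def consistency_error(low_res: np.ndarray, upscaled: np.ndarray, factor: int) -> float:
    """Mean absolute difference between low_res and the re-downscaled result."""
    recon = box_downscale(upscaled.astype(np.float64), factor)
    return float(np.abs(recon - low_res.astype(np.float64)).mean())


# Toy data (assumption): a random 8x8 RGB "low-res" image and a 4x scale factor.
rng = np.random.default_rng(0)
low_res = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float64)

# Nearest-neighbour enlargement: trivially consistent, but adds no detail.
faithful = np.repeat(np.repeat(low_res, 4, axis=0), 4, axis=1)

# Added detail that is zero-mean inside every 4x4 block: still consistent.
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 2 - 1
detailed = faithful + 10.0 * checker[..., None]

# A uniform brightness shift changes every block average: not consistent.
shifted = faithful + 30.0

for name, img in [("faithful", faithful), ("detailed", detailed), ("shifted", shifted)]:
    print(name, consistency_error(low_res, img, 4))
```

A low error only shows that block averages are preserved; it says nothing about whether the added detail looks plausible, which still has to be judged visually.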

Comparison between StableSR minus base image and Tiled DDPM minus base image, using the high-res image provided on @pkuliyi2015's GitHub page:
stableSR_difference
TiledDD_difference
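For anyone who wants to reproduce this kind of "result minus base image" map, here is a rough sketch. How the maps above were actually produced is not stated, so this is an assumption: the base image is enlarged by nearest-neighbour repetition to the output size, and the absolute difference is averaged over colour channels. The name `difference_map` and the toy arrays are hypothetical.

```python
import numpy as np


def difference_map(upscaled: np.ndarray, base: np.ndarray, factor: int) -> np.ndarray:
    """Per-pixel absolute difference between an upscaled result and the
    nearest-neighbour-enlarged base image, averaged over colour channels."""
    enlarged = np.repeat(np.repeat(base, factor, axis=0), factor, axis=1)
    enlarged = enlarged.astype(np.float64)
    if enlarged.shape != upscaled.shape:
        raise ValueError("upscaled must be exactly factor times the base size")
    return np.abs(upscaled.astype(np.float64) - enlarged).mean(axis=2)


# Toy data (assumption): a 2x2 RGB base image and a 2x "upscaled" result
# that differs from the enlarged base in a single pixel.
base = np.array(
    [[[10, 10, 10], [50, 50, 50]],
     [[90, 90, 90], [200, 200, 200]]],
    dtype=np.uint8,
)
upscaled = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1).astype(np.float64)
upscaled[0, 0] += 12.0  # pretend the upscaler changed this one pixel

diff = difference_map(upscaled, base, 2)
print(diff)
```

Bright regions in such a map mark where the upscaler departed from the base image; large connected bright areas suggest invented structure rather than fine detail.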

@IceClear
Owner

IceClear commented May 15, 2023

As for the comparison with ControlNet Tile: it seems to still be under active development and is not fully included in A1111. The gradio demo they provide currently does not support upscaling in tiles. Unfortunately, I am not familiar with gradio and failed to build it in A1111 after trying for two days, so I am skipping this comparison.
However, from the results shown in their readme, I conjecture that the fidelity of the results may not be very good, and whether ControlNet Tile can be directly applied to real-world images with unknown degradation is also an open question.
BTW, our StableSR has been fully released and anyone interested is welcome to conduct the comparison : )

@IceClear
Owner

Comparison with Loopback Scaler:
I ran Loopback Scaler on A1111 and it reported a "NAN error" on the tiger image used above; I could not figure out the reason.
However, I managed to run it on another example from the internet:
2684559-PH
I use the same prompt as used in Tiled DDPM.
I use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and other settings are shown below:
Screenshot 2023-05-15 183302

Result of Loopback Scaler:
img2img-0001-1472149575

Result of StableSR:
2684559-PH

Similarly, we observe that Loopback Scaler has inferior performance in this real-world case.

@IceClear
Owner

IceClear commented May 15, 2023

Comparison with DeepFloyd:
I use the stage 3 model for 4x upsampling.
I use the same prompts as in the above test and the noise level is set to 100 as default.

Result of DeepFloyd:
if_stage_III

Again, the main problem is fidelity, and the quality of some detailed textures is also not as good as with StableSR.

@IceClear
Owner

Conclusion: As observed in the comparisons above, our StableSR differs significantly from the above diffusion-based upscalers in its higher fidelity, which is also the main challenge of applying a diffusion prior to SR, as discussed in our paper.
We think these comparisons are not mainly about which method is the best; they just indicate that we focus on different applications.

Specifically, the above upscalers still focus on 'creation', and they mainly handle AIGC images whose degradation is different from real-world images captured by cameras. Hence, they mainly care about generation quality, which means generating new content in the upscaled results is allowed.
However, for real-world image SR, fidelity is very important and existing methods such as RealESRGAN+ and LDM are actually common methods that people often use. Our StableSR mainly focuses on this direction and we attempt to keep the high fidelity using several strategies introduced in our paper.

We believe this is not the end, but the beginning to explore the powerful ability of diffusion models for image restoration.

@IceClear IceClear added the good first issue Good for newcomers label May 15, 2023
@IceClear IceClear unpinned this issue Mar 24, 2024