Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyed. #2

UIUC-Marisa3 · 2023-05-12T17:31:38Z

Hello, thanks for the work! We see many classic SR methods in the paper. The comparison to Real-ESRGAN+ looks promising!

However, the paper claims that “our method using both synthetic and real world benchmarks demonstrates its superiority over current state-of-the-art approaches”. Could we also see comparisons with some real baselines and more common methods that people actually use?

For example:

Tiled diffusion’s DDIM inversion:
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

ControlNet Tile’s updates yesterday (it looks like they are going to use this SR-like model to compete with Midjourney V5/5.1 on image detail):
https://github.com/lllyasviel/ControlNet-v1-1-nightly#ControlNet-11-Tile

Loopback Scaler:
https://civitai.com/models/23188/loopback-scaler

DeepFloyd’s 256 stage model (IF-III-L):
https://github.com/deep-floyd/IF

Some of these methods are likely to use prompts, but getting a prompt from a small image is trivial for BLIP, and all ControlNets have a ‘guess mode’ that can use an empty string as the prompt. Loopback Scaler and Tiled Diffusion seem to suggest that people always use the same prompt string whatever the image is, so they do not actually require prompts.

Most of these methods can be used easily by installing the latest version of automatic1111.

pkuliyi2015 · 2023-05-12T21:38:32Z

Yes, I also want a visual comparison.

If your method is competitive (for example, if you can upscale to 4K images like the ControlNet Tile model), I will be happy to migrate your method to automatic1111.

By the way, I'm also studying at NTU. We may have an opportunity to cooperate!

IceClear · 2023-05-13T06:26:02Z

Hi, thanks for your interest in our work!
We currently do not compare StableSR with these open-source demos in our paper for the following reasons:
(1) These open-source demos are not academic papers formally accepted by conferences or journals after official review.
(2) Our currently released code and paper were finished around March, though they only recently became publicly available, and we had not noticed these demos at that time.

We appreciate your valuable advice and will go through these demos later.
We will provide visual comparisons soon :)
BTW, we will revise the title of the issue to make it easier to understand.

Next, we will compare with these baselines one by one.

IceClear · 2023-05-13T14:16:48Z

Comparison with Tiled DDPM:
We first test on an image from the commonly used real-world test set here. For Tiled DDPM, we use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and follow most of the settings provided by Tiled DDPM. We use a large number of sampling steps for better performance; the prompts are the same as those used by Tiled DDPM:

Result of Tiled DDPM:

Result of StableSR:

We observe that Tiled DDPM tends to struggle with both fidelity and quality in real-world cases.

IceClear · 2023-05-13T14:21:51Z

We further show an example on AIGC SR, though StableSR is not designed for AIGC and never saw this type of data during training. We directly test on the image provided by Tiled DDPM; the generated image is in 4K resolution:
StableSR result
Comparison with Zoomed LR
StableSR shows better fidelity compared with the result of Tiled DDPM.

pkuliyi2015 · 2023-05-13T16:36:21Z

Thanks for your effort in testing.

It seems that your model is compatible with my tiled diffusion method (that is only tiling, no advanced algorithm involved). Would you mind me migrating your model to the Automatic1111?

Or if you want to start the project on your own, I may be able to help.
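
To make "only tiling" concrete, here is a minimal NumPy sketch of overlapping-tile processing with averaged overlaps. It is an illustration under stated assumptions, not the code of the multidiffusion-upscaler extension: the names `tile_starts` and `process_tiled` are hypothetical, `fn` stands in for whatever per-tile model is used, and `fn` is assumed to return a patch of the same size as its input.

```python
import numpy as np

def tile_starts(size, tile, overlap):
    """Start offsets of tiles of length `tile` that cover [0, size)."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    if size <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the border
    return starts

def process_tiled(image, fn, tile=64, overlap=16):
    """Apply `fn` to overlapping tiles and average the overlapping regions.

    `fn` is a hypothetical per-tile model; it must return an array with
    the same shape as the patch it receives.
    """
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    # One weight per pixel, broadcast over any trailing channel axes.
    weight = np.zeros((h, w) + (1,) * (image.ndim - 2), dtype=np.float64)
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            patch = image[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] += fn(patch)
            weight[y:y + tile, x:x + tile] += 1.0
    return out / weight

# Usage: an identity "model" should reproduce the input.
img = np.random.default_rng(0).random((100, 80, 3))
restored = process_tiled(img, lambda p: p)
```

Uniform averaging is the simplest blending choice; a real implementation may weight tile centers more heavily to hide seams.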

IceClear · 2023-05-13T17:07:17Z

> Thanks for your effort in testing.
>
> It seems that your model is compatible with my tiled diffusion method (that is only tiling, no advanced algorithm involved). Would you mind me migrating your model to the Automatic1111?
>
> Or if you want to start the project on your own, I may be able to help.

Hi~ Thanks for your interest.
I am OK with that. Automatic1111 is a popular repo and we are glad to see that our research can contribute to practical use.
Just remember to include our license : )

Honestly, the main purpose of this paper is just to attempt to make contributions to the research community, even if the contributions may be tiny.
We do not mean to list and 'K.O.' all the other baselines in the world.
StableSR is good but not perfect, and we appreciate suggestions and efforts that can make StableSR better.

wo262 · 2023-05-15T08:59:07Z

StableSR is so far the best identity-preserving scaling method out there. That means that if you downscale the result back to the original resolution, each pixel should average back to its original value, and the method shouldn't make up features larger than the original pixels. At the same time, the new details should look plausible and not like a mere filter.

Comparison between StableSR minus base image and Tiled DDPM minus base image, using the high-res image provided on @pkuliyi2015's GitHub page:
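
The identity-preservation criterion described above can be checked numerically. The sketch below is one possible reading of it, not the procedure used for the images in this thread: it assumes the downscale is a plain box average over `factor` x `factor` blocks, and the helper names `box_downscale` and `consistency_error` are hypothetical.

```python
import numpy as np

def box_downscale(img, factor):
    """Average each factor x factor block (assumed downscaling kernel)."""
    h, w = img.shape[:2]
    assert h % factor == 0 and w % factor == 0, "size must divide evenly"
    blocks = img.reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def consistency_error(lr, sr, factor):
    """Mean absolute difference between the LR input and the downscaled SR output.

    A value near zero means each block of SR pixels averages back to the
    LR pixel it was generated from.
    """
    lr = np.asarray(lr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    return float(np.abs(box_downscale(sr, factor) - lr).mean())

# Usage with synthetic data: nearest-neighbour upscaling is perfectly consistent.
lr = np.random.default_rng(0).random((8, 8, 3))
sr = np.repeat(np.repeat(lr, 4, axis=0), 4, axis=1)
err = consistency_error(lr, sr, 4)
```

A low error only covers the fidelity half of the criterion; whether the added detail looks plausible still has to be judged visually, as in the difference images.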

IceClear · 2023-05-15T09:36:15Z

As for the comparison with ControlNet Tile: it seems to be still under development and not fully integrated into A1111. The Gradio demo they provide currently does not support upscaling in tiles. Unfortunately, I am not familiar with Gradio and failed to get it working in A1111 after trying for two days, so I am skipping this comparison.
However, judging from the results shown in their README, I conjecture that the fidelity may not be very good, and whether ControlNet Tile can be directly applied to real-world images with unknown degradation is also an open question.
BTW, StableSR has been fully released, and anyone interested is welcome to conduct the comparison : )

IceClear · 2023-05-15T11:53:27Z

Comparison with Loopback Scaler:
I ran Loopback Scaler in A1111, and it reported a "NAN error" on the tiger image used above; I could not figure out the reason.
However, I managed to run it on another example from the internet:

I use the same prompt as in the Tiled DDPM test.
I use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt), and the other settings are shown below:

Result of Loopback Scaler:

Result of StableSR:

Similarly, we observe that Loopback Scaler has inferior performance in this real-world case.

IceClear · 2023-05-15T18:18:09Z

Comparison with DeepFloyd:
I use the stage 3 model for 4x upsampling.
I use the same prompts as in the above test and the noise level is set to 100 as default.

Result of DeepFloyd:

The main issue is again fidelity, and the quality of some detailed textures is also not as good as with StableSR.

IceClear · 2023-05-15T18:33:52Z

Conclusion: As observed in the comparisons above, StableSR differs significantly from the above diffusion-based upscalers in achieving higher fidelity, which is also the main challenge of applying diffusion priors to SR, as discussed in our paper.
We think the comparisons are not mainly about which method is the best; they just indicate that we focus on different applications.

Specifically, the above upscalers still focus on 'creation', and they mainly handle AIGC images, whose degradation differs from that of real-world images captured by cameras. Hence, they mainly care about generation quality, which means that generating new content in the upscaled results is allowed.
However, for real-world image SR, fidelity is very important, and existing methods such as RealESRGAN+ and LDM are in fact common methods that people often use. StableSR mainly focuses on this direction, and we attempt to maintain high fidelity using several strategies introduced in our paper.

We believe this is not the end, but the beginning of exploring the powerful ability of diffusion models for image restoration.

IceClear changed the title ~~Comparison to commonly used method?~~ Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyed. May 13, 2023

IceClear pinned this issue May 13, 2023

IceClear added the good first issue label May 15, 2023

IceClear closed this as completed May 31, 2023

IceClear unpinned this issue Mar 24, 2024
