Alternative refiner implementation #12377

Closed
wants to merge 4 commits into from

AUTOMATIC1111
Owner

@AUTOMATIC1111 AUTOMATIC1111 commented Aug 6, 2023

Description

  • two settings on the Stable Diffusion page: Refiner checkpoint and Refiner switch at.
  • the first lets you select a model.
  • the second lets you select a ratio.
  • runs two rounds of sampling: the first for (Refiner switch at × total steps) steps; it then switches the model to Refiner checkpoint and finishes sampling in img2img mode, using the remaining number of steps and a denoising strength equal to 1 - Refiner switch at.
    • for example, with 20 steps and Refiner switch at = 0.25, the first sampling will go for 5 steps and the second for 15.
  • the switch back to the original checkpoint happens after you start generating the next picture (subject to change?)
  • tested with SD1.
  • tested with SDXL.
  • possible to cross-refine SD1 and SD2.
  • works with kdiffusion samplers.
  • does not work with DDIM (and compvis samplers still don't work with SDXL).
  • works with img2img
  • does not work with hires fix (not sure whether the support should be added)
  • not tested with medvram/lowvram
  • infotext support
  • in future i plan to integrate this in a nice way into main UI, for now you just gotta put it into quicksettings bar
  • other PR: initial refiner support #12371
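
The step split described in the list above can be sketched as follows. This only illustrates the arithmetic: the function name `split_refiner_steps` and the use of `int()` truncation are assumptions made for the example, not the PR's actual code.

```python
def split_refiner_steps(total_steps: int, switch_at: float) -> tuple[int, int, float]:
    """Return (first_pass_steps, second_pass_steps, denoising_strength)."""
    if not 0.0 < switch_at < 1.0:
        raise ValueError("switch_at must be strictly between 0 and 1")
    # First pass: `switch_at * total_steps` steps on the original checkpoint.
    # Truncating with int() is an assumption; the PR may round differently.
    first = max(1, int(total_steps * switch_at))
    # Second pass: the refiner checkpoint gets the remaining steps.
    second = total_steps - first
    # img2img denoising strength for the second pass, as in the description.
    denoising_strength = 1.0 - switch_at
    return first, second, denoising_strength


first, second, strength = split_refiner_steps(20, 0.25)
print(first, second, strength)
```

With 20 steps and a switch at 0.25 this reproduces the 5 + 15 split from the example in the description.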

00420-1

@catboxanon catboxanon added the sdxl Related to SDXL label Aug 7, 2023
@zz2222222222222

zz2222222222222 commented Aug 7, 2023

SD Unet stops working after the model switch.
The first (SDXL) model works with SD Unet, but after the switch the refiner does not work with SD Unet.

@AUTOMATIC1111
Owner Author

would you like to share specifics

@zz2222222222222

zz2222222222222 commented Aug 7, 2023

would you like to share specifics

[08/07/2023-12:53:14] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
33%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 8/24 [00:01<00:02, 6.25it/s]
Reusing loaded model gf4.safetensors [57fdfb1fbe] to load my_mix46.safetensors [70aabbd23b]███████▎ | 7/24 [00:01<00:02, 8.06it/s]
Dectivating unet: [TRT] gf4
Loading weights [70aabbd23b] from ./stable-diffusion-webui/models/Stable-diffusion/my_mix46.safetensors
Creating model from config: .d/stable-diffusion-webui/configs/v1-inference.yaml
Applying attention optimization: sdp... done.
Model loaded in 0.6s (create model: 0.1s, apply weights to model: 0.3s, apply half(): 0.1s).
100%|██████████| 16/16 [00:02<00:00, 7.19it/s]
Total progress: 100%|██████████| 24/24 [00:06<00:00, 3.54it/s]
Total progress: 100%|██████████| 24/24 [00:06<00:00, 8.06it/s]

From the log, it does not load the second TRT model.
The first model is SDXL gf4 (safetensors and TRT).
(I can't be sure whether the system runs inference with safetensors or TRT: without the refiner, TRT runs at 12.00it/s, but here it is only 6.25it/s, the same speed as the safetensors model without the refiner.)

The second model is SD my_mix46.safetensors.
For the second model I am sure, because it does not load the my_mix46.trt model.

@AUTOMATIC1111
Owner Author

pushed a possible solution

@zz2222222222222

zz2222222222222 commented Aug 7, 2023

pushed a possible solution

Thank you so much.
Using this "apply unet overrides after switching model" change fixes the problem of the first model not using SD Unet.

The second model still doesn't use it, so I tried modifying the code in
/stable-diffusion-webui/modules/sd_models.py

    load_model(checkpoint_info, already_loaded_state_dict=state_dict)
+   sd_unet.apply_unet('Automatic')  # added line; it works for me
    return model_data.sd_model

This change forces TRT to be used for inference. It fixes my case, but it is not a good idea for other users.


Log after the change:

Activating unet: [TRT] gf4
[08/07/2023-13:51:54] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
[08/07/2023-13:51:54] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
33%|████████████████████████████████████████████████████████████████████████████████████████████████▎ | 8/24 [00:00<00:01, 9.38it/s]
Reusing loaded model gf4.safetensors [57fdfb1fbe] to load my_mix46.safetensors [70aabbd23b] | 7/24 [00:00<00:01, 10.46it/s]
Dectivating unet: [TRT] gf4
Loading weights [70aabbd23b] from ../stable-diffusion-webui/models/Stable-diffusion/my_mix46.safetensors
Creating model from config: ../stable-diffusion-webui/configs/v1-inference.yaml
Applying attention optimization: xformers... done.
Model loaded in 1.1s (create model: 0.2s, apply weights to model: 0.4s, apply half(): 0.2s, move model to device: 0.2s).
Activating unet: [TRT] my_mix46
[08/07/2023-13:52:03] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
[08/07/2023-13:52:03] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
100%|██████████| 16/16 [00:01<00:00, 14.40it/s]
Total progress: 100%|██████████| 24/24 [00:12<00:00, 1.91it/s]
Total progress: 100%|██████████| 24/24 [00:12<00:00, 5.94it/s]

@zz2222222222222

zz2222222222222 commented Aug 7, 2023

After testing again, we found that sd_unet.apply_unet() needs to be called before every return in def reload_model_weights(sd_model=None, info=None).

So I modified it like this:
def reload_model_weights(sd_model=None, info=None):
    # wrapper: reload the weights, then re-apply the matching unet
    model = reload_model_weights_k(sd_model, info)
    sd_unet.apply_unet('Automatic')
    return model

def reload_model_weights_k(sd_model=None, info=None):
    ...  # presumably the original reload_model_weights body, renamed

@brkirch
Collaborator

brkirch commented Aug 7, 2023

There are a few problems I've found, and unfortunately this approach will need some modifications to get the best quality possible.

First, comparing the results from this PR to what I was getting with #12328:

This PR

00000-1495947642

#12328

00000-1495947642

This PR produces an image with less fine detail. This is because this PR still adds noise whereas mine does not. That said, if the noise is zeroed, the image is still not quite right:

This PR without adding noise

00001-1495947642

Looking further, I found an off-by-one error in the denoising strength calculation:

            self.denoising_strength = 1.0 - stopped_at / self.steps

stopped_at is one less than the actual step stopped at, so that should be:

            self.denoising_strength = 1.0 - (stopped_at + 1) / self.steps
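
A small worked example of the off-by-one, assuming `stopped_at` is the zero-based index of the last step the first pass ran. The numbers are made up for illustration.

```python
steps = 20
stopped_at = 4  # zero-based index, so 5 steps have actually run

# Original formula: treats the index as if it were a count of completed steps.
original = 1.0 - stopped_at / steps

# Corrected formula: convert the index to a count of completed steps first.
corrected = 1.0 - (stopped_at + 1) / steps

print(original, corrected)
```

With these numbers the corrected value lines up with `1 - Refiner switch at` for a switch at 0.25, while the original formula overshoots by one step's worth of denoising strength.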

Unfortunately even with that change this is still not at the quality my PR had:

This PR without adding noise and with correct denoising strength

00002-1495947642

So what is different from my PR? If I try this PR but prune the sigmas for the first pass by the number of steps to stop early, I get this:

This PR without adding noise, with correct denoising strength, and pruning sigmas for the first pass

00003-1495947642

The official SD XL repo also prunes sigmas, so that may be a requirement for this to work correctly. Unfortunately, my testing of this approach was also done by pruning the sigmas for the first pass to stop early, then running the highres pass normally but without added noise.
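
One way to read "pruning sigmas" is that both passes share a single noise schedule, with the first pass keeping only the leading part of it. The sketch below assumes a Karras-style schedule; the function names and the `sigma_min`/`sigma_max` values are hypothetical and chosen for illustration, and this is neither this PR's code nor the official repo's.

```python
def karras_sigmas(n: int, sigma_min: float = 0.03, sigma_max: float = 14.6,
                  rho: float = 7.0) -> list[float]:
    """Karras-style schedule with n noise levels (n >= 2), then a final 0.0."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    return [(hi + t * (lo - hi)) ** rho for t in ramp] + [0.0]


def split_sigmas(sigmas: list[float], first_pass_steps: int) -> tuple[list[float], list[float]]:
    # The first pass keeps only the leading sigmas, so it stops at a noisy level.
    first = sigmas[: first_pass_steps + 1]
    # The second pass resumes from exactly the sigma where the first one stopped.
    second = sigmas[first_pass_steps:]
    return first, second


sigmas = karras_sigmas(20)
first, second = split_sigmas(sigmas, 5)
print(len(first) - 1, "steps, ending at sigma", round(first[-1], 3))
print(len(second) - 1, "steps, starting at sigma", round(second[0], 3))
```

The point of the split is that the first pass ends at the same noise level the second pass starts from, rather than each pass building its own full schedule.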

@AUTOMATIC1111
Owner Author

oh, you're right, you're right, adding noise is wrong, and not adding is wrong too, I need to recover the noisy image from the sampler rather than the denoised one.

@AUTOMATIC1111
Owner Author

Changed it to work with noisy latent from kdiffusion, and not add any noise. Looks fine for SDXL->SDXL and SD1->SD1, but pretty bad for SD1->SDXL. Maybe I'm doing something wrong, but if not we will have to revert to adding noise for SD1<->SDXL.
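
To illustrate why handing over the sampler's noisy latent needs no extra noise, here is a toy scalar Euler sampler written in the k-diffusion style. Everything in it (`euler_sample`, `toy_denoiser`, the sigma values) is a hypothetical stand-in; it uses the same denoiser for both passes, so it says nothing about cross-model quality.

```python
def euler_sample(x: float, sigmas: list[float], denoise) -> float:
    """Minimal scalar Euler sampler in the k-diffusion style."""
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = denoise(x, sigma)      # model's estimate of the clean value
        d = (x - denoised) / sigma        # derivative of x with respect to sigma
        x = x + d * (sigma_next - sigma)  # x is now the noisy value at sigma_next
    return x


def toy_denoiser(x: float, sigma: float) -> float:
    # Hypothetical stand-in for a model: always predicts the same clean value.
    return 1.0


sigmas = [10.0, 5.0, 2.0, 1.0, 0.0]
start = 1.0 + 10.0 * 0.7  # clean value plus noise scaled by the first sigma

# Uninterrupted run over the whole schedule.
full = euler_sample(start, sigmas, toy_denoiser)

# Two-pass run: stop at sigma 2.0 and hand over the *noisy* value unchanged.
noisy = euler_sample(start, sigmas[:3], toy_denoiser)
resumed = euler_sample(noisy, sigmas[2:], toy_denoiser)

print(full, resumed)
```

Because the value at the handoff is already at the noise level the second pass expects, resuming from it follows the same trajectory as the uninterrupted run.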

@catboxanon catboxanon linked an issue Aug 7, 2023 that may be closed by this pull request
1 task
@bosima

bosima commented Aug 7, 2023

The first stage works, but the second stage, when execution reaches the refiner, reports an error:
modules.devices.NansException: A tensor with all NaNs was produced in Unet.

Detail Log:

Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]
Loading weights [7440042bbd] from cache
Creating model from config: /root/stable-diffusion-webui/repositories/generative-models/configs/inference/sd_xl_refiner.yaml
Applying attention optimization: xformers... done.
Model loaded in 4.9s (create model: 0.2s, apply weights to model: 0.1s, move model to device: 2.8s, calculate empty prompt: 1.7s).
0%| | 0/8 [00:06<?, ?it/s]
*** Error completing request
*** Arguments: ('task(nhw0m2c5haxv0j4)', 'street fashion photography, young female, pale skin, (look at viewer), sexy pose,(pink hair, white hair, blonde hair, long hair), ((high ponytail)),detailed skin, (detailed eyes:1.3), skin pores, (grin:1.1), skin texture, (Hunter green uniform, black skirt:1.4), long green sleeves,8k, real picture, intricate details, ultra-detailed,(photorealistic),film action shot, full body shot, in a shopping mall,realistic, extremely high quality RAW photograph, detailed background, intricate, warm lighting, high resolution,uhd, film grain, Fujifilm XT3', 'text, watermark, disfigured, kitsch, ugly, oversaturated, low-res, blurred, painting, illustration, drawing, sketch, low quality, long exposure, (cape:1.4), cartoon, 3d character,', [], 30, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 1024, 1024, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 0, '', '', [], <gradio.routes.Request object at 0x7fd26938b850>, 0, True, False, False, False, 'base', <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fd269c63d60>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fd269ad2bc0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fd269ad1570>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fd26938aaa0>, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, None, False, None, None, False, None, None, False, None, None, False, 50) {}
Traceback (most recent call last):
  File "/root/stable-diffusion-webui/modules/call_queue.py", line 58, in f
    res = list(func(*args, **kwargs))
  File "/root/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/root/stable-diffusion-webui/modules/txt2img.py", line 63, in txt2img
    processed = processing.process_images(p)
  File "/root/stable-diffusion-webui/modules/processing.py", line 743, in process_images
    res = process_images_inner(p)
  File "/root/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "/root/stable-diffusion-webui/modules/processing.py", line 879, in process_images_inner
    samples_ddim = p.run_refiner(samples_ddim)
  File "/root/stable-diffusion-webui/modules/processing.py", line 425, in run_refiner
    samples = self.sampler.sample_img2img(self, noisy_latent, x, self.c, self.uc, image_conditioning=self.image_conditioning, steps=max(1, self.steps - stopped_at - 1))
  File "/root/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 471, in sample_img2img
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "/root/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 314, in launch_sampling
    return func()
  File "/root/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 471, in <lambda>
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "/root/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/root/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 219, in forward
    devices.test_for_nans(x_out, "unet")
  File "/root/stable-diffusion-webui/modules/devices.py", line 222, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in Unet. Use --disable-nan-check commandline argument to disable this check.

@zz2222222222222

zz2222222222222 commented Aug 8, 2023

@bosima AUTOMATIC1111/stable-diffusion-webui-tensorrt#58
As far as I know, the current dev version doesn't support SDXL TRT models.
Please follow that to modify the code.
I'm a new user and don't know how to submit those changes, because they require modifying several projects.

@lllyasviel

lllyasviel commented Aug 8, 2023

I still believe we may want something like:

if a_is_sdxl == b_is_sdxl:
    just swap model without refresh sampler
else:
    get clean x0 latent by vae decode+encode, compute cond, swap model, add noise to clean x0, refresh with img2img sampler

I know that we probably do not want two behaviors, but after the latest considerations in this pull request, we already seem to have two behaviors.

But feel free to correct me if refreshing with the img2img sampler is better for XL than not refreshing. In my opinion, we should always preserve the sampler's valuable history for as long as we can.

@lllyasviel

lllyasviel commented Aug 8, 2023

I am experimenting with these behaviors. https://github.com/lllyasviel/Stable-Diffusion-FixedUI
Please hold these two PRs for a day or two; we may have more results.

@lllyasviel

Update

OK my experiments finished:

  1. The logic in 12371 is correct. The logic in this PR (12377) is wrong. The results from 12371 are way better.

  2. Using early stop to refine XL with 1.5 is not a thing. The results are pretty bad whatever early stop we apply. Note that I also tried extracting the clean latent and adding noise (rather than decoding and encoding the noisy latent). It works, but the results are bad.

  3. However, I recommend having a fallback in the UI for when the user selects a 1.5 model to refine XL; otherwise the UI is not very friendly. I suggest simply falling back to highres fix behavior at the same resolution, with float(1-switch_at) as the denoising strength.
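
The fallback suggested in point 3 could be dispatched roughly like this. It is a sketch only: `RefinePlan`, `plan_refine`, the mode strings, and the idea of comparing two `is_sdxl`-style flags are all hypothetical names for illustration, not code from either PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefinePlan:
    mode: str                            # "swap_in_sampler" or "img2img_fallback"
    denoising_strength: Optional[float]  # only used by the fallback


def plan_refine(base_is_sdxl: bool, refiner_is_sdxl: bool, switch_at: float) -> RefinePlan:
    if base_is_sdxl == refiner_is_sdxl:
        # Same model family: swap the model and keep sampling.
        return RefinePlan("swap_in_sampler", None)
    # Mixed families (e.g. a 1.5 model refining XL): fall back to a
    # highres-fix-like img2img pass at the same resolution.
    return RefinePlan("img2img_fallback", float(1 - switch_at))


print(plan_refine(True, True, 0.8))
print(plan_refine(True, False, 0.8))
```

Keeping the decision in one small function would make it easy to show the user which behavior was chosen, which is the friendliness concern raised above.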

@AUTOMATIC1111
Copy link
Owner Author

Do you think it is a problem with implementation in this PR, or just with general concept? SD's official repo uses something like this.

@lllyasviel

lllyasviel commented Aug 8, 2023

I think #12371 is better than re-initializing samplers.
I am not exactly sure, but it seems that 12371 is even better than the SAI official implementation, or than node pipelines that implement two independent ksamplers in comfyui.
To validate this, we will need to test a sufficient number of images with different samplers.
But in my tests, 12371's native swap inside the sampler always seems a bit better.

@lllyasviel

lllyasviel commented Aug 8, 2023

Besides, XL base-> early stop->decode vae->encode vae->add noise->sd 1.5->i2i->result seems to produce bad results.

XL base-> early stop->decode vae noisy latent->encode vae->sd 1.5->i2i->result is worse.

Results are better when XL base-> stop at last step->final result->decode vae->encode vae->add noise->go back to previous step->sd 1.5->i2i->result.

Also, XL vae decode + 1.5 vae encode tends to produce slight ghosting or color overflow problems; not sure why.

@lllyasviel

Also, although not super recommended, it seems possible to add comfyui's git to repositories if we want to import some functions to test absolutely same results.

@AUTOMATIC1111
Owner Author

that is not going to happen

@angrysky56

If you don't let the base image be noisy, the refiner doesn't seem to do much, but it can actually do a lot of work. If you set the refiner up to 50% and let it work on a fuzzy base image, say 17 to 33 steps, it is much faster and makes nice images. The settings aren't right out of the box in the template. I would also suggest a tiled VAE.

example images.
https://civitai.com/models/106747/sdxl-09-modded-workflows-pack-designed-to-run-on-16gb-ram-12gb-vram-1920x1920

This could help with your 1.5 / sdxl issues-
https://civitai.com/models/118811/sd15-with-sdxl-comfyui-workflow-template

Successfully merging this pull request may close these issues.

[Feature Request]: SDXL refiner support
7 participants