Hi @ahmadki, I'm trying to reproduce the Stable Diffusion training results.
I noticed that when I decode the moments back to images using the VAE's decoder, I get whited-out images.
Looking at webdataset_images2latents.py, the images are not normalized:
https://github.com/mlcommons/training/blob/master/stable_diffusion/webdataset_images2latents.py#L86
If I add normalization as follows:
```python
transforms = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Resize(size=512, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.CenterCrop(size=512),
        transforms.Normalize([0.5], [0.5]),
    ]
)
```
With this change the whiteout goes away.
Is there a reason why the images were not normalized, and how does this affect training of the UNet?
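For context: `Normalize([0.5], [0.5])` maps `ToTensor`'s `[0, 1]` output to `[-1, 1]`, which is the input range the SD VAE expects and which the `image / 2 + 0.5` step in the decode path inverts. Below is a minimal sketch of how the moments could be produced with the fixed transforms, using diffusers' `AutoencoderKL` directly. This is an illustration only, not the MLCommons script; `example.jpg` and the variable names are placeholders:

```python
import torch
from diffusers.models import AutoencoderKL
from PIL import Image
from torchvision import transforms as T

# Same transforms as above, with Normalize mapping [0, 1] -> [-1, 1]
image_transforms = T.Compose(
    [
        T.ToTensor(),
        T.Resize(size=512, interpolation=T.InterpolationMode.BICUBIC),
        T.CenterCrop(size=512),
        T.Normalize([0.5], [0.5]),
    ]
)

vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-2-base", subfolder="vae")

img = Image.open("example.jpg").convert("RGB")  # placeholder input image
x = image_transforms(img).unsqueeze(0)          # (1, 3, 512, 512), values in [-1, 1]

with torch.no_grad():
    # latent_dist.parameters is the concatenated mean and logvar -- the "moments"
    moments = vae.encode(x).latent_dist.parameters  # shape (1, 8, 64, 64)
```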
Code to reproduce using HuggingFace diffusers:
```python
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch
from diffusers.models.autoencoders.vae import DiagonalGaussianDistribution
import numpy as np
from PIL import Image

model_id = "stabilityai/stable-diffusion-2-base"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Load the precomputed moments (mean and logvar) and sample a latent from them
moments = torch.from_numpy(np.load("000009999.npy")).type(torch.float16)
moments = moments.to("cuda")
latents = DiagonalGaussianDistribution(moments).sample()

# Scale as training would, then unscale as decoding would (the two cancel out)
latents = latents * pipe.vae.config.scaling_factor
latents = 1 / pipe.vae.config.scaling_factor * latents

# Decode back to pixel space and save
image = pipe.vae.decode(latents, return_dict=False)[0]
image = (image / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).detach().float().numpy()
image = (image * 255).round().astype("uint8")
image = Image.fromarray(image[0])
image.save("test.png")
```
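(Side note on the repro: the two `scaling_factor` lines cancel out; they are only there to mirror the training-time scaling of latents and the decode-time unscaling that the standard diffusers pipeline applies.)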
@ahmadki can you take a look at this?
The WG has decided that, since SD will be refreshed or updated in 5.0, this fix might not be worth the effort.