Adding additional input channels to model after initialization / Converting text2img to mask inpainting to resume training #1619
Comments
Hey @BenjaminIrwin, this is actually quite easily doable. You just need to pass a config parameter that changes the size of the input channels to the required size. E.g. let's say you want to fine-tune SD 1.4 to do inpainting. All you need to do then is run the following code:

from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True)

This will initialize the model with the pretrained weights, except for the input conv weight, which now has shape [320, 9, 3, 3] and is thus randomly initialized. The other weights are transferred from the pretrained checkpoint. Make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. |
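For completeness, a minimal sketch of how this could be wired up end to end; the shape check and the pipeline wiring are illustrative assumptions, not part of the original answer:

from diffusers import StableDiffusionInpaintPipeline, UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"

# Reload the UNet with 9 input channels. Every weight except conv_in is copied
# from the pretrained checkpoint; conv_in is randomly initialized because its
# shape no longer matches.
unet = UNet2DConditionModel.from_pretrained(
    model_id,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# The input convolution now expects 9 channels:
# 4 noisy latents + 1 mask + 4 masked-image latents.
print(unet.conv_in.weight.shape)  # torch.Size([320, 9, 3, 3])

# The modified UNet still has to be fine-tuned before it can inpaint, but it can
# already be plugged into the inpainting pipeline for training or inference.
pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, unet=unet)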
Thanks very much. This is great. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
For future readers: The code snippet above can be used to transform a text2image unet to an inpainting unet as asked here: #2280 |
@patrickvonplaten this is exactly the issue I was looking for! So I forked a popular Hugging Face Space to create a custom DreamBooth model, training it against a couple of new concepts: https://huggingface.co/spaces/multimodalart/dreambooth-training It's great! I've used it a few times and generated a few v1.5-based custom models! I thought I could use a custom-trained model based on SD v1.5 and it would work with the inpainting pipeline out of the box... oh how wrong I was :) I've tried to change my fork to add SD v1.5-inpainting as the base model, but no luck debugging the Space. And then I saw this issue which, if I'm not mistaken... should allow me to use my regular SD v1.5 model with the inpainting pipeline? Am I mistaken here? I used your suggestion:

model_path = "mycustommodel"
unet = UNet2DConditionModel.from_pretrained(model_path, torch_dtype=torch.float16, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True)
pipe = StableDiffusionInpaintPipeline.from_pretrained(
model_path, scheduler=scheduler, torch_dtype=torch.float16, unet=unet, safety_checker=None,
)
# For reference, what I'm doing to set up inference... some M1 MacBook specific stuff here
pipe = pipe.to("mps")
g_cuda = None
# @markdown Can set random seed here for reproducibility.
gen = torch.Generator(device="cpu")
seed = 52362 # @param {type:"number"}
gen.manual_seed(seed)
negative_prompt = ""
num_samples = 1
guidance_scale = 7.5
num_inference_steps = 25
height = 512
width = 512
images = pipe(
prompt=prompt,
image=init_image,
mask_image=mask_image,
generator=gen,
height=height,
width=width,
negative_prompt=negative_prompt,
num_images_per_prompt=num_samples,
num_inference_steps=num_inference_steps,
guidance_scale=guidance_scale,
).images

So I do that, and I'm not getting the results I expected. @patrickvonplaten am I completely on the wrong path here? Is my only real option to train a new custom model with SD v1.5-inpainting as the base model? Thanks in advance! |
You need to fine-tune the text-to-image model to learn how to do in-painting. The architecture is slightly different. |
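To make that architectural difference concrete, here is a rough sketch of how the 9-channel input is assembled for an inpainting UNet; the variable names and shapes are illustrative, not taken from an official training script:

import torch

# Shapes assume 512x512 images encoded to 64x64 latents with 4 channels.
noisy_latents = torch.randn(1, 4, 64, 64)         # noised VAE latents of the target image
mask = torch.rand(1, 1, 64, 64)                   # inpainting mask, resized to latent resolution
masked_image_latents = torch.randn(1, 4, 64, 64)  # VAE latents of the image with the masked region removed

# The inpainting UNet sees all three concatenated along the channel dimension: 4 + 1 + 4 = 9.
unet_input = torch.cat([noisy_latents, mask, masked_image_latents], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])

# A text-to-image UNet only ever receives the first 4 channels, which is why conv_in
# has to be expanded to 9 input channels and the model fine-tuned before it can inpaint.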
@patrickvonplaten not sure how to do that... interestingly enough, I ended up using the popular SD GUI https://github.com/AUTOMATIC1111/stable-diffusion-webui It has a nice UI to merge models, so I merged v1-5-inpainting, my custom model, and v1-5-pruned. Found out about it here: https://www.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/ That worked for me. @patrickvonplaten is your code approach essentially doing the same? Still learning a lot about diffusion models, so apologies. And of course, thank you for all of your work! |
@hamin @patrickvonplaten I've been trying to do the same thing as @hamin. Still, after looking at these issues, converting a model to an inpainting model through the Python scripts seems like the better approach. So @patrickvonplaten, as you mentioned "You need to fine-tune the text-to-image to learn how to do in-painting. The architecture is slightly different", can you explain how to do that, or is there any script available on the internet that does it? I hope you'll get back to me soon. |
Opened a PR to improve error handling for the above case btw: #2847 |
This is great. Maybe this snippet could have a place in the documentation somewhere /cc @yiyixuxu what do you think? |
@patrickvonplaten Was looking for this answer in the documentation. Would be great to have this more prominent in the main doc. |
RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[4, 4, 64, 64] to have 9 channels, but got 4 channels instead |
As you said, they are randomly initialized, meaning the learned kernels in the conv_in layer are gone. Should it not be possible to take the kernels of the pre-trained model and zero-initialize the other ones? Like this e.g.
Thanks |
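A minimal sketch of one way to do that zero-initialization, assuming the goal is to copy the pretrained 4-channel conv_in kernels and zero the weights for the 5 new input channels (this is an illustration, not an official diffusers API):

import torch
from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"

# Original text-to-image UNet with a 4-channel conv_in.
unet_orig = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

# Same UNet reloaded with 9 input channels; conv_in is randomly initialized.
unet = UNet2DConditionModel.from_pretrained(
    model_id, subfolder="unet", in_channels=9,
    low_cpu_mem_usage=False, ignore_mismatched_sizes=True,
)

with torch.no_grad():
    # Zero everything, then copy the pretrained kernels into the first 4 input channels.
    new_weight = torch.zeros_like(unet.conv_in.weight)   # [320, 9, 3, 3]
    new_weight[:, :4, :, :] = unet_orig.conv_in.weight   # [320, 4, 3, 3]
    unet.conv_in.weight.copy_(new_weight)
    unet.conv_in.bias.copy_(unet_orig.conv_in.bias)      # bias does not depend on in_channels

With this initialization the expanded model initially behaves like the text-to-image model on the latent channels and simply ignores the mask and masked-image channels until fine-tuning updates them.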
Hi all, I am getting the following error. Can someone help please? File "C:\SDComfyUI\ComfyUI_windows_portable\src\diffusers\src\diffusers\models\modeling_utils.py", line 154, in load_model_dict_into_meta |
I got the same error |
Sorry, but how do I pass them on?
Where do I have to run this code, and how do I pass those two parameters? |
PipelineLoader ComfyUI Error Report (Error Details and Stack Trace omitted). |
Did you find out where to run this code? |
Hi, what do you mean, where? This is the code posted:

from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True)

Maybe people are misunderstanding here: this is to create an inpainting model from a base model, but it will require training so the model learns how to inpaint; it is not a way to convert base models to inpainting without training. So if you really don't know what the code above means, training an inpainting model from scratch probably goes beyond what you can do right now. There is an easier way to turn models into inpainting models, which is to merge the difference between the base model and an existing inpainting model into the model you want. You can search for that solution; there are a couple of scripts and GUIs that do this. Also, for the ComfyUI problem, you should report that to the correct repository. We really don't know what ComfyUI or the custom node is doing, so if it isn't something you or we can reproduce with a clean diffusers code snippet, we can't really help. |
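For reference, a rough sketch of the "merge the difference" idea mentioned above (the add-difference method offered by several merge scripts and GUIs). The model IDs and the local path are placeholders, and this is an approximation rather than an exact reimplementation of any particular tool:

import torch
from diffusers import UNet2DConditionModel

# Placeholder checkpoints: base text-to-image model, its inpainting variant, and a custom fine-tune.
base = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
inpaint = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-inpainting", subfolder="unet")
custom = UNet2DConditionModel.from_pretrained("path/to/my-custom-model", subfolder="unet")

# Start from the inpainting UNet so its 9-channel conv_in layer is kept as-is.
merged_sd = inpaint.state_dict()
base_sd, custom_sd = base.state_dict(), custom.state_dict()

with torch.no_grad():
    for name, inpaint_w in list(merged_sd.items()):
        # Only merge tensors whose shapes match across all three models
        # (conv_in differs because of the extra input channels).
        if name in custom_sd and custom_sd[name].shape == inpaint_w.shape:
            # custom + (inpainting - base): keep the custom style, add the inpainting ability.
            merged_sd[name] = custom_sd[name] + (inpaint_w - base_sd[name])

inpaint.load_state_dict(merged_sd)
inpaint.save_pretrained("my-custom-inpainting-unet")  # the inpainting UNet object now holds the merged weights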
Have scoured the docs for an answer to this, to no avail. Is it possible to add additional input channels to a model after initializing it using .from_pretrained? For example (taken from your Dreambooth example): In the code above, if I now wanted to introduce additional input channels to unet and zero-initialize the weights, would this be possible? If so, how would I do this? Thank you in advance.