
StableDiffusionXLInstructPix2PixPipeline doesn't work with cosxl_edit #7621

Closed
apolinario opened this issue Apr 9, 2024 · 14 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

Comments

@apolinario
Collaborator

Describe the bug

CosXL Edit is an InstructPix2Pix model (https://huggingface.co/stabilityai/cosxl) released together with CosXL; however, trying to load it gives a size mismatch error.

Reproduction

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline

pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    "cosxl_edit.safetensors"
)

Logs

tokenizer_config.json: 100%
 905/905 [00:00<00:00, 13.7kB/s]
vocab.json: 100%
 961k/961k [00:00<00:00, 10.2MB/s]
merges.txt: 100%
 525k/525k [00:00<00:00, 17.4MB/s]
special_tokens_map.json: 100%
 389/389 [00:00<00:00, 20.1kB/s]
tokenizer.json: 100%
 2.22M/2.22M [00:00<00:00, 16.0MB/s]
config.json: 100%
 4.52k/4.52k [00:00<00:00, 250kB/s]
tokenizer_config.json: 100%
 904/904 [00:00<00:00, 50.1kB/s]
vocab.json: 100%
 862k/862k [00:00<00:00, 34.1MB/s]
merges.txt: 100%
 525k/525k [00:00<00:00, 22.2MB/s]
special_tokens_map.json: 100%
 389/389 [00:00<00:00, 21.6kB/s]
tokenizer.json: 100%
 2.22M/2.22M [00:00<00:00, 16.5MB/s]
config.json: 100%
 4.88k/4.88k [00:00<00:00, 253kB/s]
Some weights of the model checkpoint were not used when initializing CLIPTextModelWithProjection: 
 ['text_model.embeddings.position_ids']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-01c040bbaf7e> in <cell line: 5>()
      3 from diffusers.utils import load_image
      4 
----> 5 pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
      6     file, torch_dtype=torch.float16
      7 )

4 frames
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
    116             kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
    117 
--> 118         return fn(*args, **kwargs)
    119 
    120     return _inner_fn  # type: ignore

/usr/local/lib/python3.10/dist-packages/diffusers/loaders/single_file.py in from_single_file(cls, pretrained_model_link_or_path, **kwargs)
    287                 init_kwargs[name] = passed_class_obj[name]
    288             else:
--> 289                 components = build_sub_model_components(
    290                     init_kwargs,
    291                     class_name,

/usr/local/lib/python3.10/dist-packages/diffusers/loaders/single_file.py in build_sub_model_components(pipeline_components, pipeline_class_name, component_name, original_config, checkpoint, local_files_only, load_safety_checker, model_type, image_size, torch_dtype, **kwargs)
     59         upcast_attention = kwargs.pop("upcast_attention", None)
     60 
---> 61         unet_components = create_diffusers_unet_model_from_ldm(
     62             pipeline_class_name,
     63             original_config,

/usr/local/lib/python3.10/dist-packages/diffusers/loaders/single_file_utils.py in create_diffusers_unet_model_from_ldm(pipeline_class_name, original_config, checkpoint, num_in_channels, upcast_attention, extract_ema, image_size, torch_dtype, model_type)
   1320         from ..models.modeling_utils import load_model_dict_into_meta
   1321 
-> 1322         unexpected_keys = load_model_dict_into_meta(unet, diffusers_format_unet_checkpoint, dtype=torch_dtype)
   1323         if unet._keys_to_ignore_on_load_unexpected is not None:
   1324             for pat in unet._keys_to_ignore_on_load_unexpected:

/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py in load_model_dict_into_meta(model, state_dict, device, dtype, model_name_or_path)
    150         if empty_state_dict[param_name].shape != param.shape:
    151             model_name_or_path_str = f"{model_name_or_path} " if model_name_or_path is not None else ""
--> 152             raise ValueError(
    153                 f"Cannot load {model_name_or_path_str}because {param_name} expected shape {empty_state_dict[param_name]}, but got {param.shape}. If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example."
    154             )

ValueError: Cannot load because conv_in.weight expected shape tensor(..., device='meta', size=(320, 4, 3, 3)), but got torch.Size([320, 8, 3, 3]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.


### System Info

diffusers==0.27.2

### Who can help?

@sayakpaul , @yiyixuxu 
apolinario added the bug (Something isn't working) label on Apr 9, 2024
@yiyixuxu
Collaborator

yiyixuxu commented Apr 9, 2024

You should be able to load the checkpoint with num_in_channels=8 (the error log shows the checkpoint's conv_in expects 8 input channels rather than 4):

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline

pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    "https://huggingface.co/stabilityai/cosxl/blob/main/cosxl.safetensors", num_in_channels=8,
)

@yiyixuxu
Collaborator

yiyixuxu commented Apr 9, 2024

cc @DN6 here
let's make sure to support SDXL InstructPix2Pix out of the box in #7496

we should support every model listed here: https://github.com/comfyanonymous/ComfyUI/blob/4201181b35402e0a992b861f8d2f0e0b267f52fa/comfy/supported_models.py#L479

@apolinario
Collaborator Author

apolinario commented Apr 9, 2024

This worked with num_in_channels=8 (as in: it didn't error). However, perceptually it isn't behaving as it should.

Edit image:
[input image]

Edit prompt "Turn sky into a cloudy one":
[edited output image]

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline, EDMEulerScheduler
from diffusers.utils import load_image

inst_file = "cosxl_edit.safetensors"

pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    inst_file, num_in_channels=8,
).to("cuda")

pipe.scheduler = EDMEulerScheduler(sigma_min=0.002, sigma_max=120.0, sigma_data=1.0, prediction_type="v_prediction")

resolution = 1024
image = load_image(
    "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
).resize((resolution, resolution))

edit_instruction = "Turn sky into a cloudy one"
edited_image = pipe(
    prompt=edit_instruction,
    image=image,
    height=resolution,
    width=resolution,
    #guidance_scale=3.0,
    #image_guidance_scale=1.5,
    num_inference_steps=20,
).images[0]

@sayakpaul
Member

sayakpaul commented Apr 10, 2024

Not sure if it's the exact guidance formulation that we have in the InstructPix2Pix pipeline though. That would matter a lot.

If it's possible, could you try to initialize the StableDiffusionXLInstructPix2PixPipeline with each component initialized separately?

unet = ...
text_encoder = ...
text_encoder_2 = ...
vae = ...
scheduler = ...

pipeline = ...
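
For illustration, a minimal sketch of what that could look like, assuming the non-UNet components come from the SDXL base repo (the repo id and dtype are placeholders here, not a verified recipe for CosXL):

import torch
from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer
from diffusers import (
    AutoencoderKL,
    EDMEulerScheduler,
    StableDiffusionXLInstructPix2PixPipeline,
    UNet2DConditionModel,
)

base = "stabilityai/stable-diffusion-xl-base-1.0"  # placeholder source for the shared components

vae = AutoencoderKL.from_pretrained(base, subfolder="vae", torch_dtype=torch.float16)
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder", torch_dtype=torch.float16)
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(base, subfolder="text_encoder_2", torch_dtype=torch.float16)
tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
tokenizer_2 = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer_2")

# The real UNet would need the CosXL edit weights with 8 input channels;
# loading the base SDXL UNet here is only a stand-in to show the wiring.
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet", torch_dtype=torch.float16)

scheduler = EDMEulerScheduler(sigma_min=0.002, sigma_max=120.0, sigma_data=1.0, prediction_type="v_prediction")

pipeline = StableDiffusionXLInstructPix2PixPipeline(
    vae=vae,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    unet=unet,
    scheduler=scheduler,
)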

@apolinario
Collaborator Author

Not sure if it's the exact guidance formulation that we have in the InstructPix2Pix pipeline though. That would matter a lot.

ComfyUI uses the same InstructPix2PixConditioning node for it that it uses for InstructPix2Pix itself. Overall, this is how Comfy added support for the CosXL models; once that was in, the nodes for supporting it look similar to vanilla InstructPix2Pix.
comfyanonymous/ComfyUI@1088d18

These are the nodes for the ComfyUI official edit workflow:
[workflow screenshot]

If it's possible, could you try to initialize the StableDiffusionXLInstructPix2PixPipeline with each component initialized separately?

As I'm using from_single_file, I don't think UNet2DConditionModel etc. support that method, afaik. How do you think that would help with debugging/making it work?

@yiyixuxu
Collaborator

yiyixuxu commented Apr 10, 2024

@apolinario

you just have to scale the image_latents

adding this to the pipeline:

        # 6. Prepare Image latents
        image_latents = self.prepare_image_latents(
            image,
            batch_size,
            num_images_per_prompt,
            prompt_embeds.dtype,
            device,
            do_classifier_free_guidance,
        )
        image_latents = image_latents * self.vae.config.scaling_factor

edited

@sayakpaul
Member

Nice finding. However, the SD Pix2Pix doesn't have it :o

@apolinario
Collaborator Author

Awesome! What's the best way to proceed here? Modify the pipeline to detect whether scaling is needed, or create a new one?

@sayakpaul
Member

sayakpaul commented Apr 10, 2024

I think the following could work:

  • after introducing the sigma scheduling changes to the EDM schedulers (as discussed internally with Suraj), we serialise the pipeline in the diffusers format. This gives us the scheduler with all the right configurations.
  • in the pipeline code, we check if the scheduler is of the EDM type and, if so, we scale the latents (a rough sketch follows below).

WDYT? @yiyixuxu would love your thoughts too.
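
A rough sketch of the check in the second bullet, assuming the presence of EDMEulerScheduler is used as the signal (whether scheduler type is the right signal is exactly what's debated below):

from diffusers import EDMEulerScheduler

# inside the pipeline's __call__, right after image_latents are prepared:
# scale only for EDM-style (CosXL) checkpoints
if isinstance(self.scheduler, EDMEulerScheduler):
    image_latents = image_latents * self.vae.config.scaling_factor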

@yiyixuxu
Collaborator

yiyixuxu commented Apr 11, 2024

I think we should modify the pipeline to detect if scaling is needed

based on my understanding, how we scale the latents is not dependent on the scheduler type but is specific to how this model was trained, i.e. in most of our pipelines, the image_latents are scaled regardless of which scheduler you use:

image_latents = self.vae.config.scaling_factor * image_latents

so I think we should add a pipeline config, e.g. something like is_cosxl, that the user can pass to from_single_file() (a hypothetical sketch follows below)

cc @DN6 here
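
For illustration only, a hedged sketch of what that could look like at the call site (is_cosxl is the hypothetical flag name from this comment, not a shipped argument):

pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    "cosxl_edit.safetensors",
    num_in_channels=8,
    is_cosxl=True,  # hypothetical flag: scale image_latents and map to the EDM scheduler config
)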

@sayakpaul
Member

so I think we should add a pipeline config, e.g. something like is_cosxl, that the user can pass to from_single_file(); with this flag, we can map it to the correct scheduler config too in from_single_file

If we introduce that only for from_single_file(), won't that introduce a discrepancy between from_pretrained() and from_single_file() methods of InstructPix2Pix then? I thought we were trying to reduce these kinds of discrepancies with Dhruv's refactor.

@DN6
Collaborator

DN6 commented Apr 12, 2024

If the argument is added to the pipeline and is only a pipeline argument, then that wouldn't be a discrepancy. What we want is to avoid configuring models via pipeline invocations.

@sayakpaul
Member

What we want is to avoid configuring models via pipeline invocations

Like this?

pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    "https://huggingface.co/stabilityai/cosxl/blob/main/cosxl.safetensors", num_in_channels=8,
)


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale (Issues that haven't received updates) label on May 10, 2024