Issue with Using Multiple Controls (Depth and Canny) with LoRA on FLUX.1-dev Model #10594

Open
pramishp opened this issue Jan 16, 2025 · 9 comments
Labels
bug Something isn't working

Comments

@pramishp

pramishp commented Jan 16, 2025

Describe the bug

When attempting to use multiple control images (Depth and Canny) with LoRA on the FLUX.1-dev model, an error occurs during execution. The documentation indicates that multiple control images in PIL format can be supplied, but the pipeline throws a runtime error. Notably, the pipeline functions correctly with a single control image.

Expected Behavior

The pipeline should generate the output image without errors when multiple control images (Depth and Canny) are supplied.

Observed Behavior

The pipeline fails with the error RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288.
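
The numbers in this error are consistent with the two control images being stacked into a batch of two while the reshape is sized for a batch of one. The following is a minimal arithmetic sketch of that reading, using only the two values printed in the error message; the interpretation of the factor of two as "two stacked control images" is an assumption, not something confirmed from the pipeline source.

```python
from math import prod

# Target shape copied verbatim from the error message.
target_shape = [1, 16, 64, 2, 64, 2]
# Number of elements the tensor actually holds, also from the error message.
input_size = 524288

# Number of elements that view() needs for the target shape.
expected_size = prod(target_shape)
# How many times larger the real tensor is than the target shape allows.
ratio = input_size // expected_size

print(expected_size)  # 262144
print(ratio)          # 2
```

The tensor holds exactly twice as many elements as the target shape, which matches the number of control images passed in the reproduction.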

Reproduction

1. Set up the FLUX.1-dev model with multiple control images using LoRA.
2. Use a Depth control image and a Canny control image.
3. Execute the code with the following snippet:
```python
import os
from huggingface_hub import login
from diffusers import FluxControlPipeline
from image_gen_aux import DepthPreprocessor
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
import numpy as np
import torch

# Set Hugging Face directories
os.environ["HF_HOME"] = "/scratch/pramish_paudel/job_108669/hf"
os.environ["HF_DATASETS_CACHE"] = "/scratch/pramish_paudel/job_1086695/hf"

login(token="<REDACTED>")

control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora", adapter_name="canny")

control_pipe.set_adapters(["depth", "canny"], adapter_weights=[0.85, 0.85])
control_pipe.enable_model_cpu_offload()

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image1 = processor(control_image)[0].convert("RGB")
shape = np.asarray(control_image1).shape[0]

processor = CannyDetector()
control_image2 = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=shape, image_resolution=shape)

image = control_pipe(
    prompt=prompt,
    control_image=[control_image1, control_image2],
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```

Logs

```shell
/lib/python3.12/site-packages/diffusers/pipelines/flux/pipeline_flux_control.py", line 474, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288
```

System Info

• diffusers version: 0.32.0
• Python version: 3.12
• System: Debian GNU/Linux
• GPU: A6000

Who can help?

@sayakpaul @yiyixuxu @DN6

@pramishp added the bug label Jan 16, 2025

@sayakpaul
Member

Can you perform inference successfully when using a single LoRA and multiple control images?

@a-r-r-o-w
Member

@pramishp You pasted code with your HF_TOKEN exposed. I've edited the example to remove the token and removed the edit from the revision history. Please revoke/rotate the token on your end.
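
One way to keep a token out of pasted snippets is to read it from the environment instead of writing it into the script. The sketch below assumes the token is exported as `HF_TOKEN`; the helper name `read_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the environment, or fail loudly if it is absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running this script.")
    return token


# Usage in the reproduction script (requires huggingface_hub to be installed):
# from huggingface_hub import login
# login(token=read_hf_token())
```

With this pattern the script itself never contains the secret, so it can be shared as-is.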

@pramishp
Author

@a-r-r-o-w, my bad. Thanks!

@pramishp
Author

@sayakpaul, I get the same error when using a single LoRA and multiple control images.

@sayakpaul
Member

And what happens when we use the full Control model such as this?

```python
# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")
```

So, passing multiple control images is broken, IIUC. Cc: @yiyixuxu

@pramishp
Author

Full control models like Canny and Depth work fine. I have played with the Depth version and it works perfectly. The problem is with using multiple controls. There's even an example in the docs, though it's for different controls.

https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#combining-flux-turbo-loras-with-flux-control-fill-and-redux

@sayakpaul
Member

So, you mean to say the following works:

```python
image = control_pipe(
    prompt=prompt,
    control_image=[control_image1, control_image2],
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
```

where control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

?

@pramishp
Author

@sayakpaul, no! I haven't tried this, but it likely won't work. In my comment above, I was referring to the case of a single control input with the full control models such as Depth and Canny.

@yiyixuxu
Collaborator

Hi @pramishp,

  1. I don't think the Flux Control pipeline works in a "multi-control" manner. That is, if you load two LoRAs and pass two control images, expecting the first LoRA to use the first control image and the second LoRA to use the second, it won't work that way, based on my understanding.
  2. All pipelines (this one included) accept multiple images, but the number of images has to match the number of prompts (unless it is a multi-ControlNet pipeline, which this is not).
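
The second point can be turned into a small pre-flight check that runs before calling the pipeline. This is only a sketch: `check_control_batch` is a hypothetical helper, not a diffusers function, and it assumes that a list of control images is treated as a batch paired one-to-one with the prompts, as described above.

```python
from typing import Sequence, Union


def check_control_batch(prompt: Union[str, Sequence[str]], control_images: Sequence[object]) -> None:
    """Hypothetical pre-flight check: require one control image per prompt."""
    # A single string counts as one prompt; a list counts as len(list) prompts.
    num_prompts = 1 if isinstance(prompt, str) else len(prompt)
    if len(control_images) != num_prompts:
        raise ValueError(
            f"Got {len(control_images)} control image(s) for {num_prompts} prompt(s); "
            "the counts must match."
        )


# Two prompts with two control images pass the check.
check_control_batch(["a robot", "a castle"], ["depth_image", "canny_image"])
```

Calling it with a single prompt and two control images, as in the reproduction, raises `ValueError` up front rather than failing later inside the reshape.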
