Issue with Using Multiple Controls (Depth and Canny) with LoRA on FLUX.1-dev Model #10594

pramishp · 2025-01-16T06:00:05Z

Describe the bug

When attempting to use multiple control images (Depth and Canny) with LoRA on the FLUX.1-dev model, an error occurs during execution. The documentation indicates that multiple control images in PIL format can be supplied, but the pipeline throws a runtime error. Notably, the pipeline functions correctly with a single control image.

Expected Behavior

The pipeline should generate the output image without errors when multiple control images (Depth and Canny) are supplied.

Observed Behavior

The pipeline fails with the error RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288.
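The numbers in the error message already hint at the cause. As a quick sanity check (plain arithmetic, not the pipeline's own code), the target view holds exactly half as many elements as the tensor it is applied to. That would be consistent with the two control images being stacked along the batch dimension while the batch size stays at 1. This reading is an assumption; the traceback itself does not confirm it.

```python
from math import prod

# Shape that _pack_latents tries to view the latents as (taken from the error message).
target_shape = (1, 16, 64, 2, 64, 2)
target_numel = prod(target_shape)

# Element count PyTorch reported for the tensor actually passed in.
actual_numel = 524288

# Assumption: the surplus comes from extra entries along the batch dimension,
# one per control image in the list.
implied_batch = actual_numel // target_numel

print(target_numel)   # 262144
print(implied_batch)  # 2
```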

Reproduction

1. Set up the FLUX.1-dev model with multiple control images using LoRA.
2. Use a Depth control image and a Canny control image.
3. Execute the code with the following snippet:

import os

# Set Hugging Face directories before importing any Hugging Face library;
# huggingface_hub reads HF_HOME when it is first imported.
os.environ["HF_HOME"] = "/scratch/pramish_paudel/job_108669/hf"
os.environ["HF_DATASETS_CACHE"] = "/scratch/pramish_paudel/job_1086695/hf"

import numpy as np
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from huggingface_hub import login
from image_gen_aux import DepthPreprocessor

login(token="<REDACTED>")

control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora", adapter_name="canny")

control_pipe.set_adapters(["depth", "canny"], adapter_weights=[0.85, 0.85])
control_pipe.enable_model_cpu_offload()

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image1 = processor(control_image)[0].convert("RGB")
shape = np.asarray(control_image1).shape[0]

processor = CannyDetector()
control_image2 = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=shape, image_resolution=shape)

image = control_pipe(
    prompt=prompt,
    control_image=[control_image1, control_image2],
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")

### Logs

```shell
/lib/python3.12/site-packages/diffusers/pipelines/flux/pipeline_flux_control.py", line 474, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288
```

System Info

• diffusers version: 0.32.0
• Python version: 3.12
• System: Debian GNU/Linux
• GPU: A6000

Who can help?

@sayakpaul @yiyixuxu @DN6


sayakpaul · 2025-01-16T06:20:27Z

Can you perform inference successfully when using a single LoRA and multiple control images?

a-r-r-o-w · 2025-01-16T06:20:36Z

@pramishp You pasted code with your HF_TOKEN exposed. I've edited the example to remove the token and removed the edit from the revision history. Please revoke/rotate the token on your end.
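One way to keep the token out of pasted snippets is to read it from the environment instead of hard-coding it. The sketch below is only an illustration: `read_hf_token` is a hypothetical helper, and using `HF_TOKEN` as the variable name is an assumption about the local setup.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Hypothetical helper: fetch the token from the environment, never from source."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return token


# Usage in the reproduction script (left commented out so this sketch has no
# third-party dependency):
# from huggingface_hub import login
# login(token=read_hf_token())
```

With this pattern the script can be shared verbatim, since the secret lives only in the shell environment.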

pramishp · 2025-01-16T09:56:22Z

@a-r-r-o-w, my bad. Thanks!

pramishp · 2025-01-16T10:33:02Z

@sayakpaul, I get the same error when using a single LoRA and multiple control images.

sayakpaul · 2025-01-16T10:34:42Z

And what happens when we use the full Control model such as this?

# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")

So, passing multiple control images is broken, IIUC. Cc: @yiyixuxu

pramishp · 2025-01-18T06:27:09Z

Full control models like Canny and Depth work fine. I have played with the Depth version and it works perfectly. The problem is using multiple controls. There is even an example in the docs, though it is for different controls.

https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#combining-flux-turbo-loras-with-flux-control-fill-and-redux

sayakpaul · 2025-01-18T06:34:45Z

So, you mean to say the following works:

image = control_pipe(
    prompt=prompt,
    control_image=[control_image1, control_image2],
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]

where control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

?

pramishp · 2025-01-20T18:20:23Z

@sayakpaul, no! I haven't tried this, but it likely won't work. In my comment above, I was referring to the case of a single control input with the full control models such as Depth and Canny.

yiyixuxu · 2025-01-21T03:13:34Z

Hi @pramishp,

I don't think the Flux Control pipeline works in a "multi-control" manner, i.e. loading two LoRAs, passing two control images, and expecting the first LoRA to use the first control image and the second LoRA to use the second. Based on my understanding, it won't work that way.
All pipelines (this one included) do accept multiple images, but the number of images has to match the number of prompts, unless it is a multi-ControlNet setup, which this is not.
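The length rule above can be checked before calling the pipeline. The sketch below is a hypothetical pre-flight helper, not part of diffusers; `check_control_batch` is an invented name, and accepting a single image for several prompts assumes the pipeline broadcasts it, which is not confirmed in this thread.

```python
def check_control_batch(prompt, control_image):
    """Hypothetical check: control images must line up one-to-one with prompts."""
    num_prompts = len(prompt) if isinstance(prompt, (list, tuple)) else 1
    num_images = len(control_image) if isinstance(control_image, (list, tuple)) else 1

    # Assumption: a single image may be reused for every prompt.
    if num_images not in (1, num_prompts):
        raise ValueError(
            f"Got {num_images} control images for {num_prompts} prompt(s); "
            "the counts must match."
        )
    return num_prompts, num_images
```

Under this check, the call from the original report (one prompt, a list of two control images) is rejected up front, while two prompts with two control images pass.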

pramishp added the "bug" (Something isn't working) label on Jan 16, 2025
