Moving a pipeline that has a quantized component to CUDA causes an error #9953

Open
Leommm-byte opened this issue Nov 18, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

@Leommm-byte (Contributor) commented Nov 18, 2024

Describe the bug

After trying out the new quantization method added to the diffusers library, I encountered a bug: I could not move the pipeline to CUDA, as it raised this error:

Traceback (most recent call last):
  File "/workspace/test.py", line 12, in <module>
    pipe.to("cuda")
  File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.

Reproduction

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel


transformer = FluxTransformer2DModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="transformer")
text_encoder_2 = T5EncoderModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="text_encoder_2")
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", transformer=transformer, text_encoder_2=text_encoder_2, torch_dtype=torch.bfloat16)

pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-dev.png")
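
The error message implies sequential CPU offload was enabled, but the script above never calls `enable_sequential_cpu_offload`. Below is a minimal diagnostic sketch to see which component trips the check, under the assumption (mine, not confirmed above) that the check in `pipeline_utils.to` keys off accelerate's `AlignDevicesHook` being attached as `_hf_hook`:

import torch
from accelerate.hooks import AlignDevicesHook

# Inspect every torch.nn.Module component of the pipeline and report which
# submodules carry an AlignDevicesHook, since that is presumably what makes
# `pipe.to("cuda")` believe sequential offloading is active.
for name, component in pipe.components.items():
    if not isinstance(component, torch.nn.Module):
        continue  # skip tokenizers, the scheduler, etc.
    hooked = [
        sub_name
        for sub_name, sub_module in component.named_modules()
        if isinstance(getattr(sub_module, "_hf_hook", None), AlignDevicesHook)
    ]
    if hooked:
        print(f"{name}: {len(hooked)} submodule(s) have an AlignDevicesHook attached")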

Logs

root@4e27fd69b49a:/workspace# python test.py
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3650.40it/s]
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3533.53it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.44s/it]
Loading pipeline components...:  57%|████████████████████████████████████████████████████                                       | 4/7 [00:00<00:00, 20.54it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 15.46it/s]
Traceback (most recent call last):
  File "/workspace/test.py", line 10, in <module>
    pipe.to("cuda")
  File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.

System Info

  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.11.10
  • PyTorch version (GPU?): 2.4.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.47.0.dev0
  • Accelerate version: 1.1.1
  • PEFT version: not installed
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA A100-SXM4-80GB, 81920 MiB
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@sayakpaul

It's also worth noting that it doesn't crash if only the transformer is passed in. It just gives this warning:
The module 'FluxTransformer2DModel' has been loaded in `bitsandbytes` 8bit and moving it to cuda via `.to()` is not supported. Module is still on cuda:0.
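
As a temporary workaround, one option is to move only the non-quantized components to CUDA and leave the 8-bit modules where bitsandbytes placed them at load time. This is only a sketch: it assumes quantized modules expose an `is_loaded_in_8bit` flag (transformers models loaded in 8-bit do; whether the diffusers transformer exposes the same flag is an assumption) and that those modules are already on the GPU after loading.

import torch

# Move only the components that are not bitsandbytes-quantized; quantized
# modules are skipped because `.to()` is not supported for them.
for name, component in pipe.components.items():
    if not isinstance(component, torch.nn.Module):
        continue  # skip tokenizers, the scheduler, etc.
    if getattr(component, "is_loaded_in_8bit", False):  # assumed flag, see note above
        continue
    component.to("cuda")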

Leommm-byte added the bug label on Nov 18, 2024
@sayakpaul (Member) commented:
Fixed in #9840. Could you give this a check? Cc: @DN6
