Moving a pipeline that has a quantized component to CUDA causes an error #9953

Open
Leommm-byte opened this issue Nov 18, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

@Leommm-byte (Contributor) commented Nov 18, 2024

Describe the bug

After trying out the new quantization method added to the diffusers library, I encountered a bug: I could not move the pipeline to CUDA, as it raised this error:

Traceback (most recent call last):
  File "/workspace/test.py", line 12, in <module>
    pipe.to("cuda")
  File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.

Reproduction

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel


transformer = FluxTransformer2DModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="transformer")
text_encoder_2 = T5EncoderModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="text_encoder_2")
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", transformer=transformer, text_encoder_2=text_encoder_2, torch_dtype=torch.bfloat16)

pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-dev.png")
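
The error message implies sequential CPU offload was enabled, but the script above never calls `enable_sequential_cpu_offload`. Below is a minimal diagnostic sketch to see which component trips the check, under the assumption (mine, not confirmed above) that the check in `pipeline_utils.to` keys off accelerate's `AlignDevicesHook` being attached as `_hf_hook`:

import torch
from accelerate.hooks import AlignDevicesHook

# Inspect every torch.nn.Module component of the pipeline and report which
# submodules carry an AlignDevicesHook, since that is presumably what makes
# `pipe.to("cuda")` believe sequential offloading is active.
for name, component in pipe.components.items():
    if not isinstance(component, torch.nn.Module):
        continue  # skip tokenizers, the scheduler, etc.
    hooked = [
        sub_name
        for sub_name, sub_module in component.named_modules()
        if isinstance(getattr(sub_module, "_hf_hook", None), AlignDevicesHook)
    ]
    if hooked:
        print(f"{name}: {len(hooked)} submodule(s) have an AlignDevicesHook attached")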

Logs

root@4e27fd69b49a:/workspace# python test.py
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3650.40it/s]
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3533.53it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.44s/it]
Loading pipeline components...:  57%|████████████████████████████████████████████████████                                       | 4/7 [00:00<00:00, 20.54it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 15.46it/s]
Traceback (most recent call last):
  File "/workspace/test.py", line 10, in <module>
    pipe.to("cuda")
  File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.

System Info

  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.11.10
  • PyTorch version (GPU?): 2.4.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.47.0.dev0
  • Accelerate version: 1.1.1
  • PEFT version: not installed
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA A100-SXM4-80GB, 81920 MiB
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@sayakpaul

It's also worth noting that it doesn't crash if only the transformer is passed in. It just gives this warning:
The module 'FluxTransformer2DModel' has been loaded in `bitsandbytes` 8bit and moving it to cuda via `.to()` is not supported. Module is still on cuda:0.
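
As a temporary workaround, one option is to move only the non-quantized components to CUDA and leave the 8-bit modules where bitsandbytes placed them at load time. This is only a sketch: it assumes quantized modules expose an `is_loaded_in_8bit` flag (transformers models loaded in 8-bit do; whether the diffusers transformer exposes the same flag is an assumption) and that those modules are already on the GPU after loading.

import torch

# Move only the components that are not bitsandbytes-quantized; quantized
# modules are skipped because `.to()` is not supported for them.
for name, component in pipe.components.items():
    if not isinstance(component, torch.nn.Module):
        continue  # skip tokenizers, the scheduler, etc.
    if getattr(component, "is_loaded_in_8bit", False):  # assumed flag, see note above
        continue
    component.to("cuda")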

Leommm-byte added the bug label on Nov 18, 2024
@sayakpaul (Member) commented:
Fixed in #9840. Could you give this a check? Cc: @DN6
