Describe the bug
After trying out the new quantization method added to the diffusers library, I ran into a bug: I could not move the pipeline to CUDA and got the following error.
Traceback (most recent call last):
File "/workspace/test.py", line 12, in <module>
pipe.to("cuda")
File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
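A possible workaround (untested sketch on my side, assuming the bitsandbytes-quantized components already end up on the GPU at load time, as the warning further below suggests) would be to skip the pipeline-level `pipe.to("cuda")` and move only the non-quantized components individually:

# Untested workaround sketch, not a fix: the 8-bit modules appear to already sit
# on cuda:0 after loading, so only the remaining bf16 modules are moved here.
pipe.text_encoder.to("cuda")
pipe.vae.to("cuda")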
Reproduction
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel

transformer = FluxTransformer2DModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="transformer")
text_encoder_2 = T5EncoderModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="text_encoder_2")
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", transformer=transformer, text_encoder_2=text_encoder_2, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")
Logs
root@4e27fd69b49a:/workspace# python test.py
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3650.40it/s]
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3533.53it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.44s/it]
Loading pipeline components...: 57%|████████████████████████████████████████████████████ | 4/7 [00:00<00:00, 20.54it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 15.46it/s]
Traceback (most recent call last):
File "/workspace/test.py", line 10, in<module>
pipe.to("cuda")
File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
It's also worth noting that it doesn't crash if only the transformer is passed in; it just emits this warning:
The module 'FluxTransformer2DModel' has been loaded in `bitsandbytes` 8bit and moving it to cuda via `.to()` is not supported. Module is still on cuda:0.
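For reference, that transformer-only variant looks roughly like this (a sketch based on the snippet above, not a separately tested script):

# Passing only the quantized transformer; the pipeline loads the remaining
# components itself. pipe.to("cuda") then only warns instead of raising.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")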
System Info
Who can help?
@sayakpaul