[bitsandbytes] allow direct CUDA placement of pipelines loaded with bnb components #9840

Open
sayakpaul wants to merge 17 commits into main

Conversation

sayakpaul (Member)

What does this PR do?

When a pipeline is loaded with models that have a quantization config, we should still be able to call to("cuda") on the pipeline object. For GPUs with enough memory (such as a 4090), this has performance benefits (as demonstrated below).

| Model CPU Offload | Batch Size | Time (seconds) | Memory (GB) |
|---|---|---|---|
| False | 1 | 19.316 | 14.935 |
| True  | 1 | 36.746 | 12.139 |
| False | 4 | 80.665 | 20.576 |
| True  | 4 | 98.612 | 12.138 |

Flux.1 Dev, steps: 30
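
In other words, the call pattern this PR enables is the following (a minimal sketch; the full benchmarking setup is further below):

from diffusers import BitsAndBytesConfig, DiffusionPipeline, FluxTransformer2DModel
import torch

ckpt_id = "black-forest-labs/FLUX.1-dev"

# Quantize the Flux transformer to 4-bit NF4 via bitsandbytes.
transformer = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
)
pipeline = DiffusionPipeline.from_pretrained(
    ckpt_id, transformer=transformer, torch_dtype=torch.bfloat16
)

# With this PR, placing the whole pipeline (including the quantized component)
# on the GPU works instead of raising.
pipeline.to("cuda")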

Currently, calling to("cuda") is not possible because:

from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as BnbConfig
import torch

ckpt_id = "black-forest-labs/FLUX.1-dev"

# Load the T5 text encoder in 4-bit NF4 through bitsandbytes.
text_encoder_2_config = BnbConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_config,
    torch_dtype=torch.bfloat16,
)

# Inspect the hook that accelerate attaches during quantized loading.
print(text_encoder_2._hf_hook)

which prints:

AlignDevicesHook(execution_device=0, offload=False, io_same_device=True, offload_buffers=False, place_submodules=True, skip_keys=None)

This AlignDevicesHook makes diffusers treat the pipeline as sequentially offloaded, which is why this line complains:

if pipeline_is_sequentially_offloaded and device and torch.device(device).type == "cuda":

This PR fixes that behavior.
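
For context on the fix: the AlignDevicesHook above is attached by accelerate when transformers loads the quantized text encoder, and the existing check treats any such hook as evidence of sequential CPU offloading. The sketch below shows one way the check could skip bnb-quantized components. It is illustrative only, not necessarily the exact change in this PR; is_loaded_in_4bit / is_loaded_in_8bit are the attributes transformers sets on bitsandbytes-quantized models.

from accelerate.hooks import AlignDevicesHook

def _is_bnb_quantized(module) -> bool:
    # transformers marks bitsandbytes-loaded models with these attributes.
    return getattr(module, "is_loaded_in_4bit", False) or getattr(module, "is_loaded_in_8bit", False)

def _looks_sequentially_offloaded(module) -> bool:
    # Treat an AlignDevicesHook as a sign of sequential CPU offload only when
    # the module is not bnb-quantized; bnb loading attaches a similar hook
    # even though nothing is actually offloaded to the CPU.
    hook = getattr(module, "_hf_hook", None)
    return isinstance(hook, AlignDevicesHook) and not _is_bnb_quantized(module)

With a check along these lines, to("cuda") can proceed for pipelines with bnb-loaded components, while pipelines that really use enable_sequential_cpu_offload() are still rejected.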

Benchmarking code:

from diffusers import DiffusionPipeline, FluxTransformer2DModel, BitsAndBytesConfig
from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as BnbConfig
import torch.utils.benchmark as benchmark
import torch 
import fire

# Measure the mean wall-clock time of f(*args, **kwargs) with torch.utils.benchmark.
def benchmark_fn(f, *args, **kwargs):
    t0 = benchmark.Timer(
        stmt="f(*args, **kwargs)",
        globals={"args": args, "kwargs": kwargs, "f": f},
        num_threads=torch.get_num_threads(),
    )
    return f"{(t0.blocked_autorange().mean):.3f}"

def load_pipeline(model_cpu_offload=False):
    ckpt_id = "black-forest-labs/FLUX.1-dev"

    # Quantize the Flux transformer to 4-bit NF4 via bitsandbytes.
    transformer_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = FluxTransformer2DModel.from_pretrained(
        ckpt_id, 
        subfolder="transformer",
        quantization_config=transformer_config,
        torch_dtype=torch.bfloat16
    )

    text_encoder_2_config = BnbConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    text_encoder_2 = T5EncoderModel.from_pretrained(
        ckpt_id,
        subfolder="text_encoder_2",
        quantization_config=text_encoder_2_config,
        torch_dtype=torch.bfloat16
    )

    pipeline = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        text_encoder_2=text_encoder_2,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
    if model_cpu_offload:
        pipeline.enable_model_cpu_offload()
    else:
        # Direct CUDA placement of the full pipeline, which this PR enables
        # for pipelines containing bnb-quantized components.
        pipeline = pipeline.to("cuda")

    pipeline.set_progress_bar_config(disable=True)
    return pipeline

def run_pipeline(pipeline, batch_size=1):
    _ = pipeline(
        prompt="a dog sitting besides a sea", 
        guidance_scale=3.5, 
        max_sequence_length=512, 
        num_inference_steps=30,
        num_images_per_prompt=batch_size
    )


def main(batch_size: int = 1, model_cpu_offload: bool = False):
    pipeline = load_pipeline(model_cpu_offload=model_cpu_offload)

    # Warmup runs (excluded from the timed measurement).
    for _ in range(5):
        run_pipeline(pipeline)

    time = benchmark_fn(run_pipeline, pipeline, batch_size)
    # Peak GPU memory allocated so far (includes the warmup runs), in GB.
    memory = torch.cuda.max_memory_allocated() / 1024 / 1024 / 1024
    print(f"{model_cpu_offload=}, {batch_size=} {time=} seconds {memory=} GB.")

    image = pipeline(
        prompt="a dog sitting besides a sea", 
        guidance_scale=3.5, 
        max_sequence_length=512, 
        num_inference_steps=30,
        num_images_per_prompt=1
    ).images[0]
    img_name = f"mco@{model_cpu_offload}-bs@{batch_size}.png"
    image.save(img_name)


if __name__ == "__main__":
    fire.Fire(main)
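
Assuming the script above is saved as, say, benchmark_flux_bnb.py (the filename is mine, not from the PR), fire exposes main()'s arguments as CLI flags, so the table above corresponds to invocations like python benchmark_flux_bnb.py --batch_size=4 --model_cpu_offload=True.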

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SunMarc (Member) left a comment


Thanks for the PR! Left a suggestion.

src/diffusers/pipelines/pipeline_utils.py (outdated suggestion, resolved)
sayakpaul (Member, Author)

@SunMarc WDYT now?

SunMarc (Member) left a comment


Thanks for adding this! LGTM! I'll merge the PR on accelerate also.

sayakpaul (Member, Author)

Have run the integration tests and they are passing.

SunMarc (Member) commented Nov 18, 2024

> Have run the integration tests and they are passing.

On diffusers?

sayakpaul (Member, Author)

@SunMarc yes, on diffusers. Anywhere else they need to be run?

SunMarc (Member) commented Nov 18, 2024

No, I read that as a question, my bad ;)

# For `diffusers` it should not be a problem as we enforce the installation of a bnb version
# that already supports CPU placements.
else:
    module.to(device=device)
Collaborator

ok but this means for diffusers the transformers version would always meet the requirement, no? i.e. the check is_transformers_version(">", "4.44.0") will always pass

sayakpaul (Member, Author)

Agreed, but for the diffusers codepath we probably don't care about the transformers version, no?

Anything I'm missing?
