clear memory after offload #2994

Merged: SunMarc merged 3 commits into main from test-clear-memory-cpu-offload on Aug 9, 2024
Conversation

@SunMarc (Member) commented Aug 6, 2024

What does this PR do?

This PR clears the device memory cache in CpuOffload right after the previous module has been offloaded to the CPU (a pattern used heavily by cpu_offload_with_hook). This makes CPU offloading in diffusers noticeably more efficient in terms of VRAM usage.
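For illustration only, a minimal sketch (not the actual accelerate implementation; the helper name is made up) of what "clearing memory after offload" amounts to: once the previously executed module is back on the CPU, collecting garbage and emptying the CUDA cache frees the blocks its weights occupied, so the next module's weights don't push peak VRAM higher than necessary.

import gc

import torch
import torch.nn as nn


def offload_and_clear(module: nn.Module) -> nn.Module:
    # Hypothetical helper, for illustration only.
    module.to("cpu")  # offload the previously executed module
    gc.collect()  # drop lingering references to its GPU tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks so peak VRAM stays low
    return module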

cc @sayakpaul

@sayakpaul (Member) left a comment

Thank you! @asomoza could you check if this branch of accelerate helps with the additional memory issue we were seeing with cpu offloading in FLUX?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza (Member) commented Aug 6, 2024

Yeah, there's definitely an improvement, but sadly not with Flux, which I suspect is a problem on our side rather than with accelerate. But I tested it with SDXL and I can clearly see the difference:

[Screenshot: VRAM usage over time, with and without the fix]

The first part is with the fix, where VRAM usage never goes over 8 GB; in the second part I commented the fix out, and VRAM consumption goes above 14 GB.

@sayakpaul (Member)

Thanks. Did it help at all with Flux? I guess the next step for us would be to record memory usage across invocations and compare it to ComfyUI to note where we differ. I think that would be nice to do and fix as a priority, wdyt?

@asomoza (Member) commented Aug 7, 2024

Yeah, as discussed internally, this also helps with Flux when we are not using the quantized transformer and it is loaded afterwards, so overall this makes all the pipelines more efficient with VRAM usage.

@SunMarc requested a review from @muellerzr on August 7, 2024 at 11:37
@SunMarc (Member, Author) commented Aug 7, 2024

Nice! Then I guess we can safely merge this PR! cc @muellerzr

@sayakpaul (Member)

Yes!

@a-r-r-o-w (Member)

Thanks for the PR @SunMarc. We recently added CogVideoX to Diffusers and were facing some issues with the total memory required to run inference. This PR seems to address those issues as well. The following is the code used for inference:

Code
import gc

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video


def flush():
    # Collect Python garbage, release cached CUDA blocks, and reset the
    # peak-memory counters so each measurement starts from a clean slate.
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_max_memory_allocated()
    torch.cuda.reset_peak_memory_stats()


def bytes_to_giga_bytes(bytes):
    return f"{(bytes / 1024 / 1024 / 1024):.3f}"


flush()

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)

pipe = CogVideoXPipeline.from_pretrained("/raid/aryan/CogVideoX-trial", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

torch.cuda.empty_cache()
memory = bytes_to_giga_bytes(torch.cuda.memory_allocated())
max_memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
max_reserved = bytes_to_giga_bytes(torch.cuda.max_memory_reserved())
print(f"{memory=}")
print(f"{max_memory=}")
print(f"{max_reserved=}")

export_to_video(video, "output.mp4", fps=8)

Without pipe.enable_model_cpu_offload(), the overall reserved memory was ~33 GB (14 GB for the models and the rest for denoising and decoding), while peak allocated memory was ~22 GB. So, if we disable the caching allocator with PYTORCH_NO_CUDA_MEMORY_CACHING=1, the pipeline runs in about 22 GB, but inference becomes extremely slow.
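For reference, one way to set that flag from Python (this has to happen before the first CUDA allocation, i.e. before the pipeline is constructed; setting it on the command line works just as well):

import os

# Disable the CUDA caching allocator for this process; must be set before
# the first CUDA allocation for it to take effect.
os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"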

With pipe.enable_model_cpu_offload() and accelerate:main, the overall reserved memory was 27 GB. Still not ideal but better.

With pipe.enable_model_cpu_offload() and this branch, we get:

memory='0.008'
max_memory='10.805'
max_reserved='18.061'

which is consistent with the original implementation as reported here. Thanks again :)

cc @zRzRzRzRzRzRzR

@user425846

The quick progress here is amazing. I would love for this to be merged soon; looking forward to using CogVideo on 24 GB.

@muellerzr (Collaborator) left a comment

Nice!

@SunMarc merged commit 12a5bef into main on Aug 9, 2024 (28 checks passed)
@SunMarc deleted the test-clear-memory-cpu-offload branch on August 9, 2024 at 07:36