
Fix align_module_device, ensure only cpu tensors for get_state_dict_offloaded_model #3217

Merged

Conversation

kylesayrs
Contributor

@kylesayrs kylesayrs commented Nov 4, 2024

Background

  • Previous PR [Utils] align_module_device #3204 introduced an unintended behavior change where aligning a module would also attempt to align parameters belonging to its submodules. This is a problem for functions like get_state_dict_offloaded_model, which call align_module_device on non-leaf modules.
tests/test_modeling_utils.py:808
    state_dict = get_state_dict_offloaded_model(model)
src/accelerate/utils/modeling.py:1532: in get_state_dict_offloaded_model
    with align_module_device(module, "cpu"):
/usr/lib/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
src/accelerate/utils/modeling.py:1929: in align_module_device
    set_module_tensor_to_device(module, name, execution_device)
ValueError: weight is on the meta device, we need a `value` to put in on cpu.
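For context on why the traceback ends this way: meta tensors are shape/dtype placeholders with no storage behind them, so they cannot be copied to a real device without supplying a concrete value. A minimal torch-only sketch of this underlying failure mode (illustrative, not code from the PR):

```python
import torch

# A meta tensor has metadata but no data, so materializing it on a
# real device fails unless a concrete value is provided.
meta_tensor = torch.empty(2, 2, device="meta")
try:
    meta_tensor.to("cpu")
except (NotImplementedError, RuntimeError) as err:
    print(f"cannot materialize meta tensor: {err}")
```

This is why align_module_device must skip tensors it does not have offloaded values for, rather than calling set_module_tensor_to_device on them.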

Purpose

  • Fix align_module_device bug where the function attempts to align meta tensors belonging to submodule parameters
  • Fix get_state_dict_offloaded_model(model) behavior to match model.state_dict()
  • Introduce tests for get_state_dict_offloaded_model

Changes

  • align_module_device now only aligns parameters attached directly to the given module, not parameters belonging to its submodules
  • get_state_dict_offloaded_model now moves all tensors in the module state dict to cpu before returning, including both parameters and buffers
  • Add usage tests for get_state_dict_offloaded_model
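The direct-vs-recursive distinction behind the first change can be sketched with plain torch; the nn.Sequential wrapper and names below are illustrative, not code from this PR:

```python
import torch.nn as nn

# A non-leaf module: the Sequential itself owns no parameters directly;
# they all belong to its Linear submodule.
model = nn.Sequential(nn.Linear(4, 4))

# recurse=True (the default) walks submodules; recurse=False yields only
# parameters registered directly on the module itself.
recursive = [name for name, _ in model.named_parameters()]
direct = [name for name, _ in model.named_parameters(recurse=False)]

print(recursive)  # ['0.weight', '0.bias']
print(direct)     # []
```

Iterating with recurse=False is what keeps align_module_device from touching submodule parameters that get_state_dict_offloaded_model will align separately.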

Testing

  • The added tests fail without the changes and pass with them
  • Use the script below to test end-to-end
test_e2e.py
from transformers import AutoModelForCausalLM
from accelerate import cpu_offload
from accelerate.utils.modeling import get_state_dict_offloaded_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
cpu_offload(model)

state_dict = get_state_dict_offloaded_model(model)

@kylesayrs kylesayrs marked this pull request as draft November 4, 2024 21:18
@kylesayrs kylesayrs marked this pull request as ready for review November 4, 2024 21:19
Collaborator

@muellerzr muellerzr left a comment


Thanks for fixing! cc @SunMarc :)

@muellerzr muellerzr requested a review from SunMarc November 5, 2024 02:26
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@SunMarc SunMarc left a comment


LGTM ! Thanks for fixing this !

@SunMarc SunMarc merged commit c0552c9 into huggingface:main Nov 5, 2024
25 checks passed
@kylesayrs kylesayrs deleted the kylesayrs/align-module-device-fix branch November 5, 2024 15:28
muellerzr pushed a commit that referenced this pull request Nov 6, 2024
…t_offloaded_model` (#3217)

* only onload direct parameter descendants, move buffers to cpu, add tests

* remove no longer applicable comment
4 participants