Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix serialization for offloaded model #31727

Merged
merged 4 commits into from
Jul 5, 2024
Merged

Fix serialization for offloaded model #31727

merged 4 commits into from
Jul 5, 2024

Conversation

SunMarc
Copy link
Member

@SunMarc SunMarc commented Jul 1, 2024

What does this PR do ?

This PR fixes the serialization of offloaded model. With this PR, the logic behind saving offloaded model is triggered only when we use more than one device that contain either "cpu" or "disk" device. Moreover, we fix a small bug that was introduced with this PR. We rename the state_dict variable to shard_state_dict. I also fixed the test test_save_offloaded_model.

Example:

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map={"":"cpu"})
model.save_pretrained("llama")

Fixes #31685

@SunMarc SunMarc requested a review from LysandreJik July 1, 2024 10:09
@LysandreJik
Copy link
Member

Could you implement a test that ensures that the previous faulty behavior has been fixed? :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc
Copy link
Member Author

SunMarc commented Jul 1, 2024

Could you implement a test that ensures that the previous faulty behavior has been fixed? :)

Done !

Do you know where I can see the results of the test for test_modeling_utils.py file @ydshieh ? I didn't see the following test test_save_offloaded_model in the daily report. On my local machine, it was failing so I was expecting to see it in the report.

@ydshieh
Copy link
Collaborator

ydshieh commented Jul 1, 2024

Hi @SunMarc Let me check, but we might have a problem of some test files not included 😭

@SunMarc
Copy link
Member Author

SunMarc commented Jul 1, 2024

Hi @SunMarc Let me check, but we might have a problem of some test files not included 😭

Sounds good ! Maybe we can put all the utils python file into a separate folder to not have them mixed with the common test files.

@ydshieh
Copy link
Collaborator

ydshieh commented Jul 1, 2024

I confirmed that we have some tests not running on daily CI.
(We also have push-ci where we try to collect the impacted files, but for daily CI, tests are collected from the subfolders of tests)

Maybe we can put all the utils python file into a separate folder to not have them mixed with the common test files.

Sound a quick solution. It's just we have to make sure there is not subclass of TestCase in other files under tests. But for now let's do what you suggested above.

@SunMarc
Copy link
Member Author

SunMarc commented Jul 1, 2024

Sound a quick solution. It's just we have to make sure there is not subclass of TestCase in other files under tests. But for now let's do what you suggested above.

Great !

@ydshieh
Copy link
Collaborator

ydshieh commented Jul 1, 2024

Here it is #31730 but I have to see how CI is going.

@SunMarc SunMarc requested a review from amyeroberts July 3, 2024 11:47
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the changes @SunMarc

@LysandreJik LysandreJik merged commit 8c5c180 into main Jul 5, 2024
18 of 22 checks passed
@LysandreJik LysandreJik deleted the fix-serialization branch July 5, 2024 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NameError: free variable 'state_dict' referenced before assignment in enclosing scope
4 participants