save_pretrained: use tqdm when saving checkpoint shards from offloaded params #31856

kallewoof · 2024-07-09T09:56:47Z

Even for a 9B model, saving with partially CPU-offloaded tensors can take up to 30 seconds, and saving a 70B model can take 5-10 minutes depending on how much VRAM is available. This adds a progress bar so the user gets visual feedback whenever there are offloaded params.
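The snippet below is not the code from this PR. It is a minimal sketch of the general pattern, assuming the shards have already been split into a dict that maps each shard filename to its state dict: wrap the shard-writing loop in `tqdm` and disable the bar unless some parameters are offloaded. `save_shards`, `write_json_shard` and `has_offloaded_params` are hypothetical names, and JSON stands in for the real serializer.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

try:
    from tqdm.auto import tqdm
except ImportError:
    # Fallback so the sketch runs without tqdm installed: a no-op wrapper.
    def tqdm(iterable, **kwargs):
        return iterable


def save_shards(
    shards: Dict[str, dict],
    save_directory: str,
    has_offloaded_params: bool,
    write_shard: Callable[[dict, str], None],
) -> List[str]:
    """Hypothetical helper: write each shard to disk, showing a progress bar
    only when some parameters are offloaded (the slow case)."""
    written = []
    # disable=True makes tqdm behave like a plain iterator, so saves with
    # no offloaded params print no bar at all.
    for shard_file, shard in tqdm(
        shards.items(),
        total=len(shards),
        desc="Saving checkpoint shards",
        disable=not has_offloaded_params,
    ):
        path = os.path.join(save_directory, shard_file)
        write_shard(shard, path)
        written.append(path)
    return written


def write_json_shard(shard: dict, path: str) -> None:
    # Stand-in for the real serializer (e.g. safetensors); JSON keeps the
    # sketch free of extra dependencies.
    with open(path, "w") as f:
        json.dump(shard, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shards = {
            "model-00001-of-00002.json": {"layer.0.weight": [0.1, 0.2]},
            "model-00002-of-00002.json": {"layer.1.weight": [0.3, 0.4]},
        }
        paths = save_shards(
            shards, tmp, has_offloaded_params=True, write_shard=write_json_shard
        )
        print([os.path.basename(p) for p in paths])
```

Passing `disable=not has_offloaded_params` instead of branching keeps a single loop: the bar appears only on the slow offloaded path, and the fast path stays silent.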

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?


amyeroberts

Thanks for adding! LGTM, as we do the same when loading checkpoint shards.

75aae96 save_pretrained: use tqdm when saving checkpoint shards from offloaded params

amyeroberts approved these changes Jul 9, 2024


amyeroberts merged commit cffa2b9 into huggingface:main Jul 9, 2024
20 checks passed

kallewoof deleted the 202407-save-pretrained-tqdm branch July 9, 2024 14:40
