Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) #1453
Could someone kindly explain which situations require this (inefficient) 16-bit parameter gathering feature, given that the zero_to_fp32.py script can already save the parameters? Thanks.
This is for those who can't be bothered with running zero_to_fp32.py and want the 16-bit model extracted on the fly, which is fine for tiny to small models but very slow for large models. It's also the default in the HF Trainer integration of DeepSpeed, so that users can get started easily and have things work transparently. But the documentation explains how to improve upon this default.
https://huggingface.co/docs/transformers/main/main_classes/deepspeed#getting-the-model-weights-out
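To make the trade-off concrete, here is a minimal sketch of a ZeRO stage 3 config that turns the on-the-fly gathering off. It assumes the option under discussion is the `stage3_gather_16bit_weights_on_model_save` key in the `zero_optimization` section; `make_zero3_config` is a hypothetical helper, not part of DeepSpeed, and the `bf16` block is only illustrative.

```python
import json


def make_zero3_config(gather_16bit_on_save: bool) -> dict:
    """Build a minimal ZeRO stage 3 config dict (hypothetical helper).

    gather_16bit_on_save=True  -> consolidate 16-bit weights at save time
                                  (convenient, but slow for large models).
    gather_16bit_on_save=False -> keep weights sharded in the checkpoint and
                                  consolidate later with zero_to_fp32.py.
    """
    return {
        "bf16": {"enabled": True},  # illustrative; fp16 would work the same way
        "zero_optimization": {
            "stage": 3,
            # Assumed key name for the feature discussed in this thread.
            "stage3_gather_16bit_weights_on_model_save": gather_16bit_on_save,
        },
    }


config = make_zero3_config(gather_16bit_on_save=False)
print(json.dumps(config, indent=2))
```

With the flag off, saving should skip the gather step and leave only the sharded checkpoint, which can then be converted offline with zero_to_fp32.py as described in the linked documentation.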