Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) #1453
Re: bfloat16 - I proposed some months back that we don't create too many config entries, but instead switch to using a new `dtype` block, where the user can flip from bf16 to fp16 to fp32. Especially since they are mutually exclusive.
But that discussion wasn't concluded. Now is a good time to bring it back.
that sounds reasonable. note that this change will be published as part of a separate PR, i just made my changes on top of it so it ended up in here. will rebase once that PR makes it in
please note that I'm just a contributor, so my suggestions are just that - suggestions. Therefore, in order not to waste your time, please first secure an agreement from the DeepSpeed team when it comes to changing APIs.
In particular this one, as it'd require back-compat code to support the current config.
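To make the back-compat concern concrete, here is a minimal sketch, not DeepSpeed code. It assumes a hypothetical top-level `dtype` key (the proposal under discussion, not an existing option) and a hypothetical helper named `upgrade_legacy_config`. The only thing taken from the current config is the `enabled` flag inside the `fp16` block.

```python
import copy


def upgrade_legacy_config(config: dict) -> dict:
    """Return a copy of `config` that uses a hypothetical top-level `dtype` key.

    Assumption: legacy configs select half precision via fp16.enabled,
    and the proposed format would select it via `dtype` instead.
    """
    cfg = copy.deepcopy(config)  # never mutate the caller's config

    # A config that already uses the proposed key needs no translation.
    if "dtype" in cfg:
        return cfg

    fp16_block = cfg.get("fp16", {})
    # Move the legacy flag out of the fp16 block and into `dtype`.
    enabled = fp16_block.pop("enabled", False)
    cfg["dtype"] = "fp16" if enabled else "fp32"
    return cfg


legacy = {"fp16": {"enabled": True, "loss_scale": 0}}
upgraded = upgrade_legacy_config(legacy)
```

With a shim along these lines, existing config files would keep loading while new ones could set `dtype` directly; whether the old flag should also trigger a deprecation warning is a separate decision.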
@raamjad, in context of #1398, if you could add this `dtype` block, perhaps as a follow up PR, that would be great. Thanks!
Since the current coverage of bf16 is very limited, I thought it may be better to do this refactoring of config later once there is higher coverage.
Do you prefer that this be done now?
@stas00 Can you point me to the comments where you suggested the dtype block, so I know what shape of config changes you had in mind?
One of my proposals was very simple: it's adding a new top-level config `dtype`, dropping `enabled` from the fp16 config, and adding a bf16 block.
This approach allows users to keep nuanced settings for each dtype in the same config, but `dtype` will enable one over the others, so one can easily switch the settings in only one place.
It hasn't been approved by the DeepSpeed team (as in no decision has been made about it).
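As a rough illustration of that first proposal (a sketch under assumptions, not an agreed design): the top-level `dtype` key, the empty `bf16` block, and the helper `active_precision_settings` below are all hypothetical. Only the `fp16` options shown are existing config settings.

```python
from typing import Tuple

# Hypothetical set of values the proposed `dtype` key could take.
SUPPORTED_DTYPES = ("fp32", "fp16", "bf16")


def active_precision_settings(config: dict) -> Tuple[str, dict]:
    """Return the selected dtype and the settings block that goes with it.

    Every per-dtype block stays in the config; only the one named by
    the hypothetical `dtype` key is used, so the choices are mutually
    exclusive by construction.
    """
    dtype = config.get("dtype", "fp32")
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    # fp32 needs no extra settings, so a missing block means "no options".
    return dtype, dict(config.get(dtype, {}))


example_config = {
    "dtype": "bf16",  # the single switch
    "fp16": {"loss_scale": 0, "initial_scale_power": 16},
    "bf16": {},  # hypothetical block, no options assumed
}

selected, settings = active_precision_settings(example_config)
```

Switching precision would then mean editing only the `dtype` value, while the fp16 tuning options remain in the file untouched.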
@tjruwase, if you have access to the Teams log from Apr-30, this is when we discussed this. But it won't show it to me - the search only shows a snippet. Search for 'dtype mockup'.
the other proposal is to have a single dtype block that will take over the fp16 block. This would be useful if many of the config options of bf16 and fp16 overlap. So:
and for bf16:
As a maintainer of DS integration in HF Transformers, I find that it's the easiest when users can download ready-to-use config files, so in my experience having all the config sections already predefined in the config file makes it easier for the user. So the first approach would be preferable for that particular use case.
But of course there can be other ways...
@stas00, thanks for reviving your awesome proposals. Sorry, mere mortals such as we can barely keep up with your genius :).
@stas00, @raamjad, is it okay if we continue this chat on #1398? I will add a link to this thread and also post the referenced Teams chat. Thanks.
You're too kind, @tjruwase - it's easy to come up with ideas, it's far from easy to make them a reality. So that's where your genius comes in ;)
Thank you for finding that old discussion and repasting it here, as MSFT Teams won't let me access it.