-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use reduce-scatter coalescing for FSDP #6024
Conversation
I guess we need to rebase to the master once the dependent PR is landed? |
dd3bdac
to
1bfae0e
Compare
I had rebased over the dependent PR #5956 |
1bfae0e
to
66636c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes a lot of sense to support coalescing reduce-scatter. Just one question, what if I don't need this feature and want to preserve the initial behavior where the reduce-scatter is fired immediately?
I wish I have the resources to perform through-out performance tests in TPU but...
Therefore, will it be possible to add this as an optional feature?
f06545d
to
3dce325
Compare
Let me know when it's ready for review? |
3dce325
to
aac286b
Compare
@alanwaketan can you take a look at this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This PR uses reduce-scatter coalescence in FSDP in addition to reduce-scatter's scale param. This PR is companion to #5950 and #5956 and to be used in conjunction with openxla openxla/xla#5740 .
This is a revival of #4145.