Misc bug fixes in Zero optimizer: handling differentiable argument, optimizer_dtype #6454
Conversation
# Here we pop the differentiable default because the adam family of
# optimizers don't have differentiable as an argument. This should
# be fixed by this commit https://github.com/pytorch/pytorch/pull/86183
# and should be available in torch==2.0. For 1.13, we are patching it here.
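For context, a minimal sketch of the kind of patch this comment describes, not the PR's exact code: drop the differentiable default before constructing a wrapped optimizer whose constructor does not accept it, and leave it in place on newer torch where the Adam family does accept the flag. The helper name build_wrapped_optimizer is hypothetical.

import inspect
import torch

def build_wrapped_optimizer(optimizer_class, params, defaults):
    # Copy so the caller's defaults dict is not mutated.
    defaults = dict(defaults)
    # If the wrapped optimizer's __init__ cannot take 'differentiable'
    # (pre-2.0 behavior), remove it so construction does not raise a
    # TypeError; on torch >= 2.0 this branch is a no-op.
    accepted = inspect.signature(optimizer_class.__init__).parameters
    if "differentiable" in defaults and "differentiable" not in accepted:
        defaults.pop("differentiable")
    return optimizer_class(params, **defaults)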
Hi @amithrm, here it says "This should be fixed by this commit pytorch/pytorch#86183 and should be available in torch==2.0." Can you remove this patch?
yes
@jeffhataws PTAL
Can we have a test case to cover the change?
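A hedged sketch of the kind of test being requested, exercising the differentiable and optimizer_dtype handling touched by this PR. It assumes torch_xla's ZeroRedundancyOptimizer module path and that optimizer_dtype is accepted as a keyword argument; both are assumptions based on the PR title, not a verified API.

import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.zero_redundancy_optimizer import ZeroRedundancyOptimizer

def test_zero1_adamw_step():
    device = xm.xla_device()
    model = torch.nn.Linear(16, 16).to(device)
    # Wrap an Adam-family optimizer; optimizer_dtype is the argument this PR
    # touches (assumed keyword name).
    opt = ZeroRedundancyOptimizer(
        model.parameters(),
        torch.optim.AdamW,
        lr=1e-3,
        optimizer_dtype=torch.float32,
    )
    loss = model(torch.randn(4, 16, device=device)).sum()
    loss.backward()
    opt.step()
    xm.mark_step()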
    pin_layout=self.pin_layout,
    groups=self.sharding_groups,
)
sharded_data.append(shard_data)
Is this gathering all the parameters into one bucket?
The changes here should already be in #6025, as confirmed by Guangtai.
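For context only, a minimal sketch assuming the quoted lines sit inside a per-parameter loop that all-gathers each shard with its own collective call (rather than packing every parameter into a single bucket). The helper name gather_full_params is hypothetical; xm.all_gather's dim, groups, and pin_layout keywords are the real torch_xla API.

import torch_xla.core.xla_model as xm

def gather_full_params(sharded_params, sharding_groups, pin_layout=True):
    sharded_data = []
    for shard in sharded_params:
        # all_gather concatenates this parameter's shards from every rank
        # along dim 0, reconstructing the full (padded) tensor one parameter
        # at a time.
        shard_data = xm.all_gather(
            shard,
            dim=0,
            pin_layout=pin_layout,
            groups=sharding_groups,
        )
        sharded_data.append(shard_data)
    return sharded_data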
This is a cumulative PR with misc bug fixes and updates to the Zero Redundancy Optimizer from all the authors (AWS): Guangtai Huang, Rahul Solanki, Fei Wu, Amith Mamidala.