Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] OSS restore state to proper device #46

Merged
merged 5 commits into from
Aug 20, 2020

Conversation

blefaudeux
Copy link
Contributor

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes the state pull and push behavior to being something more easily predictable, when a state is pulled it gets moved to CPU (failsafe if this is a reasonably big model). Upon restore the param_groups were not restored to the proper device. It looks like there are still probable duplicates in that field though

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@blefaudeux blefaudeux self-assigned this Aug 20, 2020
@blefaudeux blefaudeux requested a review from msbaines August 20, 2020 01:09
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 20, 2020
@blefaudeux blefaudeux requested a review from min-xu-ai August 20, 2020 01:09
@codecov
Copy link

codecov bot commented Aug 20, 2020

Codecov Report

Merging #46 into master will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #46   +/-   ##
=======================================
  Coverage   94.18%   94.18%           
=======================================
  Files          35       35           
  Lines        2065     2065           
=======================================
  Hits         1945     1945           
  Misses        120      120           
Flag Coverage Δ
#Python 94.18% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
fairscale/optim/oss.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d6c7b6...7120d57. Read the comment docs.

@@ -148,7 +148,7 @@ def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
self.load_local_state_dict(state_dict["state"][self.rank])

# Restore the global param_groups
self.param_groups = state_dict["param_groups"]
self.param_groups = recursive_copy_to_device(state_dict["param_groups"], non_blocking=True, device=self._device)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long?

Is there a test for this already? If so, should there be some assert in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no test, it's a good point, I'll add that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@blefaudeux blefaudeux Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's supposed to be automatic with vscode+black, it's pure FB tooling and set to 120 cols, this formatting crap is so annoying.. I'll fix that in the next PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no ok, black is happy with that indeed, it's 121 cols but for some reason this passes (same for the init above, also 121). Anyway, all good then

Copy link
Contributor

@min-xu-ai min-xu-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@blefaudeux blefaudeux merged commit c2d6f4b into master Aug 20, 2020
@blefaudeux blefaudeux deleted the oss_restore_device_state branch September 2, 2020 17:43
myleott pushed a commit that referenced this pull request Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants