Optimizer state scaling #44

Merged: 16 commits from optimizer-state-scaling into master on Aug 22, 2020
Conversation

andersonic (Contributor)

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Allows scaling of optimizer state.
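For context, since the description is terse: this PR builds on the earlier FP16 optimizer state work (the base branch here was add-fp16-optimizer-state), where small Adam moments stored in FP16 can underflow to zero. A common remedy is to store the state multiplied by a scale factor and unscale it before use. The sketch below illustrates that idea only; it is not the fairscale implementation, and ScaledState and its scale argument are made-up names for illustration.

import torch

class ScaledState:
    # Illustrative only: store an FP16 copy of a tensor multiplied by a
    # scale factor so values too small for FP16 survive the cast.
    def __init__(self, value_fp32: torch.Tensor, scale: float = 2.0 ** 16):
        self.scale = scale  # hypothetical fixed scale; a real optimizer may adapt it
        self._scaled = (value_fp32 * scale).half()

    def get(self) -> torch.Tensor:
        # Unscale back to FP32 before the optimizer math touches it.
        return self._scaled.float() / self.scale

    def set(self, value_fp32: torch.Tensor) -> None:
        self._scaled = (value_fp32 * self.scale).half()

# 1e-8 is below FP16's smallest subnormal (~6e-8), so a plain cast loses it:
v = torch.full((4,), 1e-8)
print(v.half())              # tensor([0., 0., 0., 0.], dtype=torch.float16)
print(ScaledState(v).get())  # roughly 1e-8, recovered from the scaled FP16 copy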

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 17, 2020
codecov bot commented on Aug 17, 2020

Codecov Report

Merging #44 into master will increase coverage by 0.09%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
+ Coverage   94.18%   94.28%   +0.09%     
==========================================
  Files          35       35              
  Lines        2065     2100      +35     
==========================================
+ Hits         1945     1980      +35     
  Misses        120      120              
Flag      Coverage Δ
#Python   94.28% <100.00%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files             Coverage Δ
fairscale/optim/adam.py    94.66% <100.00%> (+1.62%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@andersonic changed the base branch from add-fp16-optimizer-state to master on August 19, 2020 00:01
@andersonic force-pushed the optimizer-state-scaling branch 2 times, most recently from 744713f to 0eb5ad3, on August 19, 2020 22:33
@andersonic marked this pull request as ready for review on August 19, 2020 22:49
@andersonic marked this pull request as draft on August 19, 2020 22:51
@andersonic marked this pull request as ready for review on August 20, 2020 16:21
@andersonic force-pushed the optimizer-state-scaling branch from 8295557 to fce8747 on August 21, 2020 17:30
@sidgoyal78 (Contributor) left a comment:

Thanks for the PR. I have a few inline comments/questions.

fairscale/optim/adam.py (inline conversation resolved)
fairscale/optim/adam.py (inline conversation resolved)
@andersonic requested a review from sidgoyal78 on August 21, 2020 22:41
@sidgoyal78 (Contributor) left a comment:

Thanks for the changes. Looks good to me!

@shruti-bh (Contributor) left a comment:

LGTM overall!

@@ -236,10 +236,10 @@ def benchmark_language_model(train_data, val_data, test_data, model, criterion,

      # Assert that memory usage on each GPU is within 10% of golden run
      # Right-hand-side is golden run bytes * 110%
-     assert torch.cuda.memory_stats(0)["allocated_bytes.all.peak"] < 210479616 * 1.1
+     assert torch.cuda.memory_stats(0)["allocated_bytes.all.peak"] < 193206272 * 1.1
A contributor commented on the change above:

How did you come up with these numbers of bytes?

A contributor replied:

I had the same question in one of the previous PRs :D
I guess Jun-Ru printed out the value of torch.cuda.memory_stats(0)["allocated_bytes.all.peak"] and put that number in the check!
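For the record, that value is easy to reproduce with PyTorch's built-in CUDA memory instrumentation; a minimal sketch (the workload itself stands in for whatever the benchmark runs):

import torch

torch.cuda.reset_peak_memory_stats(0)   # clear peaks left over from earlier runs

# ... run the benchmark workload on GPU 0 here ...

# Both calls report the peak number of allocated bytes on device 0:
print(torch.cuda.memory_stats(0)["allocated_bytes.all.peak"])
print(torch.cuda.max_memory_allocated(0))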

@andersonic merged commit 5251a69 into master on Aug 22, 2020
@andersonic deleted the optimizer-state-scaling branch on August 22, 2020 00:14