
Refactor: clean trainer device & distrib setters #5297

Merged: 25 commits merged into release/1.2-dev on Jan 4, 2021

Conversation

@Borda (Member) commented Dec 29, 2020

What does this PR do?

Fixes # (issue) <- this links related issue to this PR

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes [if needed]?
  • Did you write any new necessary tests [no need for typos, docs]?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Check that target branch and milestone are aligned!

Did you have fun?

Make sure you had fun coding 🙃

@Borda Borda added the refactor label Dec 29, 2020
@Borda Borda added this to the 1.2 milestone Dec 29, 2020
@pep8speaks commented Dec 29, 2020

Hello @Borda! Thanks for updating this PR.

Line 334:21: W503 line break before binary operator
Line 342:17: W503 line break before binary operator

Comment last updated at 2021-01-04 16:06:42 UTC
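
For context, W503 flags a line break placed before a binary operator. A minimal, hypothetical illustration (made-up condition and variable names, not the actual lines from this diff) of what pycodestyle complains about and the wrapped form its default configuration prefers:

```python
accelerator = "ddp_cpu"  # hypothetical values, only to make the snippet runnable
num_processes = 2

# W503: the line break comes *before* the binary operator `and`
use_ddp_cpu = (
    accelerator == "ddp_cpu"
    and num_processes is not None
)

# Layout that satisfies pycodestyle's default configuration: break *after* the operator
use_ddp_cpu = (
    accelerator == "ddp_cpu" and
    num_processes is not None
)
```

Many projects silence W503 instead, since current PEP 8 actually recommends breaking before binary operators; whether to reflow the lines or ignore the warning is a project-level choice.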

@Borda Borda self-assigned this Dec 29, 2020
@Borda Borda changed the title from "Refactor: clean trainer device & distrib setters" to "[blocked by #5303] Refactor: clean trainer device & distrib setters" Dec 30, 2020
@Borda Borda marked this pull request as ready for review December 30, 2020 19:01
@Borda Borda mentioned this pull request Dec 30, 2020
@Borda Borda changed the title from "[blocked by #5303] Refactor: clean trainer device & distrib setters" to "Refactor: clean trainer device & distrib setters" Dec 31, 2020
@Borda (Member, Author) commented Jan 1, 2021

> As a user input? No, I don't think so. When requesting ddp_cpu explicitly, we would need to know how many processes.

so it seems we had a bug there... :D
I have added a fallback: if num_processes is None, it is set to the number of available CPUs.

> Also, be careful with the changes here, it needs to match the previous behavior exactly. The order of the if-elif blocks is important.

yes, that is why this PR changes only the setters, not the readers, and no tests should need to change either...
the reader-side change is in #5300
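
For illustration, a minimal sketch of the fallback described above, with a hypothetical helper name (not the exact code from this diff): when ddp_cpu is requested without an explicit process count, default to the number of CPUs visible to the interpreter.

```python
import os


def resolve_num_processes(num_processes):
    """Hypothetical helper mirroring the behaviour discussed above:
    ddp_cpu without an explicit num_processes falls back to all CPUs."""
    if num_processes is None:
        # os.cpu_count() may return None on some platforms; keep at least 1 process.
        num_processes = os.cpu_count() or 1
    return num_processes


print(resolve_num_processes(None))  # e.g. 8 on an 8-core machine
print(resolve_num_processes(2))     # an explicit value is left untouched
```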

@Borda (Member, Author) commented Jan 1, 2021

@SeanNaren mind checking why the parity params are different for ddp_cpu?

# Assert model parameters are identical after fit
        for ddp_param, custom_param in zip(ddp_model.parameters(), custom_plugin_model.parameters()):
>           assert torch.equal(ddp_param, custom_param), 'Model parameters are different between DDP and Custom plugin'
E           AssertionError: Model parameters are different between DDP and Custom plugin
E           assert False
E            +  where False = <built-in method equal of type object at 0x11d2d95b0>(Parameter containing:\ntensor([[ 0.1203,  0.0808, -0.0999,  0.1504, -0.1179,  0.0486, -0.1525,  0.1665,\n          0.207... -0.0007,\n         -0.0356, -0.2548,  0.0780, -0.1915, -0.1204, -0.1929,  0.1851, -0.1996]],\n       requires_grad=True), Parameter containing:\ntensor([[ 0.1277,  0.1138, -0.0707,  0.1564, -0.0783,  0.0421, -0.1193,  0.1352,\n          0.181...  0.0362,\n          0.0102, -0.1289,  0.1082, -0.1586, -0.0546, -0.1568,  0.1198, -0.1302]],\n       requires_grad=True))
E            +    where <built-in method equal of type object at 0x11d2d95b0> = torch.equal
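
For reference, the failing check above boils down to comparing every parameter tensor of the two trained models for exact equality; a stripped-down sketch of that kind of parity assertion (generic nn.Module arguments, not the benchmark's actual models):

```python
import torch
from torch import nn


def assert_model_parity(model_a: nn.Module, model_b: nn.Module) -> None:
    # Every parameter tensor must match exactly between the two trained models.
    for param_a, param_b in zip(model_a.parameters(), model_b.parameters()):
        assert torch.equal(param_a, param_b), (
            "Model parameters are different between DDP and Custom plugin"
        )
```

Note that torch.equal requires bitwise-identical tensors, so a parity test like this is sensitive to seeding, data ordering, and the number of processes; a change in the effective process count for ddp_cpu could plausibly surface here.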

@codecov (bot) commented Jan 2, 2021

Codecov Report

Merging #5297 (233515c) into release/1.2-dev (73e06fd) will decrease coverage by 2%.
The diff coverage is 95%.

@@               Coverage Diff                @@
##           release/1.2-dev   #5297    +/-   ##
================================================
- Coverage               93%     91%    -2%     
================================================
  Files                  144     146     +2     
  Lines                10146   10417   +271     
================================================
+ Hits                  9425    9516    +91     
- Misses                 721     901   +180     

Review threads (outdated, resolved): benchmarks/test_sharded_parity.py (3 comments)
@Borda Borda added the ready (PRs ready to be merged), bug (Something isn't working) and feature (Is an improvement or enhancement) labels Jan 4, 2021
@SkafteNicki (Member) left a comment

LGTM

Review thread (outdated, resolved): pytorch_lightning/plugins/plugin_connector.py
@SeanNaren (Contributor) left a comment

So ddp_cpu now only works if we set the number of processes above 1, and if it is not specified, all CPU resources are allocated automatically. I think this is fine, just making sure there is no case where we'd want to keep the current behaviour!

@Borda Borda merged commit b72ed71 into release/1.2-dev Jan 4, 2021
@Borda Borda deleted the refactor/trainer-setters branch January 4, 2021 17:10
@Borda (Member, Author) commented Jan 4, 2021

> So ddp_cpu now only works if we set the number of processes above 1, and if it is not specified, all CPU resources are allocated automatically. I think this is fine, just making sure there is no case where we'd want to keep the current behaviour!

happy to verify, but I can't think of a case where you would not want to use the maximum available resources...?
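
For illustration, the user-facing behaviour being discussed, assuming the accelerator/num_processes arguments as they exist on release/1.2-dev (a sketch, not taken from the diff):

```python
import pytorch_lightning as pl

# Explicit process count: unchanged behaviour, two CPU processes.
trainer = pl.Trainer(accelerator="ddp_cpu", num_processes=2)

# Process count omitted (assuming its default resolves to None): per the discussion
# above, ddp_cpu now falls back to one process per available CPU instead of the
# previous, buggy handling.
trainer = pl.Trainer(accelerator="ddp_cpu")
```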
