[blocked by #1756] Add decorator to auto-move data for inference #1526
Conversation
Hello @HenryJia! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-17 12:52:06 UTC
LGTM 🚀
Hi, thanks for looking into this.
My main concern is that the LightningModule no longer behaves like an nn.Module, since you overwrite __call__. I think the idea is that a LightningModule can be used just like an nn.Module outside of PL, and this PR kinda breaks that. At the minimum, I would show a warning that the transfer is done automatically, because otherwise the user may wonder why data transfers (CPU to GPU, for example) are slow.
Also, the assertion that the parameters must be on the same device is not correct.
I'm not completely overwriting it; I am using super to call nn.Module's __call__.
On second thought, yes, that makes sense, I'll get rid of that. I'll have to rethink how I detect the device that the model is on then.
@awaelchli I've now moved the data transfer code to pytorch_lightning.utilities to remove the issue of code duplication, and added warnings.
also missing changelog
@Borda done!
This pull request is now in conflict... :(
Codecov Report
@@           Coverage Diff            @@
##           master   #1526    +/-   ##
========================================
+ Coverage      88%     89%      +1%
========================================
  Files          69      70       +1
  Lines        4316    3833     -483
========================================
- Hits         3805    3415     -390
+ Misses        511     418      -93
This pull request is now in conflict... :(
@Borda I think I've dealt with all the issues you pointed out, would you mind reviewing again?
if callable(getattr(batch, 'to', None)):
    if warn_on_transfer:
        rank_zero_warn('Auto transferred data to device {}'.format(device))
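For context, here is a minimal sketch of what such a recursive transfer utility could look like. The name transfer_data_to_device and the warn_on_transfer flag follow the snippets in this thread, but the body below is illustrative rather than the PR's actual code, and the import location of rank_zero_warn may differ between versions.

from collections.abc import Mapping, Sequence

import torch

from pytorch_lightning.utilities import rank_zero_warn  # import path may vary by version


def transfer_data_to_device(batch, device_type, device_idx=None, warn_on_transfer=False):
    """Recursively move anything exposing a .to() method onto the given device."""
    device = torch.device(device_type) if device_idx is None else torch.device(device_type, device_idx)
    if callable(getattr(batch, 'to', None)):
        if warn_on_transfer:
            rank_zero_warn('Auto transferred data to device {}'.format(device))
        return batch.to(device)
    if isinstance(batch, Mapping):
        return {k: transfer_data_to_device(v, device_type, device_idx, warn_on_transfer) for k, v in batch.items()}
    if isinstance(batch, Sequence) and not isinstance(batch, str):
        return type(batch)(transfer_data_to_device(v, device_type, device_idx, warn_on_transfer) for v in batch)
    return batch  # anything else (ints, strings, ...) is returned untouched

Note that the warning sits inside the recursion, so a dict or list of tensors would trigger it once per leaf, which is exactly the point raised in the next comment.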
I would move the warning out to the __call__, because 1. this utility function is more general (it is used in other parts) and 2. this function is recursive, so if a dict of tensors is passed in, the warning would be shown multiple times.
@awaelchli I thought about this; the problem with that is that I effectively run into the same code duplication problem by trying to detect which device the data is on in __call__, since I'd need to recurse into whatever format the data is in again in almost exactly the same way.
Also, I believe rank_zero_warn will only warn once anyway, so that is not an issue.
output = model(x)  # Lightning will automove data here and warn you of it
"""
devices = [p.device for p in self.parameters()]
I'm not sure if always looping over all the params is a good idea. Can we maybe cache the devices somehow?
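One way this could be done, sketched purely as an illustration (not code from this PR): cache the device list once and invalidate it whenever the module is moved. Since .to(), .cuda() and .cpu() all funnel through nn.Module._apply, overriding _apply gives a convenient invalidation hook.

from torch import nn


class DeviceCachingMixin(nn.Module):
    """Hypothetical mixin that caches the devices its parameters live on."""

    def __init__(self):
        super().__init__()
        self._devices_cache = None

    def _apply(self, fn, *args, **kwargs):
        # .to()/.cuda()/.cpu() all route through _apply, so any move invalidates the cache
        self._devices_cache = None
        return super()._apply(fn, *args, **kwargs)

    @property
    def devices(self):
        if self._devices_cache is None:
            self._devices_cache = [p.device for p in self.parameters()]
        return self._devices_cache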
@mcarilli are we missing a simpler/cleaner way of doing this?
x = x.cpu()
model.cuda()
# this works in lightning
out = model(x)
# out is cuda tensor now
Maybe I don't understand what you're trying to do here, but it looks like you're only using devices[0], so why collect them all? Also, if the model params do reside on multiple devices, it's hard to predict which device the user actually wants the input data to reside on.
So, I only apply automatic data transfer in the simple case where the model resides on one device, as trying to auto-transfer data when the model is spread across multiple devices is very non-trivial and heavily dependent on model structure.
device = devices[0]
data = transfer_data_to_device(data, device.type, device.index, warn_on_transfer=True)
kwargs = transfer_data_to_device(kwargs, device.type, device.index, warn_on_transfer=True)
return super(LightningModule, self).__call__(*data, **kwargs)
@Borda and I discussed this and we both agree that we shouldn't do this in the Module (at least not by default). In our opinion we should always be able to use a LightningModule as an nn.Module.
What I propose is the following:
We change this part to a decorator that can be added to forward and is attached automatically by the trainer (sorry for this coming so late).
In that case you could make the decorator a class that eventually also caches the devices :)
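For illustration, here is a rough sketch of what such a decorator might look like. The name auto_move_data is only a placeholder here, and it reuses the transfer_data_to_device utility sketched earlier in this thread; the eventual implementation may differ.

import functools


def auto_move_data(forward_fn):
    """Hypothetical decorator: move incoming data onto the module's device before calling forward."""

    @functools.wraps(forward_fn)
    def wrapper(self, *args, **kwargs):
        devices = list({p.device for p in self.parameters()})
        # only handle the simple case of a single-device model, as discussed above
        if len(devices) == 1:
            device = devices[0]
            args = transfer_data_to_device(args, device.type, device.index, warn_on_transfer=True)
            kwargs = transfer_data_to_device(kwargs, device.type, device.index, warn_on_transfer=True)
        return forward_fn(self, *args, **kwargs)

    return wrapper


# Applied at class-definition time it behaves like a normal method decorator,
# and the Trainer could attach it dynamically in the same way:
#
#     class LitModel(LightningModule):
#         @auto_move_data
#         def forward(self, x):
#             ...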
Love the idea of adding the decorator dynamically there!
This pull request is now in conflict... :(
@HenryJia maybe this should be a broader effort to do distributed inference? But we probably need a LightningModule method. Here's a brainstorm on how we might be able to solve distributed inference:
model = LightningModule.load_from_checkpoint(...)
model.init_distributed(backend='', gpus=2, etc...)
model(x)
Makes sense, I'll look at this again with fresh ideas at a later point; I'm a bit busy with other things right now.
This pull request is now in conflict... :(
@HenryJia this is great! Let's go with the suggestion by @justusschock and @Borda to build a decorator instead!
Sounds good, I'll get back on this in a couple of weeks' time when all my university exams are over.
Let us know if some of us should pick up your work and continue :)
I am afraid that letting it sleep and finishing it in a couple of weeks would be a bit difficult given the continuous development...
@williamFalcon This PR is continued over here: #1905
Sorry for the late reply. |
We can close it or you can merge the changes in my branch into yours if you'd like to make adjustments. Either one is fine with me. Note that my PR is rebased onto #1756, not master, but it also contains your commits. I think one thing we need to figure out is how we will best apply the decorator.
Btw, for changing the destination branch there is no need to close the PR; we can just change the destination branch =)
I read through your branch. It all looks good to me and better than where I left this. So I'll close this PR for now (unless anyone has any objections) |
Before submitting
What does this PR do?
Attempt to implement, fixes #1412
Currently only works for GPUs and not TPUs
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃