Fix(Early Stopping): move best score to device #7959
Conversation
Codecov Report
```diff
@@           Coverage Diff            @@
##           master   #7959     +/-   ##
=========================================
- Coverage      92%     87%      -5%
=========================================
  Files         207     207
  Lines       13484   13484
=========================================
- Hits        12354   11713     -641
- Misses       1130    1771     +641
```
Force-pushed from 953d0a9 to cacaa68.
LGTM - at least, it does what it says it does haha
LGTM!
@@ -277,6 +277,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Pass the `stage` argument of `Callback.{setup,teardown}` as a keyword ([#7973](https://github.com/PyTorchLightning/pytorch-lightning/pull/7973))

- Fixed move best score to device in EarlyStopping Callback ([#7959](https://github.com/PyTorchLightning/pytorch-lightning/pull/7959))
As your comments suggest, this fix applies only to TPU, right?
It applies to other accelerators as well.
How would I reproduce an error on a GPU, for example?
Why can't we do something like this instead:
`self.best_score.to(current.device)`
Moving one tensor to the device but not the other inside a `monitor_op` call is only going to raise questions for readers of this code.
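A minimal sketch of what that suggestion could look like in an EarlyStopping-style check. The names `monitor_op`, `min_delta`, `best_score`, and `current` are illustrative only, not the actual Lightning implementation:

```python
import torch

# Hypothetical sketch, not the PR's diff: align devices before comparing,
# as suggested in the comment above.
monitor_op = torch.lt                       # "min" mode: improvement == lower metric
min_delta = torch.tensor(0.0)

best_score = torch.tensor(float("inf"))     # starts life on the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
current = torch.tensor(0.5, device=device)  # monitored metric lives on the accelerator

# Moving best_score to the metric's device avoids cross-device binary ops,
# which fail on XLA and can also fail on CUDA when the tensors are not 0-dim.
if monitor_op(current - min_delta, best_score.to(current.device)):
    best_score = current
```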
```python
>>> torch_inf = torch.tensor(np.Inf)
>>> value = torch.tensor(5, device=xm.xla_device())
>>> torch.lt(value, torch_inf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: torch_xla/csrc/aten_xla_bridge.cpp:69 : Check failed: xtensor
*** Begin stack trace ***
        tensorflow::CurrentStackTrace()
```
Similar code on CUDA devices won't throw an error.
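For reference, a small sketch of why the equivalent comparison passes on CUDA: my understanding (not stated in the PR) is that PyTorch promotes a 0-dim CPU tensor like a scalar in binary ops, so the mixed devices are tolerated there, while torch_xla rejects them as the traceback above shows.

```python
import torch

# Assumes a CUDA-enabled build; guarded so the sketch still runs on CPU-only machines.
if torch.cuda.is_available():
    torch_inf = torch.tensor(float("inf"))      # 0-dim tensor on the CPU
    value = torch.tensor(5.0, device="cuda:0")  # metric on the GPU
    # The 0-dim CPU tensor is treated like a scalar, so this mixed-device
    # comparison succeeds on CUDA.
    print(torch.lt(value, torch_inf))           # tensor(True, device='cuda:0')
```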
I got the following error (on a machine with 2 GPUs, using DDP):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
I also felt that `self.best_score.to(current.device)` was safer.
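A minimal reproduction of that error with illustrative tensors (not the Lightning callback itself), plus the `.to(current.device)` alignment suggested above:

```python
import torch

# Illustrative only; requires a CUDA device, so it is guarded.
if torch.cuda.is_available():
    best_score = torch.full((1,), float("inf"))   # 1-element CPU tensor (not 0-dim)
    current = torch.tensor([0.3], device="cuda:0")
    try:
        torch.lt(current, best_score)
    except RuntimeError as err:
        # "Expected all tensors to be on the same device, but found at least
        #  two devices, cuda:0 and cpu!"
        print(err)
    # Moving best_score to the metric's device makes the comparison valid.
    print(torch.lt(current, best_score.to(current.device)))  # tensor([True], device='cuda:0')
```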
What does this PR do?
Fixes #7936
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃