-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add progress tracking on Loops - 2/n #8362
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8362 +/- ##
======================================
- Coverage 93% 92% -0%
======================================
Files 216 216
Lines 14109 14093 -16
======================================
- Hits 13083 13000 -83
- Misses 1026 1093 +67 |
994fd37
to
e550e6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything is resolved now 🥵
We can think about the fault-tolerant checkpoint design in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
big one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 😃
@@ -1259,3 +1261,10 @@ def _log_device_info(self) -> None: | |||
"IPU available but not used. Set the `ipus` flag in your trainer" | |||
" `Trainer(ipus=8)` or script `--ipus=8`." | |||
) | |||
|
|||
def _on_expection(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about this function - is there a reason it is a function instead of a few lines in the exception handling? This function is only called once
This function seems a bit like a callback, but that's not what it is right? Still learning and understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just meant as a stub for further improvements:
We can think about the fault-tolerant checkpoint design in a later PR.
It's entirely fine to change it - although it wouldn't be a callback as it only needs to run on one hook: on_exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thank you!!
What does this PR do?
This PR build on top of #8334. Merge in order.
This PR adds progress tracking to Fit / Evaluation / Predict loops and ensures the correct progress state.
Fixes #6429
Does your PR introduce any breaking changes? If yes, please list them.
No
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃