Add progress tracking on Loops - 2/n #8362

Merged: 112 commits, Jul 19, 2021


@tchaton (Contributor) commented on Jul 10, 2021

What does this PR do?

This PR builds on top of #8334; merge them in order.

This PR adds progress tracking to the Fit / Evaluation / Predict loops and ensures that their progress state stays correct.

Fixes #6429
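A minimal sketch of what per-loop progress tracking can look like, assuming each unit of work (here, a batch) moves through four stages (ready, started, processed, completed) and that counters are kept both for the current epoch and for the whole run. `Tracker`, `Progress`, and `run_epoch` are hypothetical names for illustration and may not match the classes this PR actually adds. The useful property is that after an interruption, `started > completed` shows exactly which batch was in flight.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Tracker:
    """Hypothetical per-stage counters for one kind of work item (e.g. batches)."""
    ready: int = 0
    started: int = 0
    processed: int = 0
    completed: int = 0

    def reset(self) -> None:
        self.ready = self.started = self.processed = self.completed = 0


@dataclass
class Progress:
    """Hypothetical pair of trackers: `total` survives epoch resets, `current` does not."""
    total: Tracker = field(default_factory=Tracker)
    current: Tracker = field(default_factory=Tracker)

    def increment(self, stage: str) -> None:
        # Bump the same stage on both trackers so they stay consistent.
        setattr(self.total, stage, getattr(self.total, stage) + 1)
        setattr(self.current, stage, getattr(self.current, stage) + 1)

    def reset_on_epoch(self) -> None:
        self.current.reset()


def run_epoch(batches: Iterable[int], progress: Progress, fail_at: Optional[int] = None) -> None:
    """Toy loop that records each stage; raises at index `fail_at` to simulate a crash."""
    progress.reset_on_epoch()
    for i, batch in enumerate(batches):
        progress.increment("ready")
        progress.increment("started")
        if i == fail_at:
            raise RuntimeError(f"simulated failure on batch {i}")
        _ = batch * 2  # stand-in for the real training/eval/predict step
        progress.increment("processed")
        progress.increment("completed")


progress = Progress()
run_epoch([1, 2, 3], progress)  # a clean epoch
try:
    run_epoch([1, 2, 3], progress, fail_at=1)  # crashes on the second batch
except RuntimeError:
    pass
print(progress.current)  # Tracker(ready=2, started=2, processed=1, completed=1)
print(progress.total)    # Tracker(ready=5, started=5, processed=4, completed=4)
```

Splitting `total` from `current` is one way to let a resumed run know both where the interrupted epoch stopped and how much work the whole run has done.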

Does your PR introduce any breaking changes? If yes, please list them.

No

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)
  • Did you list all the breaking changes introduced by this pull request?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, check the following:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@tchaton added this to the v1.4 milestone on Jul 10, 2021
@tchaton self-assigned this on Jul 10, 2021
@tchaton changed the base branch from master to loop_improvement on July 10, 2021 12:16
codecov bot commented on Jul 10, 2021

Codecov Report

Merging #8362 (b510a96) into master (176df20) will decrease coverage by 0%.
The diff coverage is 94%.

@@          Coverage Diff           @@
##           master   #8362   +/-   ##
======================================
- Coverage      93%     92%   -0%     
======================================
  Files         216     216           
  Lines       14109   14093   -16     
======================================
- Hits        13083   13000   -83     
- Misses       1026    1093   +67     
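The "-0%" above looks like a rounding artifact. Assuming Codecov's line coverage is simply hits divided by tracked lines (an assumption about how the figures in the table relate), the exact change works out to a little under half a percentage point:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines / tracked lines."""
    return 100.0 * hits / lines


# Figures copied from the coverage table above.
before = coverage_pct(13083, 14109)  # master
after = coverage_pct(13000, 14093)   # #8362
delta = after - before

print(f"{before:.2f}% -> {after:.2f}% ({delta:+.2f} points)")
# prints "92.73% -> 92.24% (-0.48 points)"
```

Rounded to whole percents, 92.73% and 92.24% display as 93% and 92%, while the delta of -0.48 rounds to -0%.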

@carmocca force-pushed the add_progress_tracking_on_loops branch from 994fd37 to e550e6d on July 15, 2021 17:47
mergify bot removed the "has conflicts" label on Jul 15, 2021
Review thread on tests/loops/test_loops.py (outdated, resolved)
@carmocca (Contributor) left a comment:

I think everything is resolved now 🥵

We can think about the fault-tolerant checkpoint design in a later PR.

@awaelchli (Contributor) left a comment:

big one!

Review thread on CHANGELOG.md (outdated, resolved)
@awaelchli added the ready (PRs ready to be merged) label on Jul 16, 2021
@tchaton enabled auto-merge (squash) on July 16, 2021 17:49
@ethanwharris (Member) left a comment:

LGTM 😃

@tchaton merged commit 7bb810f into master on Jul 19, 2021
@tchaton deleted the add_progress_tracking_on_loops branch on July 19, 2021 08:31
@@ -1259,3 +1261,10 @@ def _log_device_info(self) -> None:
"IPU available but not used. Set the `ipus` flag in your trainer"
" `Trainer(ipus=8)` or script `--ipus=8`."
)

def _on_expection(self):

A contributor commented:

I have a question about this function: is there a reason it is a separate function instead of a few lines in the exception handling? It is only called once.

This function seems a bit like a callback, but that's not what it is, right? Still learning and understanding.

A contributor replied:

It was just meant as a stub for further improvements:

"We can think about the fault-tolerant checkpoint design in a later PR."

It's entirely fine to change it, although it wouldn't be a callback, since it only needs to run on one hook: on_exception.
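The pattern discussed here, a private method called from a single `except` block as a placeholder for later fault-tolerant checkpointing, can be sketched as follows. `MiniTrainer`, `_on_exception`, and the in-memory "checkpoint" list are hypothetical illustrations, not Lightning's `Trainer` or the exact function in the diff above.

```python
from typing import Any, Callable, Dict, List


class MiniTrainer:
    """Hypothetical trainer used only to illustrate the exception-stub pattern."""

    def __init__(self) -> None:
        self.batches_completed = 0
        self.saved_checkpoints: List[Dict[str, Any]] = []

    def fit(self, batches: List[int], step: Callable[[int], None]) -> None:
        try:
            for batch in batches:
                step(batch)
                self.batches_completed += 1
        except Exception:
            # Single call site: the stub keeps the except block short and gives
            # fault-tolerant logic one obvious place to grow later.
            self._on_exception()
            raise  # re-raise so the caller still sees the original error

    def _on_exception(self) -> None:
        # Placeholder for a future fault-tolerant checkpoint: for now it only
        # records, in memory, how far the run got before failing.
        self.saved_checkpoints.append({"batches_completed": self.batches_completed})


def flaky_step(batch: int) -> None:
    if batch == 3:
        raise ValueError("simulated crash")


trainer = MiniTrainer()
try:
    trainer.fit([1, 2, 3, 4], flaky_step)
except ValueError:
    pass
print(trainer.saved_checkpoints)  # [{'batches_completed': 2}]
```

Unlike a callback, the stub needs no registration and fires on exactly one event, which matches the reasoning above; promoting it to a callback would only pay off if several independent pieces of code needed to react to the exception.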

A contributor replied:

got it, thank you!!

Labels: design (Includes a design discussion), ready (PRs ready to be merged)
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Formalize progress tracking inside of the trainer internals
8 participants