Restore log step during restart #13467

Merged
merged 4 commits into master from fix/log_step
Jul 12, 2022
Conversation

rohitgr7
Contributor

@rohitgr7 rohitgr7 commented Jun 30, 2022

What does this PR do?

Fixes #12274
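
The linked issue is titled "Resuming training resets the logged step number". As a rough illustration of that class of bug and its fix, here is a minimal sketch, assuming a toy loop whose logging step is saved with the checkpoint and restored on resume. `ToyLoop`, `log_step`, `state_dict`, and `load_state_dict` are hypothetical names chosen for this example; they are not Lightning's API and do not reproduce the actual change in this PR.

```python
from typing import Dict, List


class ToyLoop:
    """Hypothetical stand-in for a training loop; not Lightning's API."""

    def __init__(self) -> None:
        # Step value attached to each logged metric (assumed name).
        self.log_step = 0

    def run(self, num_batches: int) -> List[int]:
        """Pretend to train and return the step used for each log call."""
        logged = []
        for _ in range(num_batches):
            logged.append(self.log_step)
            self.log_step += 1
        return logged

    def state_dict(self) -> Dict[str, int]:
        # Persist the logging step alongside the rest of the checkpoint.
        return {"log_step": self.log_step}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        # Restore the logging step; fall back to 0 for old checkpoints.
        self.log_step = state.get("log_step", 0)


# First run: log three batches, then checkpoint.
first = ToyLoop()
steps_before = first.run(3)
checkpoint = first.state_dict()

# Without restoring, a new loop starts logging from step 0 again.
not_restored = ToyLoop()
steps_reset = not_restored.run(2)

# With restoring, logging continues where the first run stopped.
resumed = ToyLoop()
resumed.load_state_dict(checkpoint)
steps_after = resumed.run(2)
```

In this sketch the unrestored loop repeats steps that were already logged, while the restored loop continues the sequence, which is the behavior the PR title describes.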

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @carmocca @edward-io @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy @justusschock @ninginthecloud

@rohitgr7 added the bug (Something isn't working), logging (Related to the `LoggerConnector` and `log()`), and loops (Related to the Loop API) labels Jun 30, 2022
@rohitgr7 self-assigned this Jun 30, 2022
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
@mergify bot added the ready (PRs ready to be merged) label Jun 30, 2022
@rohitgr7 enabled auto-merge (squash) June 30, 2022 17:43
@mergify bot removed the ready (PRs ready to be merged) label Jul 1, 2022
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
@rohitgr7 requested a review from awaelchli July 1, 2022 07:30
@mergify bot added the ready (PRs ready to be merged) label Jul 1, 2022
@rohitgr7 added this to the pl:1.6.x milestone Jul 12, 2022
@rohitgr7 merged commit df931e2 into master Jul 12, 2022
@rohitgr7 deleted the fix/log_step branch July 12, 2022 09:46
rohitgr7 added a commit that referenced this pull request Jul 12, 2022
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
carmocca added a commit that referenced this pull request Jul 12, 2022
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
lexierule pushed a commit that referenced this pull request Jul 12, 2022
* update NGC docker (#13136)

* update docker
* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Decouple pulling legacy checkpoints from existing GHA workflows and docker files (#13185)

* Add pull-legacy-checkpoints action
* Replace pulls with the new action and script
* Simplify

* Merge pull request #13250 from PyTorchLightning/ci/rm-base

CI: Remove simple test `ci_test-base.yml`

* Update rich requirement from !=10.15.*,<=12.0.0,>=10.2.2 to >=10.2.2,!=10.15.0.a,<13.0.0 in /requirements (#13047)

* Update rich requirement in /requirements

Updates the requirements on [rich](https://github.com/willmcgugan/rich) to permit the latest version.
- [Release notes](https://github.com/willmcgugan/rich/releases)
- [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
- [Commits](Textualize/rich@v10.2.2...v12.4.1)

---
updated-dependencies:
- dependency-name: rich
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix torch.distributed._sharded_tensor DeprecationWarning (#13261)

* update tutorials (#13268)

* [BUG] `estimated_stepping_batches` requires distributed comms in `configure_optimizers` for `DeepSpeedStrategy` (#13350)

* Update torchmetrics requirement from <=0.7.2,>=0.4.1 to >=0.4.1,<0.9.2 in /requirements (#13275)

Update torchmetrics requirement in /requirements

Updates the requirements on [torchmetrics](https://github.com/PyTorchLightning/metrics) to permit the latest version.
- [Release notes](https://github.com/PyTorchLightning/metrics/releases)
- [Changelog](https://github.com/PyTorchLightning/metrics/blob/master/CHANGELOG.md)
- [Commits](Lightning-AI/torchmetrics@v0.4.1...v0.9.1)

---
updated-dependencies:
- dependency-name: torchmetrics
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix mypy errors for model summary utilities (#13384)

* rename org Lightning AI

* Modified python version check to accommodate for legacy version styles (#13420)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

(cherry picked from commit b332b66)

* Call `set_epoch` for distributed batch samplers (#13396)

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

(cherry picked from commit 2dd332f)

* _RICH_AVAILABLE

* _FAIRSCALE_AVAILABLE

* _BAGUA_AVAILABLE

* redefine

* chlog spaces

* CI: Fix `fatal: unsafe repository` (#13515)

* update release date

* CI: azure rename

* Restore log step during restart (#13467)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove redundant test

* Update CI setup (#13291)

* drop mamba
* use legacy GPU machines

* fix schema check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Martino Sorbaro <martinosorb@users.noreply.github.com>
jerome-habana pushed a commit to jerome-habana/lightning that referenced this pull request Jul 14, 2022
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Labels
  • bug: Something isn't working
  • logging: Related to the `LoggerConnector` and `log()`
  • loops: Related to the Loop API
  • ready: PRs ready to be merged
Development

Successfully merging this pull request may close these issues.

Resuming training resets the logged step number
4 participants