
Weekly Patch Release v1.4.6 [full merge, no squash] #9358

Merged: 17 commits from v1.4.6 into release/1.4.x on Sep 10, 2021

Conversation

@justusschock (Member) commented Sep 7, 2021

What does this PR do?

gh pr list -s merged -S 'merged:2021-09-01T16:30:00.000Z..2021-09-07T22:30:00.000Z' --json mergedAt,milestone,url,mergeCommit,title --jq 'sort_by(.mergedAt) | reverse | .[] | select((.milestone.title == "v1.4.x") or (.milestone.title == null)) | [.url, .mergeCommit.oid, .title] | join(" ")' --limit 100
https://github.com/PyTorchLightning/pytorch-lightning/pull/9288 6892d533ea1c743f7e05171846a28e685db85f51 Run plugin closure before `on_before_optimizer_step` [1/2]
https://github.com/PyTorchLightning/pytorch-lightning/pull/9311 d49709e29c6174be8bde7e1edc288180a2173adc Fix DeepSpeed warning CI Test
https://github.com/PyTorchLightning/pytorch-lightning/pull/9319 0135a4bd1ca338ddd1ceedd17c9a91dcd8d8be1f Remove some incorrect comments in ddp.py
https://github.com/PyTorchLightning/pytorch-lightning/pull/9336 98e2f56db090f38a09d5f63202688590702ba15a Clear reference to training loss at the end of train step
https://github.com/PyTorchLightning/pytorch-lightning/pull/9316 9149b649089976d2d723e33fca929bf92f192ff8 [bugfix] Resolve PyTorch Profiling for Manual Optimization
https://github.com/PyTorchLightning/pytorch-lightning/pull/9125 904dde7573c97245b45e477631ece989bd8c01e9 Fix inspection of unspecified args for container hparams
https://github.com/PyTorchLightning/pytorch-lightning/pull/9279 dc3391beaec4e16b08ffe8bf9a05cf8039f8b9e7 Remove deprecation warnings being called for `on_{task}_dataloader`
https://github.com/PyTorchLightning/pytorch-lightning/pull/8877 cf1a589956f86a0cf1a50c0710051eee9b082094 Allow disabling automatic stopping after max_steps or max_epochs
https://github.com/PyTorchLightning/pytorch-lightning/pull/9308 f6d40871bd52ac755a146958513a0a330b813b52 Prevent loss to be moved to the cpu before backward call.
https://github.com/PyTorchLightning/pytorch-lightning/pull/9301 9d0caa6928c28fcf2252c3acdc6fda8570e5adb9 Fix TPU cleaning job
https://github.com/PyTorchLightning/pytorch-lightning/pull/9156 d5ee8d8e3f46f0e5a6789f45d865fb348fd738f3 Disable `{save,check}_on_train_epoch_end` with `check_val_every_n_epoch>1`
https://github.com/PyTorchLightning/pytorch-lightning/pull/9261 f745aa9ce1b8a78b8ef27b939dc1db456837b374 Move tracking epoch end outputs logic to the `EvaluationEpochLoop`
https://github.com/PyTorchLightning/pytorch-lightning/pull/9223 a7461bfc3b98da2314c21603ee457c4b604f4c9a Add missing callbacks to `callbacks.rst`
https://github.com/PyTorchLightning/pytorch-lightning/pull/9232 ead2404aac20658b6ca0d99317bbaabc94f99f87 Added doc strings to base logger file
https://github.com/PyTorchLightning/pytorch-lightning/pull/9231 f0788b3bbc8773543297ee8d8c6d17a679703bb1 scheduled removal of auto_move_data decorator
https://github.com/PyTorchLightning/pytorch-lightning/pull/9267 69cdb79e33de3dc0b19aad4c6fe8c5c9d21d28c4 Add check for uninitialized `_sync_dir` in DDP Plugin to avoid errors during error handling
https://github.com/PyTorchLightning/pytorch-lightning/pull/9289 071ae498083afc131828c982b3fcb62944a751d1 Fix `LightningOptimizer.step` signature
https://github.com/PyTorchLightning/pytorch-lightning/pull/8800 e2ecb8f8591d79e81512cd70d773cb9b4c390132 Allow exporting to onnx when input is tuple
https://github.com/PyTorchLightning/pytorch-lightning/pull/9255 f9994d456cb264f8d66002eae6d7d51bd1ecc94d Update CHANGELOG following patch releases

Excluded #9231 due to breaking changes and #8877 since it is not a bugfix.

Also skipping #9308 since it is entangled with data fetching, and #9319 because it requires Post-LocalSGD.

Fixes #<issue_number>

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@justusschock justusschock changed the base branch from master to release/1.4.x September 7, 2021 12:26
@justusschock justusschock marked this pull request as ready for review September 7, 2021 12:28
@justusschock justusschock changed the title from "V1.4.6" to "Weekly Patch Release v1.4.6 [full merge, no squash]" Sep 7, 2021
@codecov (bot) commented Sep 7, 2021

Codecov Report

Merging #9358 (520d85d) into release/1.4.x (a61cc72) will increase coverage by 0%.
The diff coverage is 95%.

@@              Coverage Diff              @@
##           release/1.4.x   #9358   +/-   ##
=============================================
  Coverage             92%     92%           
=============================================
  Files                218     218           
  Lines              14490   14511   +21     
=============================================
+ Hits               13393   13419   +26     
+ Misses              1097    1092    -5     

@Borda Borda added the "Important" and "let's do it!" (approved to implement) labels Sep 7, 2021
@Borda Borda marked this pull request as draft September 7, 2021 15:04
@justusschock justusschock marked this pull request as ready for review September 7, 2021 16:22
@tchaton (Contributor) left a comment

LGTM !

justusschock and others added 5 commits September 7, 2021 18:38
* Update parsing.py

* add todo (for single arg)

* unblock non container single arg

* init test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update CHANGELOG.md

* pep8 line length

* Update pytorch_lightning/utilities/parsing.py

* remove dict namespace conversion

* add omegaconf support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add dict test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add omegaconf test

* Update CHANGELOG.md

* Update pytorch_lightning/utilities/parsing.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/utilities/parsing.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Without clearing this reference, the loss tensor stays live through the next training
step. This can be a problem for memory-intensive models that produce very deep backward
graphs, such as neural ODEs. For these models, keeping the backward graph of the previous
loss in memory can lead to OOM errors in the next training step, even though the step might
have succeeded if we had cleared (and thus GC'd) the previous backward graph (see the
illustrative sketch below).

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
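
To illustrate the point in the commit message above, here is a minimal sketch in plain PyTorch (not Lightning's actual loop code; the model, shapes, and optimizer are made up for illustration) showing why dropping the reference to the previous loss matters:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(3):
    x = torch.randn(64, 10)
    loss = model(x).pow(2).sum()  # loss.grad_fn keeps the backward graph reachable
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # If `loss` stays referenced past this point, whatever is still reachable through
    # its graph can only be freed once the next iteration rebinds the name. Clearing
    # the reference eagerly lets Python release that memory before the next forward
    # pass builds a new graph, which mirrors what the fix above does at the end of
    # the training step.
    loss = None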
@mergify mergify bot added the "ready" (PRs ready to be merged) label Sep 7, 2021
@carmocca (Contributor) commented Sep 7, 2021

I think #9288 and #9308 should make it in.

Also, remember to update the milestones of the PRs that have no milestone or that you've decided not to include.

@justusschock (Member, Author) commented

I can add #9308 manually. When I tried to cherry-pick it, it somehow messed everything up since it included the data fetching. Not sure why...

@awaelchli awaelchli added this to the v1.4.x milestone Sep 7, 2021
@leezu (Contributor) commented Sep 7, 2021

Would it make sense to revert #9239 as part of 1.4.6? It can trigger "RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment". I'll open an issue about it soon.
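
For context, a minimal reproduction of that PyTorch limitation (illustrative only, not the exact code path touched by #9239): copy.deepcopy works on leaf tensors but raises this RuntimeError for tensors that were produced by autograd operations.

import copy

import torch

leaf = torch.ones(3, requires_grad=True)  # created by the user: a graph leaf
non_leaf = leaf * 2                       # produced by an op, so it carries a grad_fn

copy.deepcopy(leaf)  # works
try:
    copy.deepcopy(non_leaf)
except RuntimeError as err:
    # "Only Tensors created explicitly by the user (graph leaves) support the
    # deepcopy protocol at the moment"
    print(err)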

@carmocca (Contributor) commented Sep 7, 2021

> I'll open an issue about it soon

You can use #8821 - check the last few comments. A reproduction would be appreciated.

@justusschock justusschock force-pushed the v1.4.6 branch 2 times, most recently from 48483ef to fe4d3dd on September 8, 2021 20:58
@awaelchli awaelchli force-pushed the v1.4.6 branch 3 times, most recently from fbb7f16 to 269cb03 on September 9, 2021 21:24
Co-authored-by: Justus Schock <justus.schock@lfb.rwth-aachen.de>
@awaelchli (Contributor) commented

test_deepspeed_multigpu_stage_3 test passes but pytest hangs. 🙈 😭

Somehow DeepSpeed stage 3 changes made it into this PR even though none of the listed commits have any DeepSpeed fixes.

@lexierule lexierule merged commit 00c6640 into release/1.4.x Sep 10, 2021
@lexierule lexierule deleted the v1.4.6 branch September 10, 2021 13:39
Labels
  • let's do it! (approved to implement)
  • ready (PRs ready to be merged)
Projects
  None yet
10 participants