Skip to content
This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Update local pytorch-lightning master #1

Merged
merged 2,310 commits into from
Mar 30, 2021

Conversation

shnela
Copy link

@shnela shnela commented Mar 30, 2021

No description provided.

awaelchli and others added 30 commits February 18, 2021 00:28
* add padding

* fix

* fix

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* updated based on suggestion

* changelog

* add test

* fix pep8

* resolve test

* fix code format

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
* update changelog

* apply untoggle_optimizer when result is None

* update tests

* still return loss sometimes

* Update CHANGELOG.md

Co-authored-by: deng-cy <dcy1996@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, add address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Doc strings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05

* Revert "Apply suggestions from code review"

This reverts commit e06ec29

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eac

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
* Trainer only references accelerator where it can

* Move teardown to the trainer, as it is reponsible for the accelerator
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
…ure (#6043)

* Move to CUDA image

* Remove deepspeed install as deepspeed now in the cuda image

* Remove path setting, as ninja should be in the container now
* added on_post_move_to_device

* added tests

* docs and refactors

* Update tests/backends/test_tpu_backend.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* moved weight sharing module back to test

updated tpu available

* add count to warning

* fix doctest

* import trainer in doctest

* import trainer in doctest

* do not test code as no TPU device

* param count to layer count

* formatting

* update docs

* update import

* update

* resolve tests

* remove legacy accelerator

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Your Name <you@example.com>
* Remove TrialMNISTDataModule

* Allow using TrialMNIST in the MNISTDataModule

* Update tests/helpers/datasets.py
* Fix: Allow hashing of metrics with lists in their state

* Add test case and modify semantics of Metric __hash__ in order to be compatible with structural equality checks

* Fix pep8 style issue

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* et al.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
* fix/test quant

* ...

* ---
#6044)

* Add descriptions to accelerator broadcast function/clean up all_gather

* Remove todo
* add hooks

* comment

* docs

* add tests

* make it private

* fix tests

* docs

* chlog

* testcode

* codefactor

* fix doctest

* fix doctest

* suggestions

* is always overriden

* pep and BoringModel

* BoringModel

* docs

* docs

* docs

* fix

* rebase

* rebase

* suggestions

* docs

* suggestions

* try fix docs

* docs

* update name

* yapf

* docs

* rebase

* yapf
* Make parallel devices optional across all plugins so that they can be instantiated

* Add any to types to capture vars passed in
Put .test() in  code blocks
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* rename get_model -> lightning_module

* update references to get_model

* pep8

* add proper deprecation

* remove outdated _get_reference_model

* fix cyclic import
* rename accelerator backend

* rename new additions from master

* add proper deprecation

* pep8

* warning match

* add missing warning type
* flake8

* fix cyclic import

* isort
* Add warnings to hooks

* Add default idx to prevent signature change in the future

* Nothing to see here

* Add default val to transfer_batch_to_device hook

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Revert "Add default val to transfer_batch_to_device hook"

This reverts commit 5c6a68f

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* v1.2.0rc2

* chlogs

* chlogs

* format

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* fix docs

* update on comments

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* rm comment

* Update docs/source/common/lightning_module.rst

Co-authored-by: chaton <thomas@grid.ai>

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
ananthsub and others added 25 commits March 24, 2021 13:54
* Metrics holder cleanup and better error message

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

* _VALUE -> _METRIC_TYPE
* Fix checkpoint callback issue for TPUs

* update changelog

* add barrier

* apply code suggestions

* update trainer test

* remove spaces

* fix tpu tests

* Apply suggestions from code review

* add comment

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
… of train/val/test (#6498)

* update docs

* add hook and update docs

* update tests

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chlog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* use external deprecate

* simplify

* simplify

* simplify

* flake8

* .

* others

* .
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Add artifcact_location arg to MLFlow logger

* Add CHANGELOG URL

* Update test
* add warning non reduced

* add test

* update test

* update changelog

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* use latest

* remake

* examples
…6667)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* support python 3.9

* update CI

* onnxruntime

* .

* .

* onnxruntime

* t 55

* t 75

* add script

* use

* onnx

* onnx

* onnx

* whl

* np

* find

* 21

* Apply suggestions from code review

* Apply suggestions from code review

* onnx

* CI

* req

* ~ dockers

* min

* .

* drop horovod

* drop horovod

* drop horovod

* fix

* fix

* .
…6719)

* update_logic

* update

* Update tests/utilities/test_xla_device_utils.py

* Update pytorch_lightning/utilities/xla_device.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update pytorch_lightning/utilities/xla_device.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* update test

* Update tests/utilities/test_xla_device_utils.py

* update

* Apply fix

* Docstring

* flake8

* update

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
* move save_checkpoint responsability to accelerator

* update
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update chlog v1.2.5

* legacy
@codecov-io
Copy link

codecov-io commented Mar 30, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@b9364f9). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master       #1   +/-   ##
=========================================
  Coverage          ?   86.76%           
=========================================
  Files             ?      192           
  Lines             ?    12174           
  Branches          ?        0           
=========================================
  Hits              ?    10563           
  Misses            ?     1611           
  Partials          ?        0           
Flag Coverage Δ
cpu 86.76% <0.00%> (?)
pytest 86.76% <0.00%> (?)
python3.7 85.58% <0.00%> (?)
python3.8 84.52% <0.00%> (?)
python3.9 85.68% <0.00%> (?)
torch1.4 85.45% <0.00%> (?)
torch1.5 85.47% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9364f9...583fcf2. Read the comment docs.

@kamil-kaczmarek kamil-kaczmarek merged commit b500619 into neptune-ai:master Mar 30, 2021
shnela pushed a commit that referenced this pull request Apr 14, 2021
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b8.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da, reversing
changes made to 0d23d75.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75.

* Revert "Update utils.py"

This reverts commit 70fe5da.

* Revert "Update utils.py"

This reverts commit a9aae99.

* Revert "Update test_results.py"

This reverts commit ea74906.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e43.

* Revert "Update ddp_spawn.py"

This reverts commit f172101.

* Revert "checkpoint consolidation"

This reverts commit 536c132.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde9.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f4.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc9.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aa.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29.

* modify distributed environment to make test pass

* add DDP communication hook

* remove test related setting

* remove more test related setting

* fix ddp comm hook util import issue

* comments

* one more fix for test_custom_plugin

* fix ddp spwan

* fix sgd

* address comments and add tests

* 1. add is gpu checking 2. modify test a bit 3. formatting

* formatting nit

* fix conda 3.7 1.7 issue for no torch.distributed.algorithms module

* need at least 1.8.0

* minor fix

* modify changelog

* changelog should link to PR number instead of issue number

* refine a bit on doc for register_ddp_comm_hook function, like ddp_comm_wrapper explanation and add hyperparameter for power sgd states in example usge

* move single device checking before call register_ddp_comm_hook

* formatting

* comments

* typo

* pre-commit formatting
shnela pushed a commit that referenced this pull request May 13, 2021
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b8.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da, reversing
changes made to 0d23d75.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75.

* Revert "Update utils.py"

This reverts commit 70fe5da.

* Revert "Update utils.py"

This reverts commit a9aae99.

* Revert "Update test_results.py"

This reverts commit ea74906.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e43.

* Revert "Update ddp_spawn.py"

This reverts commit f172101.

* Revert "checkpoint consolidation"

This reverts commit 536c132.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde9.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f4.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc9.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aa.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29.

* modify distributed environment to make test pass

* modify model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
kamil-kaczmarek pushed a commit that referenced this pull request Aug 1, 2021
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b8.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da, reversing
changes made to 0d23d75.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75.

* Revert "Update utils.py"

This reverts commit 70fe5da.

* Revert "Update utils.py"

This reverts commit a9aae99.

* Revert "Update test_results.py"

This reverts commit ea74906.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e43.

* Revert "Update ddp_spawn.py"

This reverts commit f172101.

* Revert "checkpoint consolidation"

This reverts commit 536c132.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde9.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f4.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc9.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aa.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29.

* modify distributed environment to make test pass

* fix version for ddp plugin test

* fix

* fix

* changelog

* Update CHANGELOG.md

* fsdp with full state dict

* fix missing import

* modify unitest

* fix

* fix

* fix typo

* modify test and add changelog

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* limit max_epoch to 1 for testing

* test

* fix

* update

* testing remove special for multi gpu

* assert gpu

* add assertion for gpu

* fix

* Re-enable special test, use ModelCheckpoint

* Fix paths

* Fix path passing

* test

* test

* fix test

* fix

* pre-commit format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.