This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Update local pytorch-lightning master #1

Merged: 2,310 commits, Mar 30, 2021
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Feb 17, 2021

  1. ad36c7b (no commit message shown)
  2. Prevent flickering progress bar (#6009)

    * add padding
    
    * fix
    
    * fix
    
    * Update pytorch_lightning/callbacks/progress.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * updated based on suggestion
    
    * changelog
    
    * add test
    
    * fix pep8
    
    * resolve test
    
    * fix code format
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: tchaton <thomas@grid.ai>
    3 people committed Feb 17, 2021 (68fd308)
  3. Fix Wrapping optimizers upon assignment (#6006)

    * Update properties.py
    
    * pep8
    justusschock committed Feb 17, 2021 (15d6788)
  4. [Bugfix] Apply untoggle_optimizer when result is None (#5983)

    * update changelog
    
    * apply untoggle_optimizer when result is None
    
    * update tests
    
    * still return loss sometimes
    
    * Update CHANGELOG.md
    
    Co-authored-by: deng-cy <dcy1996@gmail.com>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    3 people committed Feb 17, 2021 (a121fd3)
  5. 6a409c7 (no commit message shown)
  6. DeepSpeed Integration (#5954)

    * Add initial deepspeed changes
    
    * Address code review
    
    * Move static method outside of function
    
    * Fixes
    
    * Add missing annotation
    
    * Remove seed setting
    
    * Doc changes
    
    * Doc changes, add address reviews
    
    * Fix docs
    
    * Try fixing issue by moving to torch adam
    
    * Clean up check
    
    * Changes, better APIs!
    
    * Add wrapper, swap to git install revision
    
    * Add special test
    
    * Add warning
    
    * Address review
    
    * Add better disclaimer
    
    * Turn off ZeRO for testing due to compilation
    
    * Add description on modifying parameters via the plugin
    
    * Doc strings clear
    
    * Small doc fixes
    
    * Fix hash, reduce test
    
    * Added CI change
    
    * Move to azure pipeline
    
    * Fix test name
    
    * Add missing flag
    
    * Remove sudo...
    
    * Try conda instead
    
    * Swap to conda base
    
    * Try suggested install
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    * Revert "Apply suggestions from code review"
    
    This reverts commit 41cca05
    
    * Revert "Apply suggestions from code review"
    
    This reverts commit e06ec29
    
    * Remove setter
    
    * Address most review
    
    * Move out function, remove DeepSpeed from requirements
    
    * Install deepspeed/mpi4py within container
    
    * Use special tests, move to master commit for deepspeed
    
    * Export path
    
    * Force compile to happen first
    
    * Remove!
    
    * Debugging ninja
    
    * Fix error in optimizer step logic
    
    * Attempt to fix symbolic link
    
    * Reverse to aid debugging
    
    * Export path again
    
    * Clean up mess
    
    * var
    
    * Revert "var"
    
    This reverts commit 3450eac
    
    * Address review, add todo
    
    * Add note about unsupported functionality
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: tchaton <thomas@grid.ai>
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    4 people committed Feb 17, 2021 (7189d67)
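
    For context: the DeepSpeed integration above is driven through the Trainer's plugin system. A minimal sketch of how it is typically enabled in this 1.2-era API (keyword names are an assumption based on the commit and doc titles in this PR, not the merged code itself):

        import pytorch_lightning as pl

        # Sketch: select the DeepSpeed training-type plugin by name.
        # Requires the `deepspeed` package and one or more GPUs.
        trainer = pl.Trainer(
            gpus=4,
            precision=16,           # DeepSpeed is normally run with FP16
            plugins='deepspeed',    # enables the DeepSpeed plugin added by #5954
        )
        # trainer.fit(model)  # `model` is any LightningModule defined elsewhere
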
  7. Trainer only references accelerator (#6039)

    * Trainer only references accelerator where it can
    
    * Move teardown to the trainer, as it is responsible for the accelerator
    SeanNaren committed Feb 17, 2021 (b7c2e0a)
  8. 8d7ac8f (no commit message shown)
  9. [feat] Add Trainer(stochastic_weight_avg=True/False) (#6038)

    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    4 people committed Feb 17, 2021 (c9622ba)
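
    For context: stochastic weight averaging becomes a single Trainer switch. A minimal sketch using the flag named in the commit title:

        import pytorch_lightning as pl

        # Sketch: enable stochastic weight averaging via the new Trainer flag.
        trainer = pl.Trainer(max_epochs=10, stochastic_weight_avg=True)
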
  10. [CI] Move DeepSpeed into CUDA image, remove DeepSpeed install from azure (#6043)
    
    * Move to CUDA image
    
    * Remove deepspeed install as deepspeed now in the cuda image
    
    * Remove path setting, as ninja should be in the container now
    SeanNaren committed Feb 17, 2021 (8440595)
  11. drop deprecated result object 1/n (#5005)

    * ro1
    
    * ro2
    Borda committed Feb 17, 2021 (bac617f)

Commits on Feb 18, 2021

  1. Add option for weight tying on TPU's (#5441)

    * added on_post_move_to_device
    
    * added tests
    
    * docs and refactors
    
    * Update tests/backends/test_tpu_backend.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Update docs/source/tpu.rst
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Update docs/source/tpu.rst
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Update pytorch_lightning/core/decorators.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Update pytorch_lightning/core/decorators.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Update docs/source/tpu.rst
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update pytorch_lightning/core/decorators.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update pytorch_lightning/core/decorators.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update pytorch_lightning/core/decorators.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update pytorch_lightning/core/decorators.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update pytorch_lightning/core/hooks.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * moved weight sharing module back to test
    
    updated tpu available
    
    * add count to warning
    
    * fix doctest
    
    * import trainer in doctest
    
    * import trainer in doctest
    
    * do not test code as no TPU device
    
    * param count to layer count
    
    * formatting
    
    * update docs
    
    * update import
    
    * update
    
    * resolve tests
    
    * remove legacy accelerator
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    Co-authored-by: tchaton <thomas@grid.ai>
    Co-authored-by: Your Name <you@example.com>
    5 people committed Feb 18, 2021 (d2cd7cb)
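
    For context: the on_post_move_to_device hook mentioned above gives a module a place to re-tie shared weights after the model has been moved to an XLA/TPU device. A hypothetical sketch (the layer names are made up for illustration):

        import torch.nn as nn
        import pytorch_lightning as pl

        class TiedModel(pl.LightningModule):
            def __init__(self):
                super().__init__()
                self.encoder = nn.Linear(32, 32)
                self.decoder = nn.Linear(32, 32)
                # tie the weights once at construction time
                self.decoder.weight = self.encoder.weight

            def on_post_move_to_device(self):
                # moving to a TPU can break the tie, so re-apply it here
                self.decoder.weight = self.encoder.weight
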
  2. Delete tests.helpers.TrialMNISTDataModule (#5999)

    * Remove TrialMNISTDataModule
    
    * Allow using TrialMNIST in the MNISTDataModule
    
    * Update tests/helpers/datasets.py
    carmocca committed Feb 18, 2021 (bfcfac4)
  3. Fix: Allow hashing of metrics with lists in their state (#5939)

    * Fix: Allow hashing of metrics with lists in their state
    
    * Add test case and modify semantics of Metric __hash__ in order to be compatible with structural equality checks
    
    * Fix pep8 style issue
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
    3 people committed Feb 18, 2021 (77f6aa4)
  4. et al. (#6050)

    * et al.
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: chaton <thomas@grid.ai>
    3 people committed Feb 18, 2021 (6de8dca)
  5. 38ad9e0 (no commit message shown)
  6. fix/test quant (#6040)

    * fix/test quant
    
    * ...
    
    * ---
    Borda committed Feb 18, 2021 (049006a)
  7. Add descriptions to accelerator broadcast function/clean up all_gather (#6044)
    
    * Add descriptions to accelerator broadcast function/clean up all_gather
    
    * Remove todo
    SeanNaren committed Feb 18, 2021 (b019c25)
  8. Add before_batch_transfer and after_batch_transfer hooks (#3671)

    * add hooks
    
    * comment
    
    * docs
    
    * add tests
    
    * make it private
    
    * fix tests
    
    * docs
    
    * chlog
    
    * testcode
    
    * codefactor
    
    * fix doctest
    
    * fix doctest
    
    * suggestions
    
    * is always overridden
    
    * pep and BoringModel
    
    * BoringModel
    
    * docs
    
    * docs
    
    * docs
    
    * fix
    
    * rebase
    
    * rebase
    
    * suggestions
    
    * docs
    
    * suggestions
    
    * try fix docs
    
    * docs
    
    * update name
    
    * yapf
    
    * docs
    
    * rebase
    
    * yapf
    rohitgr7 committed Feb 18, 2021 (bcc0004)
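
    For context: these hooks let a LightningModule adjust a batch just before and just after it is moved to the device; the follow-up commit #6059 refers to them as on_before_batch_transfer / on_after_batch_transfer. A hedged sketch (the exact signatures may differ from the merged code):

        import pytorch_lightning as pl

        class AugmentingModel(pl.LightningModule):
            def on_before_batch_transfer(self, batch, dataloader_idx):
                # runs on the CPU side, before the batch is moved to the device
                x, y = batch
                return x.float(), y

            def on_after_batch_transfer(self, batch, dataloader_idx):
                # runs once the batch is on the device, e.g. for GPU-side augmentation
                x, y = batch
                return x * 2, y
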
  9. Make parallel devices optional across all plugins (#6051)

    * Make parallel devices optional across all plugins so that they can be instantiated
    
    * Add any to types to capture vars passed in
    SeanNaren committed Feb 18, 2021 (ffdcb62)
  10. 115e58a (no commit message shown)
  11. Fix docs typo (#6055)

    Put .test() in  code blocks
    ieshreya committed Feb 18, 2021 (f48a933)
  12. Docs for Pruning, Quantization, and SWA (#6041)

    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
    5 people committed Feb 18, 2021 (3449e2d)
  13. Replace .get_model() with explicit .lightning_module (#6035)

    * rename get_model -> lightning_module
    
    * update references to get_model
    
    * pep8
    
    * add proper deprecation
    
    * remove outdated _get_reference_model
    
    * fix cyclic import
    awaelchli committed Feb 18, 2021 (02ac4b0)
  14. rename accelerator_backend -> accelerator (#6034)

    * rename accelerator backend
    
    * rename new additions from master
    
    * add proper deprecation
    
    * pep8
    
    * warning match
    
    * add missing warning type
    awaelchli committed Feb 18, 2021 (6cc1a06)
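
    For context: the two renames above change how user code reaches the model and the accelerator from a Trainer instance, with the old spellings kept as deprecated aliases. A small sketch of the new attribute names taken from the commit titles:

        import pytorch_lightning as pl

        trainer = pl.Trainer()
        model = trainer.lightning_module   # previously trainer.get_model()
        accel = trainer.accelerator        # previously trainer.accelerator_backend
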
  15. fix flake8 for new plugins (#5951)

    * flake8
    
    * fix cyclic import
    
    * isort
    awaelchli committed Feb 18, 2021 (fc9bb53)
  16. fix docs links (#6057)

    Borda committed Feb 18, 2021 (d3a31bc)
  17. Add warnings to on_before/after_batch_transfer hooks (#6059)

    * Add warnings to hooks
    
    * Add default idx to prevent signature change in the future
    
    * Nothing to see here
    
    * Add default val to transfer_batch_to_device hook
    
    * Apply suggestions from code review
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Revert "Add default val to transfer_batch_to_device hook"
    
    This reverts commit 5c6a68f
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    SeanNaren and Borda committed Feb 18, 2021 (2cf39dc)
  18. v1.2.0rc2 (#6063)

    * v1.2.0rc2
    
    * chlogs
    
    * chlogs
    
    * format
    
    * Apply suggestions from code review
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    Borda and rohitgr7 committed Feb 18, 2021 (c46c23a)
  19. Update auto-opt docs (#6037)

    * fix docs
    
    * update on comments
    
    * Apply suggestions from code review
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * rm comment
    
    * Update docs/source/common/lightning_module.rst
    
    Co-authored-by: chaton <thomas@grid.ai>
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: chaton <thomas@grid.ai>
    5 people committed Feb 18, 2021 (b0074a4)
  20. Raise AttributeError in lightning_getattr and lightning_setattr when attribute not found (#6024)
    
    * Empty commit
    
    * Raise AttributeError instead of ValueError
    
    * Make functions private
    
    * Update tests
    
    * Add match string
    
    * Apply suggestions from code review
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * lightning to Lightning
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    akihironitta and awaelchli committed Feb 18, 2021 (8f82823)
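
    For context: lightning_getattr / lightning_setattr are internal helpers (in pytorch_lightning.utilities.parsing) that look an attribute up on the model, its hparams or its datamodule; after this commit a missing attribute raises AttributeError instead of ValueError. A hedged sketch of the new behaviour:

        from pytorch_lightning import LightningModule
        from pytorch_lightning.utilities.parsing import lightning_getattr, lightning_setattr

        class LitModel(LightningModule):
            def __init__(self):
                super().__init__()
                self.learning_rate = 0.01

        model = LitModel()
        lightning_getattr(model, 'learning_rate')       # -> 0.01
        lightning_setattr(model, 'learning_rate', 0.1)

        try:
            lightning_getattr(model, 'not_an_attribute')
        except AttributeError:
            # raised since #6024 when neither the model, its hparams,
            # nor its datamodule define the attribute
            pass
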
  21. default sched (#6062)

    rohitgr7 committed Feb 18, 2021 (5d6a091)
  22. v1.2.0 (#6065)

    * v1.2.0
    
    * docs
    Borda committed Feb 18, 2021 (4574023)
  23. add Azure tags trigger (#6066)

    * add Azure tags trigger
    
    * fix
    
    * mnodes
    Borda committed Feb 18, 2021 (e12c8a7)
  24. pypi azure badges - tags (#6068)

    * pypi azure badges - tags
    
    * pep8
    
    * id
    Borda committed Feb 18, 2021 (3645eb1)

Commits on Feb 19, 2021

  1. continue towards 1.3 (#6069)

    Borda committed Feb 19, 2021 (0b27147)
  2. Fix amp autocast (#6080)

    * precision fixes
    
    * add amp test model
    
    * fix test
    
    * revert
    
    * move assert to training step
    
    * fix test
    
    * fix test
    
    * remove unrelated changes
    
    * add changelog
    
    * remove unused import
    awaelchli committed Feb 19, 2021 (4b7c0fa)
  3. f2660ac (no commit message shown)

Commits on Feb 20, 2021

  1. consistent behavior for reduce method across all Plugins (#6011)

    * reduction docs
    
    * docs for abstract base method
    
    * make mean the default
    
    * add preliminary chlog
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    awaelchli and Borda committed Feb 20, 2021 (3bdc067)
  2. [Hot Fix] Give priority to plugins to set distributed mode, and then accelerator (#6089)
    
    * Give priority to plugins to set distributed mode, and then accelerator
    
    * Add CHANGELOG.md
    
    * Update CHANGELOG.md
    
    * Remove very scary line
    
    * Ensure we set cluster environment after slurm configured if necessary
    
    * Simplify the fix with a reset
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    SeanNaren and carmocca committed Feb 20, 2021 (97a81c3)

Commits on Feb 21, 2021

  1. Enable ZeRO tests for CI, fix to/half function calls (#6070)

    * Enable ZeRO optimization, and make sure that the lightning module hook is called when we move to half precision
    
    * Added test, update to function
    SeanNaren committed Feb 21, 2021 (3b0e4e0)
  2. Expose DeepSpeed FP16 parameters due to loss instability (#6115)

    * Expose deepspeed config parameters to init function due to instability in parameters
    
    * See if tests can run on normal CI, without special tests
    
    * Add changelog
    
    * Update pytorch_lightning/plugins/training_type/deepspeed.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    SeanNaren and carmocca committed Feb 21, 2021 (432e563)
  3. 97b4b3e (no commit message shown)

Commits on Feb 22, 2021

  1. fix amp/apex misconfiguration error for cpu (#6107)

    * fix weird test
    
    * fix apex plugin test
    
    * fix raise
    
    * cpu test
    
    * fix type
    
    * add changelog
    awaelchli committed Feb 22, 2021 (ae6ce17)
  2. Update Contributing Guide (#6118)

    * Update Contributing Guide
    
    * update docs
    kaushikb11 committed Feb 22, 2021 (9b99328)
  3. Minor fixes/improvements in Metric docs (#6114)

    * Fix wrong render
    
    * Improve classification metrics docs
    
    * Improve other domain metrics docs
    
    * Change the structure level in the docs
    akihironitta committed Feb 22, 2021 (1d28d11)
  4. 57215b7 (no commit message shown)
  5. Feature/5275 clean progress bar print (#5470)

    * Trainer.test should return only test metrics (#5214)
    
    * resolve bug
    
    * merge tests
    
    * Fix metric state reset (#5273)
    
    * Fix metric state reset
    
    * Fix test
    
    * Improve formatting
    
    Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
    
    * print() method added to ProgressBar
    
    * printing alongside progress bar added to LightningModule.print()
    
    * LightningModule.print() method documentation updated
    
    * ProgressBarBase.print() stub added
    
    * stub
    
    * add progress bar tests
    
    * fix isort
    
    * Progress Callback fixes
    
    * test_metric.py duplicate DummyList removed
    
    * PEP and isort fixes
    
    * CHANGELOG updated
    
    * test_progress_bar_print win linesep fix
    
    * test_progress_bar.py remove whitespaces
    
    * Update CHANGELOG.md
    
    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Tadej Svetina <tadej.svetina@gmail.com>
    Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
    Co-authored-by: Alexander Snorkin <Alexander.Snorkin@acronis.com>
    Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    8 people committed Feb 22, 2021 (423ecf9)
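
    For context: per the commit message, LightningModule.print() now cooperates with the progress bar so messages are written cleanly instead of colliding with the tqdm line. A minimal sketch of the call being fixed:

        import torch
        import pytorch_lightning as pl

        class VerboseModel(pl.LightningModule):
            def __init__(self):
                super().__init__()
                self.layer = torch.nn.Linear(32, 2)

            def training_step(self, batch, batch_idx):
                x, y = batch
                loss = torch.nn.functional.cross_entropy(self.layer(x), y)
                # routed through the progress bar callback's print() hook
                self.print(f"batch {batch_idx}: loss={loss.item():.3f}")
                return loss

            def configure_optimizers(self):
                return torch.optim.SGD(self.parameters(), lr=0.1)
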
  6. mini refactor for _running_stage access (#5724)

    * running stage
    
    * circular import
    
    * running stage cleanup
    
    * fix unused import
    
    * fix running stage access
    
    * add return type
    
    * Revert "add return type"
    
    This reverts commit 65b0fe2.
    
    * try fix typing
    awaelchli committed Feb 22, 2021 (0456b45)
  7. Add specifics around DeepSpeed docs (#6142)

    * Be more specific with DeepSpeed compatibility
    
    * Better wording
    SeanNaren committed Feb 22, 2021 (863a70c)

Commits on Feb 23, 2021

  1. Ensure accelerator is valid if running interactively (#5970)

    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    4 people committed Feb 23, 2021 (ebabe56)
  2. fixing misleading tested acc values (#5876)

    * fixing tested values
    
    * .
    
    * tests
    
    * yapf
    
    * softmax
    
    * hvd
    
    * rename
    
    * lr
    
    * duplicate
    
    * drop
    
    * classif
    
    * rm EvalModel
    
    * Revert "rm EvalModel"
    
    This reverts commit 6c3fb39.
    
    * update tests
    
    * fix
    
    * azure
    
    * azure
    
    * self
    
    * cpu
    
    * Apply suggestions from code review
    
    Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
    Borda and rohitgr7 committed Feb 23, 2021 (1c851b8)
  3. Update CHANGELOG (#6156)

    carmocca committed Feb 23, 2021 (45158aa)

Commits on Feb 24, 2021

  1. prune deprecated profiler as bool (#6164)

    * prune profiler
    
    * chlog
    Borda committed Feb 24, 2021 (09baf29)
  2. prune deprecated Trainer arg enable_pl_optimizer (#6163)

    * prune enable_pl_optimizer
    
    * prune automatic_optimization
    Borda committed Feb 24, 2021 (1d9c553)
  3. Prune deprecated metrics for 1.3 (#6161)

    * prune deprecated metrics for 1.3
    
    * isort / yapf
    Borda committed Feb 24, 2021 (a731269)
  4. [Bugfix] Fixed epoch level schedulers not being called when val_check_interval < 1.0 (#6075)
    
    * fix bug
    
    * fix tests
    
    * changelog
    
    * fix pep8
    
    * fix tests
    
    * fix and add some tests
    
    * add test for rlop
    
    * chlog
    
    * Update CHANGELOG.md
    
    Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
    SkafteNicki and rohitgr7 committed Feb 24, 2021 (1b498d1)
  5. Prune deprecated checkpoint arguments (#6162)

    * prune prefix
    
    * prune mode=auto
    
    * chlog
    Borda committed Feb 24, 2021 (46617d9)
  6. Prune deprecated EarlyStopping(mode='auto') (#6167)

    Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    3 people committed Feb 24, 2021 (8b47527)
  7. Fix typo (#6178)

    akihironitta committed Feb 24, 2021 (5cf892b)
  8. Update issue template to use discussions for questions (#6155)

    * add issue config
    
    * remove question template
    
    * update URL
    
    * Update README.md
    
    * Update README.md
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update .github/ISSUE_TEMPLATE/config.yml
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    edenlightning and rohitgr7 committed Feb 24, 2021 (c33fd52)
  9. c7130b7 (no commit message shown)
  10. Update gpu warning (#6181)

    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com>
    3 people committed Feb 24, 2021 (b0d1996)

Commits on Feb 25, 2021

  1. 3ed8ef8 (no commit message shown)
  2. Fix for multiple callbacks (#6197)

    * Fix for multiple callbacks
    
    * Add CHANGELOG.md
    
    * Remove old params
    
    * Skip tests on windows using ddp
    
    * Change name of the variable to not clash with should stop, which is separate
    
    * Apply suggestions from code review
    
    * Fix params
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    SeanNaren and Borda committed Feb 25, 2021 (dd2f5a0)
  3. Add checkpoint parameter to on_save_checkpoint (#6072)

    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    carmocca and kaushikb11 committed Feb 25, 2021 (3df02b8)
  4. Document exceptions in loggers (#6171)

    * Document exceptions in loggers
    
    * minor formatting
    
    * docstring changed in comet.py
    
    * Apply suggestions from code review
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    AlKun25 and rohitgr7 committed Feb 25, 2021 (4d96f19)
  5. ddf55a2 (no commit message shown)

Commits on Feb 26, 2021

  1. e7298b5 (no commit message shown)
  2. Add mypy typing to precision plugins. (#6149)

    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    4 people committed Feb 26, 2021 (0647340)
  3. apply_func.py: from torchtext.legacy.data import Batch (#6211)

    * Update apply_func.py
    
    The name Batch is no longer located under torchtext.data.
    Error message:
    File "/home/daniel/py38/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 25, in <module>
        from torchtext.data import Batch
    ImportError: cannot import name 'Batch' from 'torchtext.data' (/home/daniel/py38/lib/python3.8/site-packages/torchtext/data/__init__.py)
    You can fix this by changing line 28 to:
        from torchtext.legacy.data import Batch
    
    * Update apply_func.py
    
    * Update apply_func.py
    
    * Update apply_func.py
    
    * Update apply_func.py
    
    * Update apply_func.py
    dbonner committed Feb 26, 2021 (ee5032a)
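
    For context: the commit message already contains the essence of the change. A version-tolerant variant (a sketch, not necessarily the exact code that was merged) guards the import so both old and new torchtext releases keep working:

        # torchtext >= 0.9 moved Batch under torchtext.legacy.data;
        # older releases still expose torchtext.data.Batch.
        try:
            from torchtext.legacy.data import Batch
        except ImportError:
            from torchtext.data import Batch
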

Commits on Feb 27, 2021

  1. fix(wandb): prevent WandbLogger from dropping values (#5931)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    4 people committed Feb 27, 2021 (40d5a9d)
  2. 111d9c7 (no commit message shown)

Commits on Feb 28, 2021

  1. document exceptions for metrics/regression (#6202)

    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    Co-authored-by: Prajakta Phadke <pphadke@iu.edu>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    4 people committed Feb 28, 2021 (15c477e)

Commits on Mar 1, 2021

  1. simplify skip-if tests >> 0/n (#5920)

    * skipif + yapf + isort
    
    * tests
    
    * docs
    
    * pp
    Borda committed Mar 1, 2021 (58a6d59)
  2. update (#6237)

    awaelchli committed Mar 1, 2021 (ce05687)
  3. Document Exceptions in profilers (#6229)

    * docstring changes in profilers
    
    * minor changes in profilers.py
    AlKun25 committed Mar 1, 2021 (8aba885)
  4. Call optimizer.zero_grad() before backward inside closure in AutoOpt (#6147)
    
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    akihironitta and carmocca committed Mar 1, 2021 (925f082)
  5. Fix for incorrect usage of detach(), cpu(), to() (#6216)

    * Fix for incorrect detach/cpu calls (#6214)
    
    * Fix incorrect use of detach(), to(), and cpu(), #6214
    
    * Fix incorrect use of detach() and cpu(), #6214
    
    * update pr
    
    * add typing
    
    * chlog
    
    * more...
    
    * revert on module
    
    * update on comments
    
    * revert changes on model
    
    Co-authored-by: tchaton <thomas@grid.ai>
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    3 people committed Mar 1, 2021 (651c25f)
  6. add skipif wrapper (#6258)

    Borda committed Mar 1, 2021 (352e8f0)
  7. cleaning SWA (#6259)

    * rename
    
    * if
    
    * test
    
    * chlog
    Borda committed Mar 1, 2021 (ed67490)
  8. 412a7d8 (no commit message shown)
  9. switch agents pool (#6270)

    Borda committed Mar 1, 2021 (6788dba)

Commits on Mar 2, 2021

  1. docstring changes in tuner (#6264)

    * docstring changes in tuner
    
    * added full stop
    AlKun25 committed Mar 2, 2021 (3371d32)
  2. Disable CPU Offload as default for DeepSpeed (#6262)

    * Change default for CPU offload to false for best throughput/memory efficiency
    
    * Add changelog
    
    * default
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    SeanNaren and Borda committed Mar 2, 2021 (efda48f)
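
    For context: CPU offload is now opt-in rather than the default. A hedged sketch of requesting it explicitly through the plugin object (the cpu_offload argument name is assumed from the 1.2-era DeepSpeedPlugin):

        import pytorch_lightning as pl
        from pytorch_lightning.plugins import DeepSpeedPlugin

        # Sketch: offload optimizer state to CPU explicitly, since it no longer defaults to on.
        trainer = pl.Trainer(
            gpus=2,
            precision=16,
            plugins=DeepSpeedPlugin(cpu_offload=True),
        )
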
  3. split profilers (#6261)

    Borda committed Mar 2, 2021 (dc8647e)
  4. Refactor: skipif for multi - gpus 1/n (#6266)

    * ngpus
    
    * gpu
    
    * isort
    
    * pt
    
    * flake8
    Borda committed Mar 2, 2021 (eb81500)
  5. Improved EarlyStopping.patience documentation (#6278)

    * Improved early stopping documentation
    
    * Changed to 120 column format
    
    * doc
    
    * doc
    
    * doc
    
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    turian and Borda committed Mar 2, 2021 (22985d2)
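
    For context on the docs change: EarlyStopping.patience counts validation checks without improvement rather than raw epochs, which is easy to misread when validation runs more than once per epoch. A small usage sketch:

        import pytorch_lightning as pl
        from pytorch_lightning.callbacks import EarlyStopping

        # patience=3 means three consecutive validation runs without improvement,
        # not three epochs, if validation runs more than once per epoch.
        early_stop = EarlyStopping(monitor='val_loss', patience=3, mode='min')
        trainer = pl.Trainer(callbacks=[early_stop], val_check_interval=0.5)
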
  6. Refactor: skipif for Windows 2/n (#6268)

    * win
    
    * isort
    
    * flake8
    Borda committed Mar 2, 2021 (0f9134e)
  7. fix duplicate console logging bug v2 (#6275)

    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    3 people committed Mar 2, 2021 (bc577ca)
  8. Refactor: skipif for AMPs 3/n (#6293)

    * args
    
    * native
    
    * apex
    
    * isort
    Borda committed Mar 2, 2021 (b46d221)
  9. [fix] Ensure we check deepspeed/sharded in multinode DDP (#6297)

    * Ensure we check deepspeed/sharded in multinode
    
    * Add CHANGELOG.md
    
    * Add CHANGELOG.md
    
    * Drop mock, use actual multi-gpu node
    SeanNaren committed Mar 2, 2021 (8001987)
  10. 38274b9 (no commit message shown)
  11. 24c3a3f (no commit message shown)
  12. try to fix imports for parsing (#6256)

    * try to fix imports
    
    * legacy 1.2.1
    Borda committed Mar 2, 2021 (7e8f4b9)
  13. Refactor: Runif for TPU and Horovod 5/n (#6301)

    * TPU
    
    * horovod
    
    * extra
    
    * fix
    
    * Apply suggestions from code review
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * doc
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    Borda and SkafteNicki committed Mar 2, 2021 (ac58378)
  14. Refactor: runif for spec 6/6 (#6307)

    * special
    
    * rpc
    Borda committed Mar 2, 2021 (d1a0315)
  15. Add fairscale & deepspeed to skipif 4/n (#6281)

    * add fairscale & windows to skipif
    
    * add deepspeed to runif
    
    * fairscale
    
    * deepspeed
    
    * flake8
    
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    kaushikb11 and Borda committed Mar 2, 2021 (4157b35)
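
    For context: the skipif/RunIf refactor commits above replace ad-hoc pytest.mark.skipif conditions in the test suite with a single RunIf marker. A hypothetical sketch of such a marker in use (the import path and parameter names are assumptions based on the commit titles):

        from tests.helpers.runif import RunIf  # test-suite helper, path assumed

        @RunIf(min_gpus=2, deepspeed=True)
        def test_deepspeed_multi_gpu():
            # body only runs when 2+ GPUs and deepspeed are available;
            # otherwise RunIf expands into a pytest skip marker
            ...
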
  16. [bugfix] TPU test hangs to barrier on 1 process (#6272)

    * update
    
    * resolve flake8
    
    * update
    
    * update
    
    * update changelog
    
    * update
    
    * resolve flake8
    
    Co-authored-by: Your Name <you@example.com>
    tchaton and Your Name committed Mar 2, 2021 (1aac481)

Commits on Mar 3, 2021

  1. bf6ba83 (no commit message shown)
  2. Simplify test for AMP plugins (#6311)

    * AMP
    
    * fuse
    
    * yapf
    Borda committed Mar 3, 2021 (dcec4ef)
  3. Fix ModelPruning(make_pruning_permanent=True) buffers getting removed when saved during training (#6073)
    
    Co-authored-by: chaton <thomas@grid.ai>
    carmocca and tchaton committed Mar 3, 2021 (4a8422c)
  4. [bugfix] TPU + all_gather + SingleTPU shouldn't call xm.all_gather (#6296)
    
    * resolve an issue with TPU
    
    * update
    
    * add changelog
    tchaton committed Mar 3, 2021 (484dce1)

Commits on Mar 4, 2021

  1. drop unused variable in API (#6308)

    * drop unused pl model in ckpt
    
    * irrelevant
    
    * on_evaluation_batch_start
    
    * evaluation_epoch_end
    
    * attach_datamodule
    Borda committed Mar 4, 2021 (6166f46)
  2. hotfix for PT1.6 and torchtext (#6323)

    * ci: azure reinstall torchtext
    
    * move
    
    * todos
    
    * 0.6.0
    
    * skip examples
    
    * formatter
    
    * skip
    
    * todo
    
    * Apply suggestions from code review
    Borda committed Mar 4, 2021 (e038e74)
  3. [fix] Use training type plugin hook when saving (FSDP 1/n) (#6321)

    * Rely on training type plugin when saving
    
    * Add better typing to training type plugin
    SeanNaren committed Mar 4, 2021 (d01e8fd)
  4. leaving lezwon (#6347)

    lezwon committed Mar 4, 2021 (577323c)
  5. Add tests/utilities/test_parsing.py (#4460)

    * Create branch tests/4400_parsing
    
    * Rename test file for parsing.py
    
    * Fix lightning_hasattr
    
    * Fix lightning_hasattr
    
    * Fix lightning_setattr
    
    * Add empty lines and remove rubbish spaces
    
    * Raise AttributeError not ValueError
    
    * Use getattr in hasattr
    
    * Remove rubbish spaces
    
    * Fix getattr
    
    * Fix by flake8
    
    * Add tests for str_to_bool_or_str
    
    * Fix by flake8
    
    * Add tests for str_to_bool
    
    * Add tests for is_picklable
    
    * Add tests for clean_namespace
    
    * Fix typo
    
    * Fix lightning_getattr
    
    * Add tests for AttributeDict
    
    * Add tests for flatten_dict
    
    * Fix by flake8
    
    * Apply suggestions from code review
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Apply isort
    
    * Revert "Apply suggestions from code review"
    
    * Define unpicklable_function outside
    
    * Add comment to test_clean_namespace
    
    * Add tests for parse_class_init_keys
    
    * Add tests for get_init_args and collect_init_args
    
    * Share objects across the tests
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
    3 people committed Mar 4, 2021 (48a10f1)
  6. Add ignore param to save_hyperparameters (#6056)

    * add ignore param to save_hyperparameters
    
    * add docstring for ignore
    
    * add type for frame object
    
    * Update pytorch_lightning/core/lightning.py
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * Update pytorch_lightning/core/lightning.py
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * fix whitespace
    
    * Update pytorch_lightning/core/lightning.py
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * Parametrize tests
    
    * Update pytorch_lightning/core/lightning.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Update pytorch_lightning/core/lightning.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * seq
    
    * fix docs
    
    * Update lightning.py
    
    * Update lightning.py
    
    * fix docs errors
    
    * add example keyword
    
    * update docstring
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    4 people committed Mar 4, 2021 (59acf57)
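
    For context: the new ignore argument lets save_hyperparameters() skip constructor arguments that should not be stored in hparams, such as a whole nn.Module passed in. A minimal sketch:

        import torch.nn as nn
        import pytorch_lightning as pl

        class LitClassifier(pl.LightningModule):
            def __init__(self, backbone: nn.Module, learning_rate: float = 1e-3):
                super().__init__()
                # keep learning_rate in hparams, but leave the backbone module out
                self.save_hyperparameters(ignore=['backbone'])
                self.backbone = backbone
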
  7. Fix _stable_1d_sort to work when n >= N (#6177)

    * Fix when _stable_1d_sort to work when n >= N
    
    * Apply suggestions
    
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    frankier and carmocca committed Mar 4, 2021 (5d7388d)
  8. Update docs on arg train_dataloader in fit (#6076)

    * add to docs
    
    * update docs
    
    * Apply suggestions from code review
    
    * Update pytorch_lightning/core/hooks.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * nested loaders
    
    * Apply suggestions from code review
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * shorten text length
    
    * Update pytorch_lightning/core/hooks.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    4 people committed Mar 4, 2021 (4f90455)
  9. missing tests default_root_dir=tmpdir (#6314)

    * default_root_dir=tmpdir
    
    * miss
    Borda committed Mar 4, 2021 (b9cf122)
  10. Document exception for metrics/classification (#6190)

    * document exception for metrics/classification
    
    * minor formatting fixes
    
    * fix trailing whitespaces
    
    * document exception for metrics
    
    * Apply suggestions from code review
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    5 people committed Mar 4, 2021 (8e3524d)
  11. [Fix] Call clip gradients if clip val greater than 0 (#6330)

    * Call clip gradients if clip val greater than 0
    
    * format
    
    * Format
    
    * Move to top of file
    SeanNaren committed Mar 4, 2021 (39231ae)
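
    For context: gradient clipping is controlled by the Trainer's gradient_clip_val argument, and the fix above makes the clipping call a no-op when that value is 0. A small usage sketch:

        import pytorch_lightning as pl

        # gradient_clip_val=0 (the default) now skips clipping entirely;
        # any positive value clips gradient norms to that threshold.
        trainer = pl.Trainer(gradient_clip_val=0.5)
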
  12. 7acbd65 (no commit message shown)
  13. docstring changes in accelerators (#6327)

    * docstring changes in accelerators
    
    * docstrings moved
    
    * whitespaces removed
    
    * PEP8 correction[1]
    AlKun25 committed Mar 4, 2021 (49c579f)
  14. [bugfix] Perform reduction for dict in training_step and DP (#6324)

    * fix
    
    * update
    
    * update
    
    * add changelog
    
    * Update CHANGELOG.md
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * Update tests/accelerators/test_dp.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * update changelog
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    tchaton and carmocca committed Mar 4, 2021 (248a8e8)

Commits on Mar 5, 2021

  1. introduce default cluster environment for lightning-specific ddp (#5915)

    * handle distributed_sampler_kwargs
    
    * move emptying cache to accelertor
    
    * fix a few tests
    
    * restoring the result from subprocess
    
    * fix queue.get() order for results
    
    * add missing "block_backward_sync" context manager
    
    * add missing "block_backward_sync" context manager
    
    * fix sync_batchnorm
    
    * fix supported gpu-ids for tuple
    
    * fix clip gradients and inf recursion
    
    * accelerator selection: added cluster_environment plugin
    
    * fix torchelastic test
    
    * fix reduce early stopping decision for DDP
    
    * fix tests: callbacks, conversion to lightning optimizer
    
    * fix lightning optimizer does not pickle
    
    * fix setting benchmark and deterministic option
    
    * fix slurm amp test
    
    * fix prepare_data test and determine node_rank
    
    * fix retrieving last path when testing
    
    * remove obsolete plugin argument
    
    * fix test: test_trainer_config
    
    * fix torchscript tests
    
    * fix trainer.model access
    
    * move properties
    
    * fix test_transfer_batch_hook
    
    * fix auto_select_gpus
    
    * fix omegaconf test
    
    * fix test that needs to simulate slurm ddp
    
    * add horovod plugin
    
    * fix test with named arguments
    
    * clean up whitespace
    
    * fix datamodules test
    
    * remove old accelerators
    
    * fix naming
    
    * move old plugins
    
    * move to plugins
    
    * create precision subpackage
    
    * create training_type subpackage
    
    * fix all new import errors
    
    * fix wrong arguments order passed to test
    
    * fix LR finder
    
    * Added sharded training type and amp plugin
    
    * Move clip grad to precision plugin
    
    * Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically
    
    * Fix import issue, attempting to fix tests
    
    * Fix initial test
    
    * Reflect hook logic from master, should wrap model after move to device
    
    * Optional state consolidation, since master has optimizers not wrapped
    
    * change attribute for instance test
    
    * reset optimizers
    
    optimizers are not used in main process, so state would be wrong.
    
    * legacy
    
    * imports in accel
    
    * legacy2
    
    * trainer imports
    
    * fix import errors after rebase
    
    * move hook to new setup location
    
    * provide unwrapping logic
    
    * fix trainer callback system
    
    * added ddp2 implementation
    
    * fix imports .legacy
    
    * move plugins
    
    * restore legacy
    
    * drop test.py from root
    
    * add tpu accelerator and plugins
    
    * fixes
    
    * fix lightning optimizer merge
    
    * reset bugreportmodel
    
    * unwrapping
    
    * step routing forward
    
    * model access
    
    * unwrap
    
    * opt
    
    * integrate distrib_type
    
    * sync changes
    
    * sync
    
    * fixes
    
    * add forgotten generators
    
    * add missing logic
    
    * update
    
    * import
    
    * missed imports
    
    * import fixes
    
    * isort
    
    * mv f
    
    * changelog
    
    * format
    
    * move helper to parallel plugin
    
    * d
    
    * add world size
    
    * clean up
    
    * duplicate
    
    * activate ddp_sharded and tpu
    
    * set nvidia flags
    
    * remove unused colab var
    
    * use_tpu <-> on_tpu attrs
    
    * make some ddp_cpu and clusterplugin tests pass
    
    * Ref/accelerator connector (#5742)
    
    * final cleanup
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * connector cleanup
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * trainer cleanup
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * accelerator cleanup + missing logic in accelerator connector
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * add missing changes to callbacks
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * reflect accelerator changes to lightning module
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * clean cluster envs
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * cleanup plugins
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * add broadcasting
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * yapf
    
    * remove plugin connector
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * plugins
    
    * manual optimization
    
    * update optimizer routing
    
    * add rank to torchelastic
    
    * fix memory mixed precision
    
    * setstate on trainer for pickling in ddp spawn
    
    * add predict method
    
    * add back commented accelerator code
    
    * adapt test for sync_batch_norm to new plugin
    
    * fix deprecated tests
    
    * fix ddp cpu choice when no num_processes are given
    
    * yapf format
    
    * skip a memory test that cannot pass anymore
    
    * fix pickle error in spawn plugin
    
    * x
    
    * avoid
    
    * x
    
    * fix cyclic import in docs build
    
    * add support for sharded
    
    * update typing
    
    * add sharded and sharded_spawn to distributed types
    
    * make unwrap model default
    
    * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel
    
    * update sharded spawn to reflect changes
    
    * update sharded to reflect changes
    
    * Merge 1.1.5 changes
    
    * fix merge
    
    * fix merge
    
    * yapf isort
    
    * fix merge
    
    * yapf isort
    
    * fix indentation in test
    
    * copy over reinit scheduler implementation from dev1.2
    
    * fix apex tracking calls with dev_debugger
    
    * reduce diff to dev1.2, clean up
    
    * fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu
    
    * sort plugin tests legacy/new
    
    * fix error handling for amp on cpu
    
    * fix merge
    
    
    fix merge
    
    
    fix merge
    
    * [Feat] Resolve manual_backward (#5837)
    
    * resolve manual_backward
    
    * resolve flake8
    
    * update
    
    * resolve for ddp_spawn
    
    * resolve flake8
    
    * resolve flake8
    
    * resolve flake8
    
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
    
    * fix tests/accelerator tests on cpu
    
    * [BugFix] Resolve manual optimization (#5852)
    
    * resolve manual_optimization
    
    * update
    
    * update
    
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
    
    * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)
    
    * resolve a bug
    
    * Accelerator refactor sharded rpc (#5854)
    
    * rpc branch
    
    * merge
    
    * update handling of rpc
    
    * make devices etc. Optional in RPC
    
    * set devices etc. later if necessary
    
    * remove devices from sequential
    
    * make devices optional in rpc
    
    * fix import
    
    * uncomment everything
    
    * fix cluster selection
    
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
    
    * resolve bug
    
    * fix assert in rpc test
    
    * resolve a test
    
    * fix docs compilation
    
    * accelerator refactor - fix for sharded parity test (#5866)
    
    * fix memory issue with ddp_spawn
    
    * x
    
    
    x
    
    
    x
    
    
    x
    
    
    x
    
    
    x
    
    
    x
    
    
    x
    
    
    x
    
    * x
    
    * Remove DDP2 as this does not apply
    
    * Add missing pre optimizer hook to ensure lambda closure is called
    
    * fix apex docstring
    
    * [accelerator][BugFix] Resolve some test for 1 gpu (#5863)
    
    * update
    
    * revert init
    
    * resolve a bug
    
    * update
    
    * resolve flake8
    
    * update
    
    * update
    
    * update
    
    * revert init
    
    * resolve a bug
    
    * update
    
    * resolve flake8
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * revert init
    
    * resolve a bug
    
    * update
    
    * resolve flake8
    
    * update
    
    * update
    
    * update
    
    * revert init
    
    * update
    
    * resolve flake8
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * all_gather
    
    * update
    
    * make plugins work, add misconfig for RPC
    
    * update
    
    * update
    
    * remove breaking test
    
    * resolve some tests
    
    * resolve flake8
    
    * revert to ddp_spawn
    
    Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
    Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
    
    * yapf isort
    
    * resolve flake8
    
    * fix apex doctests
    
    * fix apex doctests 2
    
    * resolve docs
    
    * update drone
    
    * clean env
    
    * update
    
    * update
    
    * update
    
    * update
    
    * merge
    
    * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)
    
    * Fix RPC related tests, clean out old API, update for new accelerator API
    
    * Move tests out of legacy folder, update paths and names
    
    * Update test_remove_1-4.py
    
    * Expose properties for tpu cores/gpus/num_gpus
    
    * Add root GPU property
    
    * Move properties to properties.py
    
    * move tests that were previously in drone
    
    * Fix root GPU property (#5908)
    
    * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator
    
    * Add missing tests back
    
    * fix best model path transfer when no checkpoint callback available
    
    * Fix setup hook order [wip] (#5858)
    
    * Call trainer setup hook before accelerator setup
    
    * Add test case
    
    * add new test
    
    * typo
    
    * fix callback order in test
    
    Co-authored-by: tchaton <thomas@grid.ai>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * rename ddp sequential -> rpc sequential for special test
    
    * revert
    
    * fix stupid merge problem
    
    * abstract the cluster plugins
    
    * default plugin
    
    * integrate default environment
    
    * fix property
    
    * adapt tests
    
    * adjust test
    
    * fix world size access
    
    * base cluster env
    
    * revert rebase errors
    
    * revert rebase errors
    
    * missing import
    
    * revert unrelated change
    
    * remove unused cluster local rank
    
    * remove unrelated changes
    
    * fix unrelated changes
    
    * fix pep8
    
    * remove unused var
    
    * reset permissions
    
    * yapf
    
    * test default environment
    
    * test torchelastic environment
    
    * world size as int
    
    * tests for slurm environment
    
    * changelog
    
    * test comments
    
    * remove unintended change
    
    * keep master port fixed after it is generated
    
    * test random master port
    
    * yapf
    
    * add missing default environment
    
    * move helper function
    
    * rename default environment
    
    * rename
    
    * rename
    
    * yapf
    
    * Update pytorch_lightning/plugins/environments/lightning_environment.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * Update CHANGELOG.md
    
    Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
    
    * spawn -> create
    
    Co-authored-by: justusschock <justus.schock@posteo.de>
    Co-authored-by: SeanNaren <sean@grid.ai>
    Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
    Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
    Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    11 people committed Mar 5, 2021
    ec8d46e
  2. [bugfix] Resolve memory leak for evaluation (#6326)

    * resolve bug
    
    * resolve flake8
    
    * revert name
    tchaton committed Mar 5, 2021
    46540ee
  3. Update changelog for v1.2.2 (#6325)

    * update changelog for v1.2.2
    
    * ckpr 1.2.2
    
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    kaushikb11 and Borda committed Mar 5, 2021
    b6aa350
  4. CI: fix examples - patch download MNIST (#6357)

    * patch download
    
    * CI
    
    * isort
    
    * extra
    Borda committed Mar 5, 2021
    e848542
  5. [bug] Fix Pytorch profiler with emit_nvtx (#6260)

    * resolve bug
    
    * update changelog
    
    * Update tests/trainer/test_trainer.py
    
    * Update pytorch_lightning/profiler/profilers.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * resolve comments
    
    * resolve flake8
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    3 people committed Mar 5, 2021
    2ec67a4
  6. fix importing torchtext batch (#6365)

    * copy torchtext batch
    
    * update
    
    * rev
    
    * rev
    Borda committed Mar 5, 2021
    2a3ab67
  7. 4f391bc

Commits on Mar 6, 2021

  1. Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945)
    
    * Update code
    
    Co-authored-by: EliaCereda
    
    * More property updates
    
    * Move properties. Introduce trainer._fitting
    
    * Use trainer.fitting
    
    * Fix reset dataloaders
    
    * Unused code
    
    * RunningStage.SANITY_CHECKING
    
    * Use setters
    
    * Fix bugs
    
    * Fix bugs
    
    * TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}
    
    * Fix bugs
    
    * Fix bugs
    
    * Fix tests
    
    * Update CHANGELOG. Add deprecation warning. Fix tests
    
    * Unused imports
    
    * Optional trainer
    
    * More deprecation. More refactoring
    
    * Correct version
    
    * Use properties
    
    * Address comments
    
    * flake8
    
    * Missed renamings
    
    * Typo
    
    * is -> ==
    
    It is recommended to use "is" for comparing Enums since they are singletons; however, since LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str
    
    * Also for tests
    
    * Typo
    
    * Address @tchaton's comments
    
    * PEP8
    
    * Correct property
    
    * Update CHANGELOG
    
    * Apply suggestions from code review
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * Update pytorch_lightning/trainer/trainer.py
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * Remove called sanity check
    
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    3 people committed Mar 6, 2021
    d0596fa
  2. require: adjust versions (#6363)

    * adjust versions
    
    * release
    
    * manifest
    
    * pep8
    
    * CI
    
    * fix
    
    * build
    Borda committed Mar 6, 2021
    85c8074
  3. Use f-"""-string in a Trainer comment (#6377)

    * Use f-"""-string
    
    * Add r
    
    * Use Trainer.
    
    * r -> noqa: W605
    carmocca committed Mar 6, 2021
    217470b
  4. Remove no return warning from val/test step (#6139)

    * remove warning
    
    * auto_opt
    
    * chlog
    
    * auto_opt
    
    * no_warning_call
    
    * rm old code
    
    * add warning for predict
    
    * Apply suggestions from code review
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    rohitgr7 and awaelchli committed Mar 6, 2021
    facfda8
  5. Fix manual optimization in pl_example (#6373)

    * Fix automatic_optimization
    
    * Fix automatic_optimization
    
    * Uncomment fairscale
    akihironitta committed Mar 6, 2021
    34b733b
  6. 966184a

Commits on Mar 7, 2021

  1. Remove optimizer_idx arg in manual optimization (#6093)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: chaton <thomas@grid.ai>
    3 people committed Mar 7, 2021
    38a5fe7
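
    The commit above removes the optimizer_idx argument from manual optimization calls. A minimal sketch of what a manually optimized LightningModule looks like after this change (module, layer size and batch shape are made up for illustration):

        import torch
        import pytorch_lightning as pl

        class ManualOptimModel(pl.LightningModule):
            def __init__(self):
                super().__init__()
                self.layer = torch.nn.Linear(32, 2)

            @property
            def automatic_optimization(self) -> bool:
                # opt out of automatic optimization
                return False

            def training_step(self, batch, batch_idx):
                opt = self.optimizers()       # single optimizer, no optimizer_idx
                loss = self.layer(batch).sum()
                opt.zero_grad()
                self.manual_backward(loss)    # no optimizer argument needed here anymore
                opt.step()

            def configure_optimizers(self):
                return torch.optim.SGD(self.parameters(), lr=0.1)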
  2. [doc] Improve Multiple Val/Test Dataloaders with simultaneous batches option (#6320)
    
    * improve doc to describe how to combine batches of multiple test and val dataloaders simultaneously
    
    * fix typo
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * use paramref
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    mees and awaelchli committed Mar 7, 2021
    2708c39
  3. [doc] Fix closure in manual optimization (#6374)

    * Fix manual optimization docs
    
    * Fix typo. Thanks @import-antigravity
    akihironitta committed Mar 7, 2021
    c7f30a2
  4. Fix ModelCheckpoint(monitor=None, save_last=True) not saving checkpoints (#6136)
    
    Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
    carmocca and ananthsub committed Mar 7, 2021
    826375e
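
    For context on the fix above, a minimal configuration that is expected to keep writing last.ckpt even though no metric is monitored (the dirpath is illustrative):

        import pytorch_lightning as pl
        from pytorch_lightning.callbacks import ModelCheckpoint

        # monitor=None means no metric is tracked, but save_last=True should
        # still update "last.ckpt" at the end of every training epoch
        checkpoint_cb = ModelCheckpoint(dirpath="checkpoints/", monitor=None, save_last=True)
        trainer = pl.Trainer(callbacks=[checkpoint_cb], max_epochs=3)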

Commits on Mar 8, 2021

  1. Update TBLogger docs (#6315)

    * Update tensorboard.py
    
    * Update logging.rst
    
    * pep8
    
    * Update logging.rst
    
    * Update logging.rst
    
    * Apply suggestions from code review
    
    * add code sample
    
    * Update logging.rst
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    s-rog and Borda committed Mar 8, 2021
    ff16104
  2. Fix trainer not resetting lightning_optimizers (#6372)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    awaelchli and carmocca committed Mar 8, 2021
    718074b
  3. 0ec7a23
  4. Fix AttributeError: 'NoneType' object has no attribute 'finalize' on TPU (#6221)
    
    * Fix bug
    
    Fix AttributeError: 'NoneType' object has no attribute 'finalize'
    
    * Update CHANGELOG.md
    
    * deleted a period
    
    * Update CHANGELOG.md
    
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    
    * Update CHANGELOG.md
    
    * Update pytorch_lightning/plugins/training_type/tpu_spawn.py
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    4 people committed Mar 8, 2021
    a6c98c4
  5. Run CI (#6402)

    carmocca committed Mar 8, 2021
    8dabc30
  6. efd272a
  7. fix dp reduction test (#6404)

    * fix
    
    * update
    
    * fix
    
    * move the class outside
    awaelchli committed Mar 8, 2021
    e1f5eac
  8. Add check for verbose attribute of ModelCheckpoint (#6419)

    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    ashleve and awaelchli committed Mar 8, 2021
    9eded7f

Commits on Mar 9, 2021

  1. fixed bug where tuner would not tune lr if also tuning batch_size (#4688)
    
    * fixed bug where tuner would not tune lr if also tuning batch_size
    
    * added a '+1' to computing the smoothed loss. This maintains the behavior for the smoothed loss as before the bug fix
    
    * pep8 fix
    
    * add changelog
    
    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    4 people committed Mar 9, 2021
    523c59b
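
    A sketch of the combined tuning path the fix above targets, assuming a hypothetical LitModel that exposes self.batch_size and self.lr for the tuner to overwrite:

        import pytorch_lightning as pl

        model = LitModel()  # hypothetical LightningModule with self.batch_size and self.lr
        trainer = pl.Trainer(auto_scale_batch_size=True, auto_lr_find=True)
        trainer.tune(model)                 # batch size is scaled first, then the LR finder runs
        print(model.batch_size, model.lr)   # both should now be updated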
  2. update (#6403)

    awaelchli committed Mar 9, 2021
    75c6486
  3. fix logger creating directory structure too early in DDP (#6380)

    * fix
    
    * add simple test
    
    * fix imports
    
    * add changelog
    
    * tighter test with on_fit_start hook closer to the dispatch call
    
    * move class inside test function
    
    * add a comment
    awaelchli committed Mar 9, 2021
    fc6d402
  4. Typing for tests 1/n (#6313)

    * typing
    
    * yapf
    
    * typing
    Borda committed Mar 9, 2021
    55dd3a4
  5. [changelog] Update Changelog on release v1.2.3 (#6444)

    * update changelog
    
    * legacy 1.2.3
    
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    tchaton and Borda committed Mar 9, 2021
    30d649b
  6. Improve DummyLogger (#6398)

    * fix dummy logger
    
    * docs
    
    * update docs
    
    * add changelog
    
    * add none return annotation
    
    * return empty string for name, version
    awaelchli committed Mar 9, 2021
    615b2f7

Commits on Mar 10, 2021

  1. Raise an exception if check_val_every_n_epoch is not an integer (#6411)

    * raise an exception if check_val_every_n_epoch is not an integer
    
    * remove unused object
    
    * add type hints
    
    * add return type
    
    * update exception message
    
    * update exception message
    kaushikb11 committed Mar 10, 2021
    74d79e7
  2. Set find unused parameters to True by default to fix breaking compatibility (#6438)
    
    * Set find unused parameters to True by default to fix breaking models, add suggestion to re-enable
    
    * Add changelog
    SeanNaren committed Mar 10, 2021
    c81b2a8
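
    Since the commit above makes find_unused_parameters=True the DDP default (trading speed for compatibility), users whose models use every parameter can opt back into the faster setting, roughly like this sketch:

        import pytorch_lightning as pl
        from pytorch_lightning.plugins import DDPPlugin

        # restore the faster behaviour when every parameter is guaranteed to get a gradient
        trainer = pl.Trainer(
            gpus=2,
            accelerator="ddp",
            plugins=DDPPlugin(find_unused_parameters=False),
        )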
  3. [bug] All_gather support tensor on cpu (#6416)

    * add test
    
    * update changelog
    
    * update
    
    * rename function
    tchaton committed Mar 10, 2021
    7d4e74c
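
    A small sketch of the LightningModule helper that the fix above extends to CPU tensors (the metric value is made up):

        import torch
        import pytorch_lightning as pl

        class GatherModel(pl.LightningModule):
            def validation_step(self, batch, batch_idx):
                local_value = torch.tensor(0.5)          # plain CPU tensor
                gathered = self.all_gather(local_value)  # now also supported for CPU tensors
                return gathered.mean()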
  4. [Fix] Ensure we set the default device before initializing deepspeed (#6460)
    
    * Ensure we set the default device before initializing deepspeed
    
    * Add CHANGELOG.md
    
    * Update pytorch_lightning/plugins/training_type/deepspeed.py
    
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    SeanNaren and kaushikb11 committed Mar 10, 2021
    1c013b4
  5. d1db604

Commits on Mar 11, 2021

  1. Add Trainer.validate(…) method to run one validation epoch (#4948)

    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    Co-authored-by: chaton <thomas@grid.ai>
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    4 people committed Mar 11, 2021
    f4cc745
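
    A minimal sketch of the new entry point added above (model and dataloader are placeholders):

        import pytorch_lightning as pl

        model = LitModel()   # hypothetical LightningModule
        trainer = pl.Trainer()
        results = trainer.validate(model, val_dataloaders=val_loader)  # runs exactly one validation epoch
        print(results)       # logged/callback metrics, one dict per validation dataloader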
  2. Allow user to disable the automatic formatting of checkpoint file names. (#6277)
    
    * cleaning SWA (#6259)
    
    * rename
    
    * if
    
    * test
    
    * chlog
    
    * Remove opt from manual_backward in docs (#6267)
    
    * switch agents pool (#6270)
    
    * Allow user to disable the automatic formatting of checkpoint file names.
    
    * Added changelog entry.
    
    * Made flake8 happy.
    
    * Applied review suggestion: quotes for special characters in docstring
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * Fixed example in docstring.
    
    * Fixed syntax error in docstring.
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    Co-authored-by: thomas chaton <thomas@grid.ai>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    5 people committed Mar 11, 2021
    2ecda5d
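
    A sketch of turning the automatic filename formatting off, assuming the new ModelCheckpoint flag is named auto_insert_metric_name (paths and template are illustrative):

        from pytorch_lightning.callbacks import ModelCheckpoint

        # with the automatic formatting disabled, "{epoch}" is not expanded to "epoch={epoch}",
        # so the template below is used verbatim
        ckpt = ModelCheckpoint(
            dirpath="checkpoints/",
            filename="epoch{epoch:02d}-loss{val_loss:.2f}",
            auto_insert_metric_name=False,  # assumed flag name for this feature
        )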
  3. 079fe9b
  4. cover subproc coverage (#6477)

    Borda committed Mar 11, 2021
    afe0ede
  5. argparse: Add use_argument_group=True (#6088)

    * argparse: Add inplace option
    
    Replicate in GAN model
    
    * datamodule: Deduplicate logic w/ argparser utilities
    
    * Update pl_examples/domain_templates/generative_adversarial_net.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    
    * Keep docstrings
    
    * Correct name
    
    * Whitespace
    
    * Consistency
    
    * fix weird type stuff
    
    * try alt - use_argument_group
    
    * fix syntax + lint
    
    * fix ci errs
    
    * fix ci
    
    * change examples... still failing w/ "unrecognized arguments: --batch_size"
    
    * address review
    
    * mnist_datamodule: add some docstrings
    
    * argparse: check cls or cls.__init__ for param
    
    didn't capture issue, but meh
    
    * fix lint
    
    * fix no-doc edge case
    
    * address review
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    4 people committed Mar 11, 2021
    e886d55
  6. Disable batch transfer in DP mode (#6098)

    * add exceptions and test
    
    * hook
    
    * fix
    
    * clean up
    
    * clean up
    
    * regex
    
    * regex
    
    * docs
    
    * rev
    
    * comment and docs
    
    * chlog
    
    * Apply suggestions from code review
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: chaton <thomas@grid.ai>
    
    * Monkey-patch device count
    
    * docs
    
    * pep
    
    * api_change
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: chaton <thomas@grid.ai>
    3 people committed Mar 11, 2021
    c53edce
  7. 62d4304
  8. [feat] Support iteration-based checkpointing in model checkpoint callback (#6146)
    
    * Update model_checkpoint.py
    
    * add tests
    
    * Update model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * fix tests
    
    * every_n_batches
    
    * Update test_model_checkpoint.py
    
    * defaults
    
    * rm tests
    
    * Update model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * Prune deprecated metrics for 1.3 (#6161)
    
    * prune deprecated metrics for 1.3
    
    * isort / yapf
    
    * Update model_checkpoint.py
    
    * add tests
    
    * defaults
    
    * Update CHANGELOG.md
    
    * pre-commit
    
    * Update model_checkpoint.py
    
    * update defaults
    
    * Update test_remove_1-5.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * fix tests
    
    * Update test_model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * ckpt-callback
    
    * Update test_model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * validation-end
    
    * Update model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * Update test_model_checkpoint.py
    
    * clarify-names
    
    - Make names explicit as to which hooks they apply to
    - Use step instead of batch for consistency with global step
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * Update model_checkpoint.py
    
    * mutual-exclusive
    
    Make every_n_train_steps and every_n_val_epochs mutually exclusive
    
    * fix-default-0
    
    * Update CHANGELOG.md
    
    * formatting
    
    * make-private
    
    make attributes private to the class
    
    * rebase
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    ananthsub and Borda committed Mar 11, 2021
    cea170e
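
    Based on the commit message above, checkpointing can now be driven by training steps instead of epochs; a sketch using the two new, mutually exclusive knobs (dirpath and intervals are illustrative):

        from pytorch_lightning.callbacks import ModelCheckpoint

        # save a checkpoint every 1000 optimizer steps; every_n_train_steps and
        # every_n_val_epochs are mutually exclusive, so only one is set per callback
        step_ckpt = ModelCheckpoint(dirpath="checkpoints/", every_n_train_steps=1000)

        # or: save only every 5 validation epochs
        epoch_ckpt = ModelCheckpoint(dirpath="checkpoints/", every_n_val_epochs=5)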

Commits on Mar 12, 2021

  1. update xla version (#6464)

    awaelchli committed Mar 12, 2021
    6596447
  2. Remove unused mixin attributes (#6487)

    * Remove unused mixing attributes
    
    * Missing import
    carmocca committed Mar 12, 2021
    518c7e4
  3. [doc] Update the order of zero_grad and backward (#6478)

    * Fix zero_grad in docs
    
    * Fix zero_grad in docs
    akihironitta committed Mar 12, 2021
    680e83a

Commits on Mar 14, 2021

  1. b2bcad1
  2. Update docs for limit_predict_batches (#6507)

    * add docs and minor updates
    
    * docs
    
    * fraction
    rohitgr7 committed Mar 14, 2021
    dcd9dd8
  3. [bug] Update broadcast + reduce decision [ModelCheckpoint] (#6410)

    * resolve bug
    
    * update
    
    * update changelog
    
    * update PR
    
    * Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * add todo
    
    * resolve issues
    
    * resolve flake8
    
    * update
    
    * add coverage for reduce
    
    * wip
    
    * restore back to broadcast
    
    * remove test.py
    
    * resolve flake8
    
    * update
    
    * check world size
    
    * resolve test
    
    * update
    
    * use pytorch version when defined
    
    * update on comments
    
    * update on comments
    
    * flake8
    
    * resolve bugs
    
    * Update CHANGELOG.md
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * update
    
    * update
    
    * update
    
    * update
    
    * remove test
    
    * update
    
    * resolve flake8
    
    * update
    
    * update
    
    * update
    
    * proxy
    
    * update
    
    * update
    
    * resolve typo
    
    * prune
    
    * update parallel
    
    * update
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    tchaton and carmocca committed Mar 14, 2021
    0544efd

Commits on Mar 15, 2021

  1. 02fa32b
  2. CI: resume testing with py3.8 (#6516)

    * testing on python 3.8
    
    * req
    Borda committed Mar 15, 2021
    156847b
  3. document exceptions for metrics/functional (#6273)

    * document exceptions for metrics/functional
    
    * Apply suggestions from code review
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
    4 people committed Mar 15, 2021
    06756a8
  4. Mean Average Precision metric for Information Retrieval (1/5) (#5032)

    * init information retrieval metrics
    
    * changed retrieval metrics names, expanded arguments and fixed typo
    
    * added 'Retrieval' prefix to metrics and fixed conflict with already-present 'average_precision' file
    
    * improved code formatting
    
    * pep8 code compatibility
    
    * features/implemented new Mean Average Precision metrics for Information Retrieval + doc
    
    * fixed pep8 compatibility
    
    * removed threshold parameter and fixed typo on types in RetrievalMAP and improved doc
    
    * improved doc, put first class-specific args in RetrievalMetric and transformed RetrievalMetric in abstract class
    
    * implemented tests for functional and class metric. fixed typo when input tensors are empty or when all targets are False
    
    * fixed typos in doc and changed torch.true_divide to torch.div
    
    * fixed typos pep8 compatibility
    
    * fixed types in long division in ir_average_precision and example in mean_average_precision
    
    * RetrievalMetric states are not lists and _metric method accepts predictions and targets for easier extension
    
    * updated CHANGELOG file
    
    * added '# noqa: F401' flag to not used imports
    
    * added double space before '# noqa: F401' flag
    
    * Update CHANGELOG.md
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * change get_mini_groups in get_group_indexes
    
    * added checks on target inputs
    
    * minor refactoring for code cleanness
    
    * split tests over exception raising in separate function && refactored test code into multiple functions
    
    * fixed pep8 compatibility
    
    * implemented suggestions of @SkafteNicki
    
    * fixed imports for isort and added type annotations to functions in test_map.py
    
    * isort on test_map and fixed typing
    
    * isort on retrieval and on __init__.py and utils.py in metrics package
    
    * fixed typo in pytorch_lightning/metrics/__init__.py regarding code style
    
    * fixed yapf compatibility
    
    * fixed yapf compatibility
    
    * fixed typo in doc
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
    4 people committed Mar 15, 2021
    5d73fbb
  5. eb3ff41
  6. deprecate metrics pkg (#6505)

    * deprecate metrics
    
    * examples
    
    * req
    
    * docs
    
    * Apply suggestions from code review
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * pep8
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    3 people committed Mar 15, 2021
    b341b53
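
    The deprecation above points users at the standalone torchmetrics package; a sketch of the migration:

        # deprecated: importing from pytorch_lightning.metrics now emits a deprecation warning
        # from pytorch_lightning.metrics import Accuracy

        # preferred going forward
        from torchmetrics import Accuracy

        accuracy = Accuracy()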
  7. [test] lr_find with bs_scale (#6422)

    * init test: test_lr_find_with_bs_scale
    
    * Update test_lr_finder.py
    
    * remove gpu req
    
    * try boring model
    
    * custom boring model
    
    * pep8
    
    * fix typo
    
    * Update test_lr_finder.py
    
    * typo
    
    * typo
    s-rog committed Mar 15, 2021
    c48fc6a
  8. Update DeepSpeed docs (#6528)

    * Clean up docs and add some explicitness around stages
    
    * Apply suggestions from code review
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    SeanNaren and rohitgr7 committed Mar 15, 2021
    383565d
  9. ea36ee3
  10. Update hook lifecycle (#6538)

    * Update hook lifecycle
    
    * Update docs/source/common/lightning_module.rst
    carmocca committed Mar 15, 2021
    9c59733
  11. Prune metrics base classes 2/n (#6530)

    * base class
    
    * extensions
    
    * chlog
    
    * _stable_1d_sort
    
    * _check_same_shape
    
    * _input_format_classification_one_hot
    
    * utils
    
    * to_onehot
    
    * select_topk
    
    * to_categorical
    
    * get_num_classes
    
    * reduce
    
    * class_reduce
    
    * tests
    Borda committed Mar 15, 2021
    6453091
  12. Custom Plugin is_distributed (#6537)

    * return from plugin
    
    * dont return for tpu
    amogkam committed Mar 15, 2021
    6a14146

Commits on Mar 16, 2021

  1. refactor reading env defaults (#6510)

    * change tests
    
    * fix
    
    * test
    
    * _defaults_from_env_vars
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Borda and carmocca committed Mar 16, 2021
    0f07eaf
  2. Prune metric: helpers and inputs 3/n (#6547)

    * _basic_input_validation
    
    * _check_shape_and_type_consistency
    
    * _check_num_classes_binary
    
    * _check_num_classes_mc
    
    * _check_num_classes_ml
    
    * _check_top_k
    
    * _check_classification_inputs
    
    * _input_format_classification
    
    * _reduce_stat_scores
    
    * DataType
    
    * rest
    
    * flake8
    
    * chlog
    Borda committed Mar 16, 2021
    a312219
  3. prune warning & deprecation wrapper (#6540)

    * docs
    
    * wrapper
    
    * test
    
    * count
    
    * flake8
    Borda committed Mar 16, 2021
    555a6fe
  4. Add outputs param for on_val/test_epoch_end hooks (#6120)

    * add outputs param for on_val/test_epoch_end hooks
    
    * update changelog
    
    * fix warning message
    
    * add custom call hook
    
    * cache logged metrics
    
    * add args to docstrings
    
    * use warning cache
    
    * add utility method for param in sig check
    
    * Update CHANGELOG.md
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * update docstring
    
    * add test for eval epoch end hook
    
    * add types and replace model ref
    
    * add deprecation test
    
    * fix test fx name
    
    * add model hooks warning
    
    * add old signature model to tests
    
    * add clear warning cache
    
    * support args param
    
    * update tests
    
    * add tests for model hooks
    
    * code suggestions
    
    * add signature utils
    
    * fix pep8 issues
    
    * fix pep8 issues
    
    * fix outputs issue
    
    * fix tests
    
    * code fixes
    
    * fix validate test
    
    * test
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    kaushikb11 and Borda committed Mar 16, 2021
    b190403
  5. [doc] Add Zero Grad set_to_none=True trick (#6548)

    * add trick to doc
    
    * update
    
    * update path
    
    * Update docs/source/benchmarking/performance.rst
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    tchaton and rohitgr7 committed Mar 16, 2021
    00cd918
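
    The doc addition above refers to PyTorch's optimizer.zero_grad(set_to_none=True) option; a sketch of wiring it in through the optimizer_zero_grad hook, assuming the hook signature documented around this release:

        import pytorch_lightning as pl

        class Model(pl.LightningModule):
            def optimizer_zero_grad(self, epoch, batch_idx, optimizer, optimizer_idx):
                # setting grads to None instead of zeroing them can save memory and a kernel launch
                optimizer.zero_grad(set_to_none=True)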

Commits on Mar 17, 2021

  1. fix deprecation wrapper & tests (#6553)

    * fix deprecation wrapper & tests
    
    * flake8
    Borda committed Mar 17, 2021
    297e438
  2. prune metric: accuracy 4/n (#6515)

    * prune accuracy
    
    * chlog
    
    * flake8
    
    * Apply suggestions from code review
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    
    * wrap
    
    * test
    
    * test
    
    * fix
    
    Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
    Borda and SkafteNicki committed Mar 17, 2021
    2f6ce1a

Commits on Mar 18, 2021

  1. Prune metrics: AUC & AUROC (#6572)

    * class: AUC AUROC
    
    * func: auc auroc
    
    * format
    
    * tests
    Borda committed Mar 18, 2021
    9e35f97
  2. [doc] Update Dict Train Loader doc. (#6579)

    * update doc
    
    * update example
    tchaton committed Mar 18, 2021
    8853a36
  3. Prune metrics: precision & recall 6/n (#6573)

    * avg precision
    
    * precision
    * recall
    
    * curve
    
    * tests
    
    * chlog
    
    * isort
    
    * fix
    Borda committed Mar 18, 2021
    38a2119
  4. Update Changelog for v1.2.4 (#6581)

    * Update changelog for v1.2.4
    
    * legacy v1.2.4
    
    * prune duplicates from changelog
    
    Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
    kaushikb11 and Borda committed Mar 18, 2021
    b606171
  5. [Fix] Move init dist connection into the setup function (#6506)

    * Move connection setup into the setup function. Call setup hook after we set up the accelerator
    
    * Added CHANGELOG.md
    
    * fix setup order in callback test
    
    * fix input arguments in test
    
    * Mock distributed function, remove protection to turn into training type hook
    
    * Remove import
    
    * Add missing mock, ensure custom plugin does not create children process
    
    * Skip test on windows
    
    * Update deepspeed to init connection in setup
    
    * Do not initialize distributed module
    
    * Move DeepSpeed tests to special tests since dist communication is being set up
    
    * Special the test to see if this fixes CI
    
    * Delete accelerator connector test to see if its causing build to fail
    
    * Delete deepspeed test
    
    * Revert "Delete accelerator connector test to see if its causing build to fail"
    
    This reverts commit edde60b
    
    * Revert "Delete deepspeed test"
    
    This reverts commit 9d317429
    
    * Reverse hook
    
    * Reverse setup hooks to debug again
    
    * Add todo so i know where i left off
    
    * For single device move in pre_dispatch after setup function
    
    * Add additional model to device hook if any additional parameters have been set
    
    * See if we can enable deepspeed tests
    
    * Revert "See if we can enable deepspeed tests"
    
    This reverts commit b5450de
    
    * See if this hook approach works
    
    * Introduce new granular hooks
    
    * Remove import, fix tpu spawn by moving the function to setup
    
    * Added missing special test
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    SeanNaren and awaelchli committed Mar 18, 2021
    4e9b453

Commits on Mar 19, 2021

  1. 983a888
  2. 87c03b1
  3. NGC container PoC (#6187)

    * add NVIDIA flows
    
    * push
    
    * pull
    
    * ...
    
    * extras
    
    * ci prune
    
    * fix
    
    * tag
    
    * .
    
    * list
    Borda committed Mar 19, 2021
    5780796
  4. Automatically set sync_batchnorm for training_type_plugin (#6536)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
    Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com>
    4 people committed Mar 19, 2021
    3b72bcc
  5. Prune metrics: other classification 7/n (#6584)

    * confusion_matrix
    
    * iou
    
    * f_beta
    
    * hamming_distance
    
    * stat_scores
    
    * tests
    
    * flake8
    
    * chlog
    Borda committed Mar 19, 2021
    3a56a60

Commits on Mar 20, 2021

  1. fixing examples (#6600)

    * try Azure
    
    * -e
    
    * path
    Borda committed Mar 20, 2021
    cb59039
  2. Add AMP for validation, prediction and testing (#6565)

    * Add Tests for val and test-steps
    
    * Add native AMP
    
    * pep8 tests
    
    * pep8 plugin
    
    * changelog
    justusschock committed Mar 20, 2021
    634d831

Commits on Mar 21, 2021

  1. Add trainer.predict config validation (#6543)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    kaushikb11 and carmocca committed Mar 21, 2021
    37f22c9
  2. 42a7b70
  3. Move profiler tests (#6619)

    carmocca committed Mar 21, 2021
    51c9260

Commits on Mar 22, 2021

  1. 870247f
  2. 853523e
  3. Allow training type plugin to delay optimizer creation (FSDP 2/n) (#6331)
    
    * Allow training_type_plugin to delay optimizer configure
    
    * Add missing references to trainer, add a CPU accelerator based test
    SeanNaren committed Mar 22, 2021
    58c9fa7
  4. Add teardown method to BaseProfiler. (#6370)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
    3 people committed Mar 22, 2021
    e2e1de0
  5. refactoring setup (#6590)

    * refactoring setup
    
    * .
    
    * docs
    
    * flake8
    Borda committed Mar 22, 2021
    1fae10a
  6. hotfix: mock examples (#6632)

    * mock examples
    
    * drop from GA
    Borda committed Mar 22, 2021
    e62c7c7
  7. [refactor] Add setup to profilers + _run_stage_setup to trainer 2/5 (#6633)
    
    * add setup
    
    * update
    
    * updates on comment
    
    * Minor changes
    
    * Extra import
    
    * Docs
    
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    tchaton and carmocca committed Mar 22, 2021
    2064ece

Commits on Mar 23, 2021

  1. fix comparing versions (#6434)

    * fix comparing versions
    
    * chlog
    
    * .
    
    * ...
    
    * datasets
    Borda committed Mar 23, 2021
    8cd75a4
  2. Prune metrics: regression 8/n (#6636)

    * explained_variance
    
    * tests
    
    * mean_absolute_error
    
    * mean_squared_error
    
    * mean_relative_error
    
    * mean_squared_log_error
    
    * chlog
    Borda committed Mar 23, 2021
    efce2b7
  3. Prune metyrics: regression 9/n (#6637)

    * psnr
    
    * r2score
    
    * ssim
    
    * chlog
    Borda committed Mar 23, 2021
    f93414d
  4. Refactor base profilers 3/5 (#6621)

    Co-authored-by: tchaton <thomas@grid.ai>
    carmocca and tchaton committed Mar 23, 2021
    36d180e
  5. a74909a
  6. Flash predict step (#6577)

    * add predict_step
    
    * Update predict_loop.py
    
    * Update trainer.py
    
    * Update trainer.py
    
    * resolve bugs
    
    * update
    
    * update
    
    * update
    
    * resolve bug
    
    * resolve some failing tests
    
    * update tests
    
    * update
    
    * resolve tests
    
    * add a test
    
    * remove typo
    
    * add a test for attachement
    
    * update
    
    * changed to on_train_dataloader
    
    * remove __flash_special_attr__
    
    * resolve tests
    
    * update
    
    * update
    
    * update
    
    * update on comments
    
    * Update pytorch_lightning/trainer/data_loading.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    3 people committed Mar 23, 2021
    0995d30
  7. 3cf0c31
  8. Refactor PyTorch profiler 4/5 (#6349)

    Co-authored-by: thomas chaton <thomas@grid.ai>
    carmocca and tchaton committed Mar 23, 2021
    51b10f7
  9. Add PyTorch 1.8 Profiler 5/5 (#6618)

    * Refactor profilers
    
    * Update PassThrough
    
    * WIP - This is broken and will change
    
    * Update pytorch_lightning/profiler/pytorch.py
    
    Co-authored-by: thomas chaton <thomas@grid.ai>
    
    * resolve tests
    
    * resolve tests
    
    * find output
    
    * try something
    
    * update
    
    * add support for test and predict
    
    * update
    
    * update
    
    * use getattr
    
    * test
    
    * test
    
    * update
    
    * tests
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * remove file
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * test
    
    * update#
    
    * update
    
    * update tests
    
    * update
    
    * add support for 1.8
    
    * rename records
    
    * add support for 1.8
    
    * update
    
    * resolve flake8
    
    * resolve test
    
    * Refactor basic profilers
    
    * Fixes
    
    * Unused import
    
    * Introduce setup
    
    * Profile on all ranks. Print to stdout on 0
    
    * Introduce dirpath + filename
    
    * CHANGELOG
    
    * Add tests. Address comments
    
    * add `on_run_stage_setup`
    
    * add on_run_stage_setup function
    
    * update
    
    * add test for RegisterRecordFunction
    
    * update lightning flow direction
    
    * move variable to private
    
    * remove trace
    
    * Undo code that should be in 3/4
    
    * Multi-stage multi-rank
    
    * 2/5 changes
    
    * Pass stage in __del__
    
    * Remove TODOs
    
    * Describe on_evaluation_end. Add tests
    
    * Typo
    
    * Address comments
    
    * deepcopy tests
    
    * Advanced teardown
    
    * Fix teardown test
    
    * Fix tests
    
    * Minor change
    
    * Update CHANGELOG.md
    
    * Fix test
    
    * Quick fixes
    
    * Fix 6522
    
    * resolve ddp tests
    
    * resolve tests
    
    * resolve some tests
    
    * update tests
    
    * resolve tests
    
    * update
    
    * resolve tests
    
    * resolve some tests
    
    * Missed fixes from 3/5
    
    * Fixes
    
    * resolve some tests
    
    * resolve test for 1.7.1
    
    * Broken refactor
    
    * Missed stage
    
    * Minor changes
    
    * resolve tests
    
    * Update CHANGELOG
    
    * resolve bug
    
    * remove print
    
    * Typo
    
    * Cleanup
    
    * resolve ddp test
    
    * remove barrier
    
    * update profiler
    
    * update
    
    * Smaller model
    
    * update
    
    * resolve tests
    
    * update
    
    * Minor changes. CHANGELOG
    
    * Minimize diff
    
    * update to 1.8.1
    
    * RunIf. Extra code. Check segfault
    
    * resolve tests
    
    * Typo. Bad merge
    
    * Fixing a bad merge
    
    * replace for kineto
    
    * Update pytorch_lightning/profiler/pytorch.py
    
    Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
    
    * Update pytorch_lightning/profiler/pytorch.py
    
    Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
    
    * Minor changes
    
    * Bad merge
    
    * Use lists for flexibility
    
    * Use sets
    
    * predict_step
    
    * Ananth's suggestion
    
    * update
    
    * Docs
    
    * Update pl_examples/basic_examples/profiler_example.py
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    
    * update example
    
    * update example
    
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    4 people committed Mar 23, 2021
    fd5cb7f
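
    To try the refactored profiler described above, the string shortcut is the simplest route; a sketch:

        import pytorch_lightning as pl

        # "simple", "advanced" and "pytorch" are the built-in shortcuts; the PyTorch
        # profiler can use the torch 1.8 kineto backend when it is available
        trainer = pl.Trainer(profiler="pytorch", max_epochs=1)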
  10. update coverage config (#6524)

    * update coverage config
    
    * parallel
    
    * parallel
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    * parallel
    
    * parallel
    
    * parallel
    
    * combine
    
    * combine
    
    * .
    
    * ..
    
    * ..
    
    * ..
    
    * rev
    
    * cb
    
    * cb
    
    * drop
    
    * drop
    
    * .
    
    * ..
    
    * ...
    
    * ...
    
    * ...
    
    * .
    Borda committed Mar 23, 2021
    64d0fa4
  11. 741c452
  12. b1e3dcc

Commits on Mar 24, 2021

  1. Prune metrics: others 11/DoNe (#6659)

    * classif
    
    * grad_img
    
    * nlp
    
    * ssl
    
    * format
    Borda committed Mar 24, 2021
    70beddf
  2. fix: update example autoencoder.py to reflect args (#6638)

    * fix: update example autoencoder.py to reflect args
    
    * Update pl_examples/basic_examples/autoencoder.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    bmahlbrand and carmocca committed Mar 24, 2021
    cbca6cd
  3. Docs/robots (#6658)

    Borda committed Mar 24, 2021
    5733889
  4. Feature/double precision (#6595)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
    Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
    4 people committed Mar 24, 2021
    d02fe34
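
    The double-precision feature above rides on the existing precision flag; a sketch:

        import pytorch_lightning as pl

        # precision=64 runs the model, and the batches moved by the trainer, in float64
        trainer = pl.Trainer(precision=64)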
  5. Follow E231 [flake8] (#6110)

    * Remove E231 from ignore list
    
    * Follow E231
    
    * Update pytorch_lightning/trainer/data_loading.py
    akihironitta committed Mar 24, 2021
    SHA: ac60536
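
flake8's E231 flags a missing space after a comma, semicolon, or colon; the change above removes it from the ignore list and fixes the offending lines. The difference is purely cosmetic:

```python
# Flagged by E231: no whitespace after the commas and the colon.
coords = {"x":1,"y":2}

# E231-clean equivalent.
coords = {"x": 1, "y": 2}
```
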
  6. Commit ab4c838
  7. add copyr (#6661)

    Borda committed Mar 24, 2021
    SHA: d471fa3
  8. MetricsHolder clean-up + typing (#6645)

    * Metrics holder cleanup and better error message
    
    * Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
    
    * _VALUE -> _METRIC_TYPE
    carmocca committed Mar 24, 2021
    SHA: 2dd6f9e

Commits on Mar 25, 2021

  1. Commit b8ef52b
  2. Fix checkpoint callback & Trainer.test(_) issue for TPUs (#6654)

    * Fix checkpoint callback issue for TPUs
    
    * update changelog
    
    * add barrier
    
    * apply code suggestions
    
    * update trainer test
    
    * remove spaces
    
    * fix tpu tests
    
    * Apply suggestions from code review
    
    * add comment
    
    Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
    kaushikb11 and Borda committed Mar 25, 2021
    SHA: 2cbdc01
  3. Update CODEOWNERS (#6220)

    Borda committed Mar 25, 2021
    SHA: 92a1671
  4. Support teardown hook on DataModule (#4673)

    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    Co-authored-by: chaton <thomas@grid.ai>
    3 people committed Mar 25, 2021
    SHA: 40976e4
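
PR #4673 gives LightningDataModule the same teardown hook the LightningModule already has, called once a stage finishes. A minimal sketch, assuming the hook receives the stage name like its setup counterpart:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningDataModule


class SketchDataModule(LightningDataModule):
    def setup(self, stage=None):
        # Called for "fit", "validate", "test" or "predict" on every process.
        self.train_set = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=8)

    def teardown(self, stage=None):
        # New hook: release per-stage resources (open files, connections, caches).
        self.train_set = None
```
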
  5. Add on_epoch_start to run at the beginning of every loop irrespective of train/val/test (#6498)
    
    * update docs
    
    * add hook and update docs
    
    * update tests
    
    * chlog
    
    * Update CHANGELOG.md
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    
    * chlog
    
    Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
    rohitgr7 and awaelchli committed Mar 25, 2021
    SHA: 9be092d
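
With #6498, on_epoch_start runs at the start of every loop, so one override covers training, validation, and test epochs alike. A minimal sketch:

```python
import torch
from pytorch_lightning import LightningModule


class HookedModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def on_epoch_start(self):
        # Fires before train, validation, and test epochs; use
        # on_train_epoch_start / on_validation_epoch_start to target one loop.
        print(f"epoch {self.current_epoch} starting")
```
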
  6. Simplify deprecations (#6620)

    * use external deprecate
    
    * simplify
    
    * simplify
    
    * simplify
    
    * flake8
    
    * .
    
    * others
    
    * .
    Borda committed Mar 25, 2021
    SHA: 217c12a
  7. Resolve schedule step bug for PyTorch Profiler (#6674)

    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    tchaton and carmocca committed Mar 25, 2021
    SHA: 0ea8f39
  8. Add artifact_location arg to MLFlow logger (#6677)

    * Add artifact_location arg to MLFlow logger
    
    * Add CHANGELOG URL
    
    * Update test
    ethanwharris committed Mar 25, 2021
    SHA: 6b990f3
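
The new argument is passed to MLflow when the experiment is first created and controls where run artifacts are stored. A minimal sketch, assuming mlflow is installed and using an S3 URI purely as an example destination:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(
    experiment_name="lightning_logs",
    tracking_uri="file:./ml-runs",
    # Only takes effect when the experiment does not exist yet; an existing
    # experiment keeps its original artifact location.
    artifact_location="s3://my-bucket/mlflow-artifacts",
)
trainer = Trainer(logger=mlf_logger, max_epochs=1)
```
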

Commits on Mar 26, 2021

  1. Commit bc61361
  2. Commit b730a5a
  3. Commit 21fc5eb
  4. [warning] Add warning when values are not being reduced (#6417)

    * add warning non reduced
    
    * add test
    
    * update test
    
    * update changelog
    
    * Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
    
    Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
    
    * update
    
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
    3 people committed Mar 26, 2021
    SHA: 0e45220
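
The warning concerns logged values that cannot be reduced when epoch results are aggregated; the exact trigger lives in epoch_result_store.py. As a loosely related sketch of the logging calls involved, assuming the standard self.log signature of this release line:

```python
import torch
from pytorch_lightning import LightningModule


class LoggingModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        # Log a tensor so it can be averaged over steps at epoch end and,
        # with sync_dist=True, reduced across processes as well.
        self.log("train_loss", loss, on_step=True, on_epoch=True, sync_dist=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```
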

Commits on Mar 28, 2021

  1. Commit f0c5479

Commits on Mar 29, 2021

  1. remake nvidia docker (#6686)

    * use latest
    
    * remake
    
    * examples
    Borda committed Mar 29, 2021
    SHA: dcf6e4e
  2. More explicit exception message when testing with fast_dev_run=True (#6667)
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    ashleve and carmocca committed Mar 29, 2021
    SHA: cca0eca
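
fast_dev_run=True pushes a single batch through each loop as a quick smoke test and disables checkpointing, which is why testing from a saved "best" checkpoint cannot work in this mode and now fails with a clearer message. A minimal sketch of the flag itself:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer


class SmokeModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    data = DataLoader(TensorDataset(torch.randn(16, 32), torch.randint(0, 2, (16,))), batch_size=8)
    # Runs one training batch (and one batch of each other loop, if present);
    # checkpointing is disabled, so there is no "best" checkpoint to test from.
    trainer = Trainer(fast_dev_run=True)
    trainer.fit(SmokeModel(), data)
```
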
  3. support python 3.9 (#4944)

    * support python 3.9
    
    * update CI
    
    * onnxruntime
    
    * .
    
    * .
    
    * onnxruntime
    
    * t 55
    
    * t 75
    
    * add script
    
    * use
    
    * onnx
    
    * onnx
    
    * onnx
    
    * whl
    
    * np
    
    * find
    
    * 21
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    * onnx
    
    * CI
    
    * req
    
    * ~ dockers
    
    * min
    
    * .
    
    * drop horovod
    
    * drop horovod
    
    * drop horovod
    
    * fix
    
    * fix
    
    * .
    Borda committed Mar 29, 2021
    SHA: 5b5a5cc
  4. [TPU] update is_tpu_exists utils internal logic to rely on xmp.spawn (#6719)
    
    * update_logic
    
    * update
    
    * Update tests/utilities/test_xla_device_utils.py
    
    * Update pytorch_lightning/utilities/xla_device.py
    
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    
    * Update pytorch_lightning/utilities/xla_device.py
    
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    
    * update test
    
    * Update tests/utilities/test_xla_device_utils.py
    
    * update
    
    * Apply fix
    
    * Docstring
    
    * flake8
    
    * update
    
    Co-authored-by: Your Name <you@example.com>
    Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
    Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
    4 people committed Mar 29, 2021
    SHA: 3a4c424
  5. [refactor] Move save_function to accelerator 1/n [DeepSpeed] (#6689)

    * move save_checkpoint responsability to accelerator
    
    * update
    tchaton committed Mar 29, 2021
    SHA: 646cf2f
  6. [Model Parallel] Add configure sharded model hook (#6679)

    * Add base hook for model parallel
    
    * fix callback signature
    
    * Simplify hook
    
    * Add hook logic
    
    * add tests
    
    * add property setter
    
    * add logic for being called once
    
    * Update changelog
    
    * Fix
    
    * fix return type
    
    * fix lambda callback test
    
    * Fix tests
    
    * Apply code suggestions
    
    * add logic for setup_optimizers_predispatch
    
    * add common dummy model
    
    * Swap call order
    
    * Remove test that isn't needed anymore
    
    * Update tests
    
    * Add a bit more doc
    
    * Few code review fixes
    
    * Update pytorch_lightning/accelerators/accelerator.py
    
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    
    * Change hook name
    
    * Fix test
    
    * Test setup hook, refactor names
    
    * Swap call order of callbacks and model initialization
    
    * Change name of context manager
    
    Co-authored-by: SeanNaren <sean@grid.ai>
    Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
    Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
    4 people committed Mar 29, 2021
    SHA: f79a13e
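
The new hook lets a model-parallel plugin build the network inside its sharding context, so very large layers are created already partitioned instead of being materialised in full in __init__. A minimal sketch, assuming the hook name introduced here:

```python
import torch
from pytorch_lightning import LightningModule


class ShardedSketch(LightningModule):
    def __init__(self):
        super().__init__()
        # Keep __init__ light: with a model-parallel plugin the heavy layers
        # should be instantiated in configure_sharded_model instead.
        self.dim = 4096

    def configure_sharded_model(self):
        # Called by the accelerator inside the plugin's sharding context,
        # so these layers can be created in their partitioned form.
        self.block = torch.nn.Sequential(
            torch.nn.Linear(self.dim, self.dim),
            torch.nn.ReLU(),
            torch.nn.Linear(self.dim, 2),
        )

    def forward(self, x):
        return self.block(x)
```
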
  7. update readme by v1.2.x (#6728)

    Borda committed Mar 29, 2021
    SHA: 3c86193

Commits on Mar 30, 2021

  1. Commit 9044470
  2. update chlog v1.2.5 (#6742)

    * update chlog v1.2.5
    
    * legacy
    Borda committed Mar 30, 2021
    SHA: 583fcf2