Merge branch 'master' into fix-rm-outputs-train-epoch-end
Borda authored May 5, 2021
2 parents f2f8b58 + 1a6dcbd commit c84577d
Showing 22 changed files with 312 additions and 160 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/ci_dockers.yml
@@ -23,8 +23,8 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python_version: [3.6]
-        pytorch_version: [1.4, 1.7]
+        python_version: [3.7]
+        pytorch_version: [1.4, 1.8]
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -46,7 +46,7 @@ jobs:
       fail-fast: false
       matrix:
         python_version: [3.7]
-        xla_version: [1.6, 1.7, "nightly"]
+        xla_version: [1.6, 1.8, "nightly"]
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -137,6 +137,8 @@ jobs:
 
   build-nvidia:
     runs-on: ubuntu-20.04
+    # todo: temporarily skip as the base container does not fit to agent
+    if: false
     steps:
       - name: Checkout
         uses: actions/checkout@v2
50 changes: 26 additions & 24 deletions .github/workflows/events-nightly.yml
@@ -47,8 +47,8 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python_version: [3.6, 3.7]
-        xla_version: [1.6, 1.7]  # todo: , "nightly"
+        python_version: [3.7]
+        xla_version: [1.6, 1.7, 1.8]  # todo: , "nightly"
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -127,25 +127,27 @@ jobs:
           tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
         timeout-minutes: 55
 
-# docker-nvidia:
-#  runs-on: ubuntu-20.04
-#  steps:
-#  - name: Checkout
-#    uses: actions/checkout@v2
-#
-#  # https://github.com/docker/setup-buildx-action
-#  # Set up Docker Buildx - to use cache-from and cache-to argument of buildx command
-#  - uses: docker/setup-buildx-action@v1
-#  - name: Login to DockerHub
-#    uses: docker/login-action@v1
-#    with:
-#      username: ${{ secrets.DOCKER_USERNAME }}
-#      password: ${{ secrets.DOCKER_PASSWORD }}
-#
-#  - name: Publish NVIDIA to Docker Hub
-#    uses: docker/build-push-action@v2
-#    with:
-#      file: dockers/nvidia/Dockerfile
-#      push: true
-#      tags: nvcr.io/pytorchlightning/pytorch_lightning:nvidia
-#      timeout-minutes: 55
+  docker-nvidia:
+    runs-on: ubuntu-20.04
+    # todo: temporarily skip as the base container does not fit to agent
+    if: false
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+
+      # https://github.com/docker/setup-buildx-action
+      # Set up Docker Buildx - to use cache-from and cache-to argument of buildx command
+      - uses: docker/setup-buildx-action@v1
+      - name: Login to DockerHub
+        uses: docker/login-action@v1
+        with:
+          username: ${{ secrets.DOCKER_USERNAME }}
+          password: ${{ secrets.DOCKER_PASSWORD }}
+
+      - name: Publish NVIDIA to Docker Hub
+        uses: docker/build-push-action@v2
+        with:
+          file: dockers/nvidia/Dockerfile
+          push: true
+          tags: nvcr.io/pytorchlightning/pytorch_lightning:nvidia
+        timeout-minutes: 55
2 changes: 1 addition & 1 deletion .github/workflows/release-docker.yml
@@ -5,7 +5,7 @@ on:
   push:
     branches: [master, "release/*"]
   release:
-    types: [created]
+    types: [created, published]
 
 jobs:
   cuda-PL:
2 changes: 1 addition & 1 deletion .github/workflows/release-pypi.yml
@@ -5,7 +5,7 @@ on: # Trigger the workflow on push or pull request, but only for the master bra
   push:
     branches: [master, "release/*"]
   release:
-    types: [created]
+    types: [created, published]
 
 
 jobs:
13 changes: 12 additions & 1 deletion CHANGELOG.md
@@ -151,9 +151,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Improved verbose logging for `EarlyStopping` callback ([#6811](https://github.com/PyTorchLightning/pytorch-lightning/pull/6811))
 
 
+- Fix yaml loading with PyYAML=5.4.x ([#6666](https://github.com/PyTorchLightning/pytorch-lightning/issues/6666))
+
+
 ### Changed
 
 
+- Changed `LightningModule.truncated_bptt_steps` to be property ([#7323](https://github.com/PyTorchLightning/pytorch-lightning/pull/7323))
+
+
 - Changed `EarlyStopping` callback from by default running `EarlyStopping.on_validation_end` if only training is run. Set `check_on_train_epoch_end` to run the callback at the end of the train epoch instead of at the end of the validation epoch ([#7069](https://github.com/PyTorchLightning/pytorch-lightning/pull/7069))
 
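The `check_on_train_epoch_end` flag named in the #7069 entry above can be exercised like this — a minimal sketch, assuming you log a `train_loss` metric yourself (the metric name is illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Run the early-stopping check at the end of each training epoch
# instead of after validation (useful when only training is run).
early_stop = EarlyStopping(monitor="train_loss", patience=3, check_on_train_epoch_end=True)
trainer = Trainer(callbacks=[early_stop])
```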
@@ -201,10 +207,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Deprecated
 
 
+- Deprecated `outputs` in both `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks ([#7339](https://github.com/PyTorchLightning/pytorch-lightning/pull/7339))
 
 
+- Deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` ([#7323](https://github.com/PyTorchLightning/pytorch-lightning/pull/7323))
 
 
 - Deprecated `LightningModule.grad_norm` in favor of `pytorch_lightning.utilities.grads.grad_norm` ([#7292](https://github.com/PyTorchLightning/pytorch-lightning/pull/7292))
 
 
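A hedged migration sketch for the #7339 deprecation above — if you still need per-step results, `training_epoch_end` remains the place to receive them (class and metric names are illustrative):

```python
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    # Deprecated in v1.3 (#7339): receiving outputs in this hook
    # def on_train_epoch_end(self, outputs):
    #     ...

    def training_epoch_end(self, outputs):
        # still receives the list of training_step outputs
        avg_loss = sum(o["loss"] for o in outputs) / len(outputs)
        self.log("train_loss_epoch", avg_loss)

    def on_train_epoch_end(self):
        # new signature: no outputs argument
        ...
```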
@@ -295,6 +303,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed `mode='auto'` from `EarlyStopping` ([#6167](https://github.com/PyTorchLightning/pytorch-lightning/pull/6167))
 
 
+- Removed `epoch` and `step` arguments from `ModelCheckpoint.format_checkpoint_name()`, these are now included in the `metrics` argument ([#7344](https://github.com/PyTorchLightning/pytorch-lightning/pull/7344))
+
+
 - Removed legacy references for magic keys in the `Result` object ([#6016](https://github.com/PyTorchLightning/pytorch-lightning/pull/6016))
 
 
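A hedged sketch of the #7344 migration above (the filename template, directory, and metric values are illustrative assumptions):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

ckpt = ModelCheckpoint(dirpath="checkpoints", filename="{epoch}-{step}")

# Before #7344: ckpt.format_checkpoint_name(epoch, step, metrics)
# After: epoch and step are looked up inside the metrics dict itself
name = ckpt.format_checkpoint_name({"epoch": 3, "step": 1200})
print(name)  # e.g. "checkpoints/epoch=3-step=1200.ckpt"
```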
24 changes: 12 additions & 12 deletions README.md
@@ -58,7 +58,7 @@ Lightning forces the following structure to your code which makes it reusable an
 - Research code (the LightningModule).
 - Engineering code (you delete, and is handled by the Trainer).
 - Non-essential research code (logging, etc... this goes in Callbacks).
-- Data (use PyTorch Dataloaders or organize them into a LightningDataModule).
+- Data (use PyTorch DataLoaders or organize them into a LightningDataModule).
 
 Once you do this, you can train on multiple-GPUs, TPUs, CPUs and even in 16-bit precision without changing your code!
 
@@ -67,25 +67,25 @@ Get started with our [2 step guide](https://pytorch-lightning.readthedocs.io/en/
 ---
 
 ## Continuous Integration
-Lightning is rigurously tested across multiple GPUs, TPUs CPUs and against major Python and PyTorch versions.
+Lightning is rigorously tested across multiple GPUs, TPUs CPUs and against major Python and PyTorch versions.
 
 <details>
   <summary>Current build statuses</summary>
 
 <center>
 
-| System / PyTorch ver. | 1.4 (min. req.)* | 1.5 | 1.6 | 1.7 (latest) | 1.8 (nightly) |
+| System / PyTorch ver. | 1.4 (min. req.) | 1.5 | 1.6 | 1.7 | 1.8 (latest) |
 | :---: | :---: | :---: | :---: | :---: | :---: |
 | Conda py3.7 [linux] | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) |
-| Linux py3.7 [GPUs**] | - | - | [![Build Status](https://dev.azure.com/PytorchLightning/pytorch-lightning/_apis/build/status/PyTorchLightning.pytorch-lightning?branchName=master)](https://dev.azure.com/PytorchLightning/pytorch-lightning/_build/latest?definitionId=2&branchName=master) | - | - |
+| Linux py3.7 [GPUs**] | - | - | [![Build Status](https://dev.azure.com/PytorchLightning/pytorch-lightning/_apis/build/status/PL.pytorch-lightning%20(GPUs)?branchName=master)](https://dev.azure.com/PytorchLightning/pytorch-lightning/_build/latest?definitionId=6&branchName=master) | - | - |
 | Linux py3.{6,7} [TPUs***] | - | - | [![TPU tests](https://github.com/PyTorchLightning/pytorch-lightning/workflows/TPU%20tests/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22TPU+tests%22+branch%3Amaster) | [![TPU tests](https://github.com/PyTorchLightning/pytorch-lightning/workflows/TPU%20tests/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22TPU+tests%22+branch%3Amaster) |
 | Linux py3.{6,7,8,9} | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - |
 | OSX py3.{6,7,8,9} | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - |
 | Windows py3.{6,7,8,9} | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?branch=master&event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - |
 
-- _\** tests run on two NVIDIA K80_
+- _\** tests run on two NVIDIA P100_
 - _\*** tests run on Google GKE TPUv2/3_
-- _TPU w/ py3.6/py3.7 means we support Colab and Kaggle env._
+- _TPU py3.7 means we support Colab and Kaggle env._
 
 </center>
 </details>
@@ -387,18 +387,18 @@ If you have any questions please:
 ## Grid AI
 Grid AI is our platform for training models at scale on the cloud!
 
-**Sign up [here](https://www.grid.ai/)**
+**Sign up for our FREE community Tier [here](https://www.grid.ai/pricing/)**
 
 To use grid, take your regular command:
 
 ```
 python my_model.py --learning_rate 1e-6 --layers 2 --gpus 4
 ```
 
 And change it to use the grid train command:
 
 ```
 grid train --grid_gpus 4 my_model.py --learning_rate 'uniform(1e-6, 1e-1, 20)' --layers '[2, 4, 8, 16]'
 ```
 
 The above command will launch (20 * 4) experiments each running on 4 GPUs (320 GPUs!) - by making ZERO changes to
@@ -408,11 +408,11 @@ your code.
 
 ## Licence
 
-Please observe the Apache 2.0 license that is listed in this repository. In addition
-the Lightning framework is Patent Pending.
+Please observe the Apache 2.0 license that is listed in this repository.
+In addition, the Lightning framework is Patent Pending.
 
 ## BibTeX
-If you want to cite the framework feel free to use this (but only if you loved it 😊) or [zendo](https://zenodo.org/record/3828935#.YC45Lc9Khqs):
+If you want to cite the framework feel free to use this (but only if you loved it 😊) or [zenodo](https://zenodo.org/record/3828935#.YC45Lc9Khqs):
 
 ```bibtex
 @article{falcon2019pytorch,
27 changes: 19 additions & 8 deletions docs/source/advanced/sequences.rst
@@ -40,20 +40,31 @@ For example, it may save memory to use Truncated Backpropagation Through Time wh
 
 Lightning can handle TBTT automatically via this flag.
 
-.. testcode::
+.. testcode:: python
 
-    # DEFAULT (single backwards pass per batch)
-    trainer = Trainer(truncated_bptt_steps=None)
-
-    # (split batch into sequences of size 2)
-    trainer = Trainer(truncated_bptt_steps=2)
+    from pytorch_lightning import LightningModule
+
+    class MyModel(LightningModule):
+
+        def __init__(self):
+            super().__init__()
+            # Important: This property activates truncated backpropagation through time
+            # Setting this value to 2 splits the batch into sequences of size 2
+            self.truncated_bptt_steps = 2
+
+        # Truncated back-propagation through time
+        def training_step(self, batch, batch_idx, hiddens):
+            # the training step must be updated to accept a ``hiddens`` argument
+            # hiddens are the hiddens from the previous truncated backprop step
+            out, hiddens = self.lstm(data, hiddens)
+            return {
+                "loss": ...,
+                "hiddens": hiddens
+            }
 
 .. note:: If you need to modify how the batch is split,
    override :meth:`pytorch_lightning.core.LightningModule.tbptt_split_batch`.
 
-.. note:: Using this feature requires updating your LightningModule's
-   :meth:`pytorch_lightning.core.LightningModule.training_step` to include a `hiddens` arg.
 
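The added snippet above comes straight from the diff and references an undefined `data`; a self-contained sketch that actually runs might look like this (layer sizes, loss, and optimizer are illustrative assumptions):

```python
import torch
from torch import nn
import pytorch_lightning as pl


class TBTTModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
        self.head = nn.Linear(16, 1)
        # split each batch along the time dimension into chunks of 2 steps
        self.truncated_bptt_steps = 2

    def training_step(self, batch, batch_idx, hiddens):
        x, y = batch  # x: (batch, time, features), y: (batch, time, 1)
        out, hiddens = self.lstm(x, hiddens)  # hiddens is None on the first split
        loss = nn.functional.mse_loss(self.head(out), y)
        return {"loss": loss, "hiddens": hiddens}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

With `truncated_bptt_steps` set, the trainer feeds each time-slice of the batch through `training_step` and carries the returned hidden state into the next slice.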
----------

Iterable Datasets
57 changes: 57 additions & 0 deletions docs/source/common/lightning_module.rst
@@ -1005,6 +1005,63 @@ Get the model file size (in megabytes) using ``self.model_size`` inside Lightnin
 
 --------------
 
+truncated_bptt_steps
+^^^^^^^^^^^^^^^^^^^^
+
+Truncated back prop breaks performs backprop every k steps of
+a much longer sequence.
+
+If this is enabled, your batches will automatically get truncated
+and the trainer will apply Truncated Backprop to it.
+
+(`Williams et al. "An efficient gradient-based algorithm for on-line training of
+recurrent network trajectories."
+<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.7941&rep=rep1&type=pdf>`_)
+
+`Tutorial <https://d2l.ai/chapter_recurrent-neural-networks/bptt.html>`_
+
+.. testcode:: python
+
+    from pytorch_lightning import LightningModule
+
+    class MyModel(LightningModule):
+
+        def __init__(self):
+            super().__init__()
+            # Important: This property activates truncated backpropagation through time
+            # Setting this value to 2 splits the batch into sequences of size 2
+            self.truncated_bptt_steps = 2
+
+        # Truncated back-propagation through time
+        def training_step(self, batch, batch_idx, hiddens):
+            # the training step must be updated to accept a ``hiddens`` argument
+            # hiddens are the hiddens from the previous truncated backprop step
+            out, hiddens = self.lstm(data, hiddens)
+            return {
+                "loss": ...,
+                "hiddens": hiddens
+            }
+
+Lightning takes care to split your batch along the time-dimension.
+
+.. code-block:: python
+
+    # we use the second as the time dimension
+    # (batch, time, ...)
+    sub_batch = batch[0, 0:t, ...]
+
+To modify how the batch is split,
+override :meth:`pytorch_lightning.core.LightningModule.tbptt_split_batch`:
+
+.. testcode:: python
+
+    class LitMNIST(LightningModule):
+        def tbptt_split_batch(self, batch, split_size):
+            # do your own splitting on the batch
+            return splits
+
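The `tbptt_split_batch` stub above returns an undefined `splits`; a concrete override might look like this — a sketch assuming `batch = (x, y)` with time on dimension 1:

```python
import pytorch_lightning as pl


class LitSeqModel(pl.LightningModule):
    def tbptt_split_batch(self, batch, split_size):
        # slice both tensors into consecutive windows of split_size time steps
        x, y = batch
        return [
            (x[:, t:t + split_size], y[:, t:t + split_size])
            for t in range(0, x.size(1), split_size)
        ]
```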
+--------------
+
 Hooks
 ^^^^^
 This is the pseudocode to describe how all the hooks are called during a call to ``.fit()``.
3 changes: 2 additions & 1 deletion legacy/generate_checkpoints.sh
@@ -3,6 +3,7 @@
 # bash generate_checkpoints.sh 1.0.2 1.0.3 1.0.4
 
 LEGACY_PATH="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
+FROZEN_MIN_PT_VERSION="1.4"
 
 echo $LEGACY_PATH
 # install some PT version here so it does not need to reinstalled for each env
@@ -22,7 +23,7 @@ do
   # activate and install PL version
   source "$ENV_PATH/bin/activate"
   # there are problem to load ckpt in older versions since they are saved the newer versions
-  pip install "pytorch_lightning==$ver" "torch==1.3" --quiet --no-cache-dir
+  pip install "pytorch_lightning==$ver" "torch==$FROZEN_MIN_PT_VERSION" --quiet --no-cache-dir
 
   python --version
   pip --version
3 changes: 1 addition & 2 deletions pytorch_lightning/accelerators/accelerator.py
@@ -196,8 +196,7 @@ def training_step(
             - batch_idx (int): Integer displaying index of this batch
             - optimizer_idx (int): When using multiple optimizers, this argument will also be present.
             - hiddens(:class:`~torch.Tensor`): Passed in if
-              :paramref:`~pytorch_lightning.trainer.trainer.Trainer.truncated_bptt_steps` > 0.
+              :paramref:`~pytorch_lightning.core.lightning.LightningModule.truncated_bptt_steps` > 0.
         """
         args[0] = self.to_device(args[0])
 
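For context on the arguments documented above: the trainer dispatches to whichever `training_step` signature the LightningModule defines. A sketch of the common variants (only one can be active at a time, so the alternatives are commented out; bodies elided):

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    # single optimizer, no truncated BPTT
    def training_step(self, batch, batch_idx):
        ...

    # multiple optimizers: optimizer_idx is passed as well
    # def training_step(self, batch, batch_idx, optimizer_idx):
    #     ...

    # truncated_bptt_steps > 0: hiddens from the previous split is passed
    # def training_step(self, batch, batch_idx, hiddens):
    #     ...
```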