PyTorch 1.7 Stable support (#3821)
* prepare for 1.7 support [ci skip]

* tpu [ci skip]

* test run 1.7

* all 1.7, needs to fix tests

* couple with torchvision

* windows try

* remove windows

* 1.7 is here

* on purpose fail [ci skip]

* return [ci skip]

* 1.7 docker

* back to normal [ci skip]

* change to some_val [ci skip]

* add seed [ci skip]

* 4 places [ci skip]

* fail on purpose [ci skip]

* verbose=True [ci skip]

* use filename to track

* use filename to track

* monitor epoch + changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
(cherry picked from commit 0f584fa)
Jeff Yang authored and Borda committed Nov 4, 2020
1 parent c2809d3 commit c31a109
Showing 11 changed files with 33 additions and 23 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/ci_dockers.yml
@@ -40,7 +40,7 @@ jobs:
       fail-fast: false
       matrix:
         python_version: [3.7]
-        xla_version: [1.6] # todo: , "nightly"
+        xla_version: [1.6, "nightly"]
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -66,8 +66,8 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          #- python_version: 3.8
-          # pytorch_version: 1.7 # todo
+          - python_version: 3.8
+            pytorch_version: 1.7
           - python_version: 3.7
             pytorch_version: 1.6
           - python_version: 3.6
2 changes: 1 addition & 1 deletion .github/workflows/ci_test-conda.yml
@@ -16,7 +16,7 @@ jobs:
       matrix:
         # os: [ubuntu-20.04]
         python-version: [3.7]
-        pytorch-version: [1.3, 1.4, 1.5, 1.6] # , 1.7 # todo
+        pytorch-version: [1.3, 1.4, 1.5, 1.6, 1.7]

     # Timeout: https://stackoverflow.com/a/59076067/4521646
     timeout-minutes: 35
2 changes: 1 addition & 1 deletion .github/workflows/ci_test-full.yml
@@ -89,7 +89,7 @@ jobs:
       run: |
         # python -m pip install --upgrade --user pip
         pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet --upgrade
-        pip install --requirement ./requirements/devel.txt --quiet --upgrade
+        pip install --requirement ./requirements/devel.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet --upgrade
         python --version
         pip --version
         pip list
2 changes: 1 addition & 1 deletion .github/workflows/nightly.yml
@@ -76,7 +76,7 @@ jobs:
       fail-fast: false
       matrix:
         python_version: [3.6, 3.7, 3.8]
-        pytorch_version: [1.3, 1.4, 1.5, 1.6] # todo: , 1.7
+        pytorch_version: [1.3, 1.4, 1.5, 1.6, 1.7]
         exclude:
           # excludes PT 1.3 as it is missing on pypi
           - python_version: 3.8
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

+- Added PyTorch 1.7 Stable support ([#3821](https://github.com/PyTorchLightning/pytorch-lightning/pull/3821))
+
 - Added "monitor" key to saved `ModelCheckpoints` ([#4383](https://github.com/PyTorchLightning/pytorch-lightning/pull/4383))

 - Added `ConfusionMatrix` class interface ([#4348](https://github.com/PyTorchLightning/pytorch-lightning/pull/4348))
10 changes: 5 additions & 5 deletions README.md
@@ -89,14 +89,14 @@ Lightning can automatically export to ONNX or TorchScript for those cases.
 ## Continuous Integration
 <center>

-| System / PyTorch ver. | 1.3 (min. req.)* | 1.4 | 1.5 | 1.6 (latest) | 1.7 (nightly) |
+| System / PyTorch ver. | 1.3 (min. req.)* | 1.4 | 1.5 | 1.6 | 1.7 (latest) |
 | :---: | :---: | :---: | :---: | :---: | :---: |
-| Conda py3.7 [linux] | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | - |
+| Conda py3.7 [linux] | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) | [![PyTorch & Conda](https://github.com/PyTorchLightning/pytorch-lightning/workflows/PyTorch%20&%20Conda/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22PyTorch+%26+Conda%22+branch%3Amaster) |
 | Linux py3.7 [GPUs**] | - | - | [![Build Status](http://104.154.220.231/api/badges/PyTorchLightning/pytorch-lightning/status.svg)](http://104.154.220.231/PyTorchLightning/pytorch-lightning) | - | - |
 | Linux py3.7 [TPUs***] | - | - | - | [![TPU tests](https://github.com/PyTorchLightning/pytorch-lightning/workflows/TPU%20tests/badge.svg)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22TPU+tests%22+branch%3Amaster) | - |
-| Linux py3.6 / py3.7 / py3.8 | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - |
-| OSX py3.6 / py3.7 | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - |
-| Windows py3.6 / py3.7 / py3.8 | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - |
+| Linux py3.6 / py3.7 / py3.8 | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) |
+| OSX py3.6 / py3.7 | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) |
+| Windows py3.6 / py3.7 / py3.8 | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) | - | - | - | [![CI complete testing](https://github.com/PyTorchLightning/pytorch-lightning/workflows/CI%20complete%20testing/badge.svg?event=push)](https://github.com/PyTorchLightning/pytorch-lightning/actions?query=workflow%3A%22CI+testing%22) |

 - _\* `torch>=1.4` is the minimal pytorch version for Python 3.8_
 - _\** tests run on two NVIDIA K80_
2 changes: 1 addition & 1 deletion dockers/base-xla/Dockerfile
@@ -110,4 +110,4 @@ RUN \
     conda info && \
     pip list && \
     python -c "import sys; assert sys.version[:3] == '$PYTHON_VERSION', sys.version" && \
-    python -c "import torch; ver = '$XLA_VERSION' ; ver = dict(nightly='1.7').get(ver, ver) ; assert torch.__version__[:3] == ver, torch.__version__"
+    python -c "import torch; ver = '$XLA_VERSION' ; ver = dict(nightly='1.8').get(ver, ver) ; assert torch.__version__[:3] == ver, torch.__version__"
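
Note on the Dockerfile check above: the one-liner maps the `nightly` build argument to a hard-coded `major.minor` prefix and compares it with the first three characters of `torch.__version__`. Below is a minimal Python sketch of that logic, unrolled for readability. The function names `expected_prefix` and `version_matches` are hypothetical and do not exist in the repository; the version strings in the usage lines are illustrative only.

```python
def expected_prefix(xla_version: str, nightly_alias: str = "1.8") -> str:
    """Map an XLA_VERSION build argument to the expected torch 'major.minor' prefix.

    'nightly' is translated to a fixed alias (1.8 after this commit);
    any other value is returned unchanged.
    """
    return {"nightly": nightly_alias}.get(xla_version, xla_version)


def version_matches(torch_version: str, xla_version: str) -> bool:
    """Compare only the first three characters, as the Dockerfile one-liner does."""
    return torch_version[:3] == expected_prefix(xla_version)


# Illustrative version strings, not taken from any real image build.
print(version_matches("1.8.0.dev20201104", "nightly"))
print(version_matches("1.6.0", "1.6"))
```

Because only three characters are compared, a version such as `1.10.0` would be read as `1.1`; this does not affect the versions touched by this commit.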
4 changes: 2 additions & 2 deletions pytorch_lightning/core/grads.py
@@ -46,11 +46,11 @@ def grad_norm(self, norm_type: Union[float, int, str]) -> Dict[str, float]:
                 continue

             param_norm = float(p.grad.data.norm(norm_type))
-            norms[f'grad_{norm_type}_norm_{name}'] = round(param_norm, 3)
+            norms[f'grad_{norm_type}_norm_{name}'] = round(param_norm, 4)

             all_norms.append(param_norm)

         total_norm = float(torch.tensor(all_norms).norm(norm_type))
-        norms[f'grad_{norm_type}_norm_total'] = round(total_norm, 3)
+        norms[f'grad_{norm_type}_norm_total'] = round(total_norm, 4)

         return norms
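
Note on the rounding change above: the commit does not state why the precision moves from 3 to 4 decimals. The sketch below only illustrates what the change does to the reported values. It is a pure-Python stand-in for the tensor code, assuming plain lists of floats instead of `p.grad`; `p_norm` and `grad_norm_report` are hypothetical names, not Lightning APIs.

```python
import math
from typing import Dict, List, Sequence


def p_norm(values: Sequence[float], norm_type: float) -> float:
    """p-norm of a flat sequence; the infinity norm is the largest absolute value."""
    if math.isinf(norm_type):
        return max(abs(v) for v in values)
    return sum(abs(v) ** norm_type for v in values) ** (1.0 / norm_type)


def grad_norm_report(
    grads: Dict[str, Sequence[float]], norm_type: float, ndigits: int
) -> Dict[str, float]:
    """Per-parameter norms plus the norm of those norms, rounded to `ndigits`."""
    norms: Dict[str, float] = {}
    all_norms: List[float] = []
    for name, grad in grads.items():
        param_norm = p_norm(grad, norm_type)
        norms[f"grad_{norm_type}_norm_{name}"] = round(param_norm, ndigits)
        all_norms.append(param_norm)  # the total uses the unrounded values
    norms[f"grad_{norm_type}_norm_total"] = round(p_norm(all_norms, norm_type), ndigits)
    return norms


# A small gradient vanishes at 3 decimals but is still visible at 4.
tiny = {"w": [0.0003, 0.0]}
print(grad_norm_report(tiny, 2.0, ndigits=3))
print(grad_norm_report(tiny, 2.0, ndigits=4))
```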
20 changes: 14 additions & 6 deletions tests/checkpointing/test_model_checkpoint.py
@@ -365,20 +365,28 @@ def test_model_checkpoint_topk_zero(tmpdir):
 def test_model_checkpoint_topk_all(tmpdir):
     """ Test that save_top_k=-1 tracks the best models when monitor key is provided. """
     seed_everything(1000)
-    epochs = 2
-    model = EvalModelTemplate()
-    checkpoint_callback = ModelCheckpoint(dirpath=tmpdir, monitor="early_stop_on", save_top_k=-1)
+    epochs = 3
+
+    class CustomModel(EvalModelTemplate):
+        def validation_epoch_end(self, outputs):
+            return {'epoch': self.current_epoch}
+
+    model = CustomModel()
+    checkpoint_callback = ModelCheckpoint(dirpath=tmpdir, monitor="epoch", mode='max', save_top_k=-1)
     trainer = Trainer(
         default_root_dir=tmpdir,
         checkpoint_callback=checkpoint_callback,
         max_epochs=epochs,
         logger=False,
     )
     trainer.fit(model)
-    assert checkpoint_callback.best_model_path == tmpdir / "epoch=1.ckpt"
-    assert checkpoint_callback.best_model_score > 0
+
+    assert checkpoint_callback.monitor == 'epoch'
+    assert checkpoint_callback.best_model_path == tmpdir / "epoch=2.ckpt"
+    assert checkpoint_callback.best_model_score == epochs - 1
+    assert len(os.listdir(tmpdir)) == len(checkpoint_callback.best_k_models) == epochs
     assert set(checkpoint_callback.best_k_models.keys()) == set(str(tmpdir / f"epoch={i}.ckpt") for i in range(epochs))
-    assert checkpoint_callback.kth_best_model_path == tmpdir / "epoch=0.ckpt"
+    assert checkpoint_callback.kth_best_model_path == tmpdir / 'epoch=0.ckpt'


 def test_ckpt_metric_names(tmpdir):
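
Note on the rewritten test above: monitoring the epoch index with `mode='max'` makes the expected outcome deterministic, since the last epoch is always the best one and epoch 0 the worst. The sketch below is a plain-Python simulation of that bookkeeping, assuming one checkpoint per epoch is kept (as with `save_top_k=-1`). `track_checkpoints` is a hypothetical helper and is not how `ModelCheckpoint` is implemented.

```python
from typing import Dict, Sequence, Tuple


def track_checkpoints(
    scores_by_epoch: Sequence[float], mode: str = "max"
) -> Tuple[Dict[str, float], str, float, str]:
    """Keep every checkpoint and report the best and the k-th best (worst kept) one."""
    best_k_models = {f"epoch={e}.ckpt": s for e, s in enumerate(scores_by_epoch)}
    pick_best = max if mode == "max" else min
    pick_worst = min if mode == "max" else max
    best_path = pick_best(best_k_models, key=best_k_models.get)
    kth_best_path = pick_worst(best_k_models, key=best_k_models.get)
    return best_k_models, best_path, best_k_models[best_path], kth_best_path


# The monitored value is the epoch index itself, as returned by CustomModel in the test.
epochs = 3
kept, best_path, best_score, kth_path = track_checkpoints(list(range(epochs)), mode="max")
print(best_path, best_score, kth_path, len(kept))
```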
2 changes: 1 addition & 1 deletion tests/metrics/utils.py
@@ -24,7 +24,7 @@ def setup_ddp(rank, world_size):
     os.environ["MASTER_ADDR"] = 'localhost'
     os.environ['MASTER_PORT'] = '8088'

-    if torch.distributed.is_available():
+    if torch.distributed.is_available() and sys.platform not in ['win32', 'cygwin']:
         torch.distributed.init_process_group("gloo", rank=rank, world_size=world_size)
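
Note on the platform guard above: the new condition skips process-group initialization on Windows-like platforms. The commit does not give the reason. The sketch below isolates the condition as a pure function so it can be checked without `torch`; `should_init_process_group` is a hypothetical name used only for illustration.

```python
import sys


def should_init_process_group(distributed_available: bool, platform: str = sys.platform) -> bool:
    """Mirror the guard in setup_ddp: initialize only off Windows and Cygwin."""
    return distributed_available and platform not in ("win32", "cygwin")


print(should_init_process_group(True, "linux"))
print(should_init_process_group(True, "win32"))
```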
4 changes: 2 additions & 2 deletions tests/models/test_grad_norm.py
@@ -49,11 +49,11 @@ def on_after_backward(self):
             norm = np.linalg.norm(flat, self.norm_type)
             norms.append(norm)

-            out[prefix + name] = round(norm, 3)
+            out[prefix + name] = round(norm, 4)

         # handle total norm
         norm = np.linalg.norm(norms, self.norm_type)
-        out[prefix + 'total'] = round(norm, 3)
+        out[prefix + 'total'] = round(norm, 4)
         self.stored_grad_norms.append(out)
