PoC: Accelerator refactor [wip] [skip ci] #5616

Closed
wants to merge 168 commits into release/1.2-dev from accelerator-refactor-sharted-4
Changes from 125 commits
168 commits
fddeee3
move to old package
justusschock Nov 9, 2020
f9c1e8d
add initial draft of new accelerators
justusschock Nov 9, 2020
28ae403
add initial data parallel draft
justusschock Nov 9, 2020
fe7573f
add initial precision draft
justusschock Nov 9, 2020
9fd48a1
scheduler helper functions
justusschock Nov 9, 2020
b961aaf
define base plugin api
justusschock Nov 11, 2020
532ad5d
base plugin integration
justusschock Nov 11, 2020
f52ad64
continue ddp plugin
justusschock Nov 11, 2020
bcfb4e7
minor changes precision plugin
justusschock Nov 11, 2020
bf8a87a
start ddp plugin
justusschock Nov 11, 2020
8482c0b
initail version ddp spawn
justusschock Nov 12, 2020
12d2c59
remove deprecated implementation
justusschock Nov 12, 2020
8d83db8
add comment on whats missing
justusschock Nov 12, 2020
22e1e31
latest state
justusschock Nov 20, 2020
eac87c3
update accelerator for model to live in traintype plugin
justusschock Nov 30, 2020
d111471
add general plugin interface
justusschock Nov 30, 2020
3d6c4b8
add model properties
justusschock Nov 30, 2020
51740e9
Trainer integration part 1 for CPU accelerator
awaelchli Dec 4, 2020
9e48568
test single gpu trainer integration
awaelchli Dec 6, 2020
5da773a
make device changes a bit less hardcoded
justusschock Dec 7, 2020
42e53be
properly resolve attributes
justusschock Dec 7, 2020
4c8d24f
add properties for accelerator forwarding
justusschock Dec 7, 2020
6faebfa
correct optimizer_step calls
justusschock Dec 7, 2020
29568e1
call train or test
awaelchli Dec 7, 2020
33561d7
make calls to trainstep (ad fix bugs)
justusschock Dec 7, 2020
ef94755
remove gradient_clip_val from accelerator
awaelchli Dec 7, 2020
c5e9892
add back the step end methods
awaelchli Dec 7, 2020
c02baad
add precision todo comment
awaelchli Dec 7, 2020
ce4eafa
ddp
awaelchli Dec 8, 2020
e6ba009
clean up
awaelchli Dec 8, 2020
fa4d844
connect
awaelchli Dec 8, 2020
8be82a4
clean up
awaelchli Dec 8, 2020
08ce7d3
post
awaelchli Dec 8, 2020
ffbcd4f
disable progress bar on rank > 0
awaelchli Dec 9, 2020
4be76bf
precision test
justusschock Dec 10, 2020
098f665
fix native amp
justusschock Dec 10, 2020
ea85633
a
awaelchli Dec 12, 2020
846dc92
ddp spawn
awaelchli Dec 12, 2020
0d0c3d7
spawn
awaelchli Dec 12, 2020
3fb8b4d
finish ddp plugin integration
awaelchli Dec 13, 2020
0f5298e
remove logger from plugins
awaelchli Dec 13, 2020
434e30e
setup
awaelchli Dec 13, 2020
3fb31c8
remove logger arg
awaelchli Dec 13, 2020
e7a7a87
module
awaelchli Dec 13, 2020
1e8aa44
clean up
awaelchli Dec 13, 2020
628fdc3
ddp_cpu integration
awaelchli Dec 14, 2020
9f369cc
cuda context manager for emptying cache
awaelchli Dec 14, 2020
a8e8306
args
awaelchli Dec 14, 2020
71cbd33
move "log_gpu_memory" to logger connector
awaelchli Dec 14, 2020
1a9ad4f
fix imports
justusschock Dec 14, 2020
7b874cc
typo
justusschock Dec 14, 2020
bc2460a
remove todo
justusschock Dec 14, 2020
506c446
add rpc_enabled flag
justusschock Dec 14, 2020
19d19d5
remove unused self arg
justusschock Dec 14, 2020
dd4d148
comment out unnexessary amp part
justusschock Dec 14, 2020
f2fffc6
fix model connector
justusschock Dec 14, 2020
c6b3aeb
fix import
justusschock Dec 14, 2020
55fc952
copy properties only once
justusschock Dec 14, 2020
177a634
add cluster env
awaelchli Dec 22, 2020
7290e99
move slurm configuration
awaelchli Dec 22, 2020
1b9c095
resolve importerrors
awaelchli Dec 22, 2020
e50aea9
handle distributed_sampler_kwargs
awaelchli Dec 22, 2020
2e8f944
move emptying cache to accelertor
awaelchli Dec 22, 2020
bcc7a72
fix a few tests
awaelchli Dec 22, 2020
259c7f7
restoring the result from subprocess
awaelchli Dec 22, 2020
dfab52a
fix queue.get() order for results
awaelchli Dec 22, 2020
6742488
add missing "block_backward_sync" context manager
awaelchli Dec 22, 2020
8c89932
add missing "block_backward_sync" context manager
awaelchli Dec 22, 2020
0186a0f
fix sync_batchnorm
awaelchli Dec 22, 2020
b2ac1f4
fix supported gpu-ids for tuple
awaelchli Dec 22, 2020
07a41ce
fix clip gradients and inf recursion
awaelchli Dec 22, 2020
63b7eaf
accelerator selection: added cluster_environment plugin
awaelchli Dec 23, 2020
f8344c5
fix torchelastic test
awaelchli Dec 23, 2020
34e3c15
fix reduce early stopping decision for DDP
awaelchli Dec 24, 2020
27a4cff
fix tests: callbacks, conversion to lightning optimizer
awaelchli Dec 24, 2020
df5ac30
fix lightning optimizer does not pickle
awaelchli Dec 24, 2020
dcf917a
fix setting benchmark and deterministic option
awaelchli Dec 24, 2020
272f088
fix slurm amp test
awaelchli Dec 24, 2020
4529476
fix prepare_data test and determine node_rank
awaelchli Dec 27, 2020
5319b0f
fix retrieving last path when testing
awaelchli Dec 27, 2020
3b54cfb
remove obsolete plugin argument
awaelchli Dec 27, 2020
6540b87
fix test: test_trainer_config
awaelchli Dec 27, 2020
6b450e1
fix torchscript tests
awaelchli Dec 27, 2020
4ef539f
fix trainer.model access
awaelchli Dec 27, 2020
1001ccf
move properties
awaelchli Dec 27, 2020
38a1d0f
fix test_transfer_batch_hook
awaelchli Dec 27, 2020
46cf7ef
fix auto_select_gpus
awaelchli Dec 27, 2020
258f50e
fix omegaconf test
awaelchli Dec 27, 2020
a5d69b9
fix test that needs to simulate slurm ddp
awaelchli Dec 27, 2020
88a7ed5
add horovod plugin
awaelchli Dec 29, 2020
40daa41
fix test with named arguments
awaelchli Dec 29, 2020
96fc074
clean up whitespace
awaelchli Dec 29, 2020
210831a
fix datamodules test
awaelchli Dec 29, 2020
98b6dd4
remove old accelerators
justusschock Jan 6, 2021
dfcbba6
fix naming
justusschock Jan 6, 2021
348a1b0
move old plugins
justusschock Jan 6, 2021
14f2f6e
move to plugins
justusschock Jan 6, 2021
2f779c6
create precision subpackage
justusschock Jan 6, 2021
58536f6
create training_type subpackage
justusschock Jan 6, 2021
ee53c90
fix all new import errors
awaelchli Jan 7, 2021
894e604
fix wrong arguments order passed to test
awaelchli Jan 7, 2021
2bdc836
fix LR finder
awaelchli Jan 10, 2021
48b9882
Added sharded training type and amp plugin
Jan 11, 2021
38452b6
Move clip grad to precision plugin
Jan 11, 2021
173b22c
Added sharded spawn, select accelerators based on distributed_backend…
Jan 12, 2021
79803f6
Fix import issue, attempting to fix tests
Jan 12, 2021
a7c0d8f
Fix initial test
Jan 12, 2021
02df0ad
Reflect hook logic from master, should wrap model after move to device
Jan 14, 2021
d0ebcba
Optional state consolidation, since master has optimizers not wrapped
justusschock Jan 22, 2021
319c3e8
change attribute for instance test
justusschock Jan 22, 2021
a34cd15
reset optimizers
justusschock Jan 22, 2021
c95b06a
legacy
Borda Jan 22, 2021
9ff0c64
imports in accel
Borda Jan 22, 2021
67d4e47
legacy2
Borda Jan 22, 2021
577b00d
trainer imports
Borda Jan 22, 2021
aa4858b
fix import errors after rebase
awaelchli Jan 25, 2021
f81a44f
move hook to new setup location
awaelchli Jan 25, 2021
a285665
provide unwrapping logic
awaelchli Jan 25, 2021
bf78d70
fix trainer callback system
awaelchli Jan 25, 2021
34947cf
added ddp2 implementation
awaelchli Jan 25, 2021
49bec53
fix imports .legacy
Borda Jan 25, 2021
ba1c986
move plugins
Borda Jan 25, 2021
45dfbb7
restore legacy
Borda Jan 25, 2021
9b7326a
drop test.py from root
Borda Jan 25, 2021
96bc05d
add tpu accelerator and plugins
justusschock Jan 26, 2021
c5994e5
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 30, 2021
9e46624
fixes
awaelchli Jan 30, 2021
22d2ae8
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 30, 2021
901d392
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 31, 2021
e174b8d
fix lightning optimizer merge
awaelchli Jan 31, 2021
98660de
reset bugreportmodel
awaelchli Jan 31, 2021
4d95b6c
unwrapping
awaelchli Jan 31, 2021
b69d013
step routing forward
awaelchli Jan 31, 2021
cb6676d
model access
awaelchli Jan 31, 2021
a33d27f
unwrap
awaelchli Jan 31, 2021
f7486e2
opt
awaelchli Jan 31, 2021
117f16d
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 31, 2021
3792b72
integrate distrib_type
awaelchli Jan 31, 2021
ef85b81
sync changes
awaelchli Jan 31, 2021
9d9a940
sync
awaelchli Feb 1, 2021
f017a39
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
a190a56
fixes
awaelchli Feb 1, 2021
73bb607
add forgotten generators
awaelchli Feb 1, 2021
c8c74f3
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
ae71997
add missing logic
awaelchli Feb 1, 2021
d89847b
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
0e686c3
update
awaelchli Feb 1, 2021
d6a43ea
import
awaelchli Feb 1, 2021
ceb8f75
missed imports
awaelchli Feb 1, 2021
fbb7c20
import fixes
awaelchli Feb 1, 2021
b610999
isort
awaelchli Feb 1, 2021
9b79924
mv f
awaelchli Feb 1, 2021
9afe54d
changelog
awaelchli Feb 1, 2021
3b63e82
Merge branch 'release/1.2-dev' into ref/update-plugins
awaelchli Feb 1, 2021
ca8cb68
format
awaelchli Feb 1, 2021
0633745
move helper to parallel plugin
awaelchli Feb 1, 2021
a622e0b
d
awaelchli Feb 1, 2021
18c682f
Merge branch 'ref/update-plugins' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
f275803
add world size
awaelchli Feb 1, 2021
4ae008b
clean up
awaelchli Feb 1, 2021
3b3918b
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
d4c6308
duplicate
awaelchli Feb 1, 2021
7eef4a0
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 2, 2021
9949164
activate ddp_sharded and tpu
awaelchli Feb 2, 2021
6d47357
set nvidia flags
awaelchli Feb 2, 2021
a6864ec
remove unused colab var
awaelchli Feb 2, 2021
b4b9724
use_tpu <-> on_tpu attrs
awaelchli Feb 2, 2021
81001e3
make some ddp_cpu and clusterplugin tests pass
awaelchli Feb 2, 2021
51 changes: 13 additions & 38 deletions benchmarks/test_sharded_parity.py
@@ -15,14 +15,12 @@
import os
import platform
import time
-from typing import Type, Union
+from typing import Type

import pytest
import torch

from pytorch_lightning import seed_everything, Trainer
-from pytorch_lightning.plugins.ddp_plugin import DDPPlugin
-from pytorch_lightning.plugins.sharded_plugin import DDPShardedPlugin
from pytorch_lightning.utilities import _FAIRSCALE_AVAILABLE, _NATIVE_AMP_AVAILABLE
from tests.backends import DDPLauncher
from tests.base.boring_model import BoringModel, RandomDataset
@@ -32,10 +30,8 @@
@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_one_gpu():
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=1,
-        accelerator='ddp_spawn',
-        plugin=DDPShardedPlugin(),
        model_cls=SeedTrainLoaderModel,
    )

@@ -45,11 +41,9 @@ def test_ddp_sharded_plugin_correctness_one_gpu():
@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_amp_one_gpu():
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=1,
        precision=16,
-        accelerator='ddp_spawn',
-        plugin=DDPShardedPlugin(),
        model_cls=SeedTrainLoaderModel,
    )

@@ -59,10 +53,8 @@ def test_ddp_sharded_plugin_correctness_amp_one_gpu():
@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_multi_gpu():
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=2,
-        accelerator='ddp_spawn',
-        plugin=DDPShardedPlugin(),
        model_cls=SeedTrainLoaderModel,
        max_percent_speed_diff=0.25,  # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
    )
@@ -73,11 +65,9 @@ def test_ddp_sharded_plugin_correctness_multi_gpu():
@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="test requires multi-GPU machine")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_amp_multi_gpu():
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=2,
        precision=16,
-        accelerator='ddp_spawn',
-        plugin=DDPShardedPlugin(),
        model_cls=SeedTrainLoaderModel,
        max_percent_speed_diff=0.25,  # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
    )
@@ -88,11 +78,9 @@ def test_ddp_sharded_plugin_correctness_amp_multi_gpu():
@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="test requires multi-GPU machine")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_string_sharded_plugin_correctness_amp_multi_gpu():
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=2,
        precision=16,
-        accelerator='ddp_spawn',
-        plugin='ddp_sharded',
        model_cls=SeedTrainLoaderModel,
        max_percent_speed_diff=0.25,  # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
    )
@@ -104,11 +92,9 @@ def test_ddp_string_sharded_plugin_correctness_amp_multi_gpu():
                    reason="test should be run outside of pytest")
@DDPLauncher.run("--accelerator ddp --gpus 2 --precision 32")
def test_ddp_sharded_plugin_correctness_multi_gpu_ddp(tmpdir, args=None):
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=args.gpus,
        precision=args.precision,
-        accelerator=args.accelerator,
-        plugin=DDPShardedPlugin(),
        model_cls=SeedTrainLoaderModel,
    )

@@ -119,11 +105,9 @@ def test_ddp_sharded_plugin_correctness_multi_gpu_ddp(tmpdir, args=None):
                    reason="test should be run outside of pytest")
@DDPLauncher.run("--accelerator ddp --gpus 2 --precision 16")
def test_ddp_sharded_plugin_correctness_amp_multi_gpu_ddp(tmpdir, args=None):
-    plugin_parity_test(
+    sharded_parity_test(
        gpus=args.gpus,
        precision=args.precision,
-        accelerator=args.accelerator,
-        plugin=DDPShardedPlugin(),
        model_cls=SeedTrainLoaderModel,
    )

@@ -136,10 +120,8 @@ def test_ddp_sharded_plugin_correctness_multi_gpu_multi_optim():
    """
    Ensures same results using multiple optimizers across multiple GPUs
    """
-    plugin_parity_test(
-        plugin=DDPShardedPlugin(),
+    sharded_parity_test(
        gpus=2,
-        accelerator='ddp_spawn',
        model_cls=SeedTrainLoaderMultipleOptimizersModel,
        max_percent_speed_diff=0.25,  # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
    )
@@ -153,10 +135,8 @@ def test_ddp_sharded_plugin_correctness_multi_gpu_multi_optim_manual(tmpdir):
    """
    Ensures using multiple optimizers across multiple GPUs with manual optimization
    """
-    plugin_parity_test(
-        plugin=DDPShardedPlugin(),
+    sharded_parity_test(
        gpus=2,
-        accelerator='ddp_spawn',
        model_cls=SeedTrainLoaderManualModel,
        max_percent_speed_diff=0.25,  # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
    )
@@ -253,11 +233,9 @@ def record_ddp_fit_model_stats(trainer, model, use_cuda):
    return max_memory, total_time


-def plugin_parity_test(
+def sharded_parity_test(
    model_cls: Type[SeedTrainLoaderModel],
-    plugin: Union[str, DDPPlugin],
    seed: int = 42,
-    accelerator: str = 'ddp_spawn',
    gpus: int = 0,
    precision: int = 32,
    max_percent_speed_diff: float = 0.1,
@@ -268,9 +246,7 @@ def plugin_parity_test(

    Args:
        model_cls: Model class to use for test.
-        plugin: Plugin to parity test.
        seed: Seed for generators. Note that this does not handle the seed for data-loading on multi-process.
-        accelerator: Accelerator type for test.
        gpus: Number of GPUS to enable.
        precision: Whether to use AMP or normal FP32 training.
        max_percent_speed_diff: The maximum speed difference compared to normal DDP training.
@@ -288,7 +264,7 @@ def plugin_parity_test(
        max_epochs=1,
        gpus=gpus,
        precision=precision,
-        accelerator=accelerator,
+        accelerator='ddp_spawn',
    )

    max_memory_ddp, ddp_time = record_ddp_fit_model_stats(
@@ -306,8 +282,7 @@ def plugin_parity_test(
        max_epochs=1,
        gpus=gpus,
        precision=precision,
-        accelerator=accelerator,
-        plugins=[plugin],
+        accelerator='ddp_sharded_spawn',
    )

    max_memory_custom, custom_model_time = record_ddp_fit_model_stats(
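Net effect for this benchmark: the parity helper no longer receives a plugin object or an accelerator argument; the comparison is driven purely by the accelerator string ('ddp_spawn' for the baseline, 'ddp_sharded_spawn' for the sharded run). A rough before/after sketch of the Trainer construction implied by this diff, with the pre-refactor usage kept in comments since it relies on the deleted imports:

from pytorch_lightning import Trainer

# Before this PR (per the deleted lines above): sharded training combined a
# spawn accelerator with an explicit plugin instance.
#   from pytorch_lightning.plugins.sharded_plugin import DDPShardedPlugin
#   trainer = Trainer(max_epochs=1, gpus=2, precision=16,
#                     accelerator='ddp_spawn', plugins=[DDPShardedPlugin()])

# After this PR (as exercised by sharded_parity_test): a single accelerator
# string selects the combined sharded + spawn behaviour.
trainer = Trainer(max_epochs=1, gpus=2, precision=16, accelerator='ddp_sharded_spawn')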
2 changes: 1 addition & 1 deletion pl_examples/basic_examples/conv_sequential_example.py
@@ -32,7 +32,7 @@
from pl_examples import cli_lightning_logo
from pytorch_lightning import Trainer
from pytorch_lightning.metrics.functional import accuracy
-from pytorch_lightning.plugins.ddp_sequential_plugin import DDPSequentialPlugin
+from pytorch_lightning.plugins.legacy.ddp_sequential_plugin import DDPSequentialPlugin
from pytorch_lightning.utilities import _BOLTS_AVAILABLE, _FAIRSCALE_PIPE_AVAILABLE

if _BOLTS_AVAILABLE:
11 changes: 5 additions & 6 deletions pl_examples/bug_report_model.py
@@ -55,24 +55,23 @@ class BoringModel(LightningModule):
    def __init__(self):
        """
        Testing PL Module
-
        Use as follows:
        - subclass
        - modify the behavior for what you want
-
        class TestModel(BaseTestModel):
            def training_step(...):
                # do your own thing
-
        or:
-
        model = BaseTestModel()
        model.training_epoch_end = None
-
        """
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

+    @property
+    def automatic_optimization(self):
+        return True
+
    def forward(self, x):
        return self.layer(x)

@@ -81,7 +80,7 @@ def loss(self, batch, prediction):
        return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))

    def step(self, x):
-        x = self.layer(x)
+        x = self(x)
        out = torch.nn.functional.mse_loss(x, torch.ones_like(x))
        return out

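The second hunk is small but deliberate: step now routes through self(x) instead of calling the submodule directly, so the example passes through forward and therefore through anything a training-type plugin wraps around the model. A minimal sketch of the distinction using a plain torch.nn.Module stand-in (not the actual Lightning class):

import torch


class TinyModule(torch.nn.Module):  # illustrative stand-in for BoringModel
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def step(self, x):
        # self(x) dispatches through __call__ -> forward, so module hooks and any
        # forward-level wrapping are applied; self.layer(x) would bypass them.
        x = self(x)
        return torch.nn.functional.mse_loss(x, torch.ones_like(x))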
29 changes: 4 additions & 25 deletions pytorch_lightning/accelerators/__init__.py
@@ -1,25 +1,4 @@
-# Copyright The PyTorch Lightning team.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from pytorch_lightning.accelerators.accelerator import Accelerator  # noqa: F401
-from pytorch_lightning.accelerators.cpu_accelerator import CPUAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp2_accelerator import DDP2Accelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_accelerator import DDPAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_cpu_hpc_accelerator import DDPCPUHPCAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_cpu_spawn_accelerator import DDPCPUSpawnAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_hpc_accelerator import DDPHPCAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_spawn_accelerator import DDPSpawnAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.dp_accelerator import DataParallelAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.gpu_accelerator import GPUAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.horovod_accelerator import HorovodAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.tpu_accelerator import TPUAccelerator  # noqa: F401
+from pytorch_lightning.accelerators.accelerator import Accelerator
+from pytorch_lightning.accelerators.cpu import CPUAccelerator
+from pytorch_lightning.accelerators.gpu import GPUAccelerator
+from pytorch_lightning.accelerators.tpu import TPUAccelerator
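After this rewrite, pytorch_lightning.accelerators exports only the hardware-level classes; the per-backend DDP/DP/Horovod accelerators deleted above are superseded by the training-type plugins introduced elsewhere in this PR. A sketch of downstream imports against the new package surface, assuming only the four names re-exported in the diff:

# Assumes only the four names re-exported by the new accelerators/__init__.py above.
from pytorch_lightning.accelerators import (
    Accelerator,
    CPUAccelerator,
    GPUAccelerator,
    TPUAccelerator,
)

# Distributed behaviour is no longer chosen by importing a DDP*Accelerator class;
# in this PR it comes from training-type plugins and the Trainer's accelerator
# string (e.g. 'ddp_spawn' or 'ddp_sharded_spawn', as in benchmarks/test_sharded_parity.py).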