Improve manual optimization API #5771

Merged
merged 355 commits into from
Feb 16, 2021
Changes from all commits (355 commits)
79803f6
Fix import issue, attempting to fix tests
Jan 12, 2021
a7c0d8f
Fix initial test
Jan 12, 2021
02df0ad
Reflect hook logic from master, should wrap model after move to device
Jan 14, 2021
d0ebcba
Optional state consolidation, since master has optimizers not wrapped
justusschock Jan 22, 2021
319c3e8
change attribute for instance test
justusschock Jan 22, 2021
a34cd15
reset optimizers
justusschock Jan 22, 2021
c95b06a
legacy
Borda Jan 22, 2021
9ff0c64
imports in accel
Borda Jan 22, 2021
67d4e47
legacy2
Borda Jan 22, 2021
577b00d
trainer imports
Borda Jan 22, 2021
aa4858b
fix import errors after rebase
awaelchli Jan 25, 2021
f81a44f
move hook to new setup location
awaelchli Jan 25, 2021
a285665
provide unwrapping logic
awaelchli Jan 25, 2021
bf78d70
fix trainer callback system
awaelchli Jan 25, 2021
34947cf
added ddp2 implementation
awaelchli Jan 25, 2021
49bec53
fix imports .legacy
Borda Jan 25, 2021
ba1c986
move plugins
Borda Jan 25, 2021
45dfbb7
restore legacy
Borda Jan 25, 2021
9b7326a
drop test.py from root
Borda Jan 25, 2021
96bc05d
add tpu accelerator and plugins
justusschock Jan 26, 2021
c5994e5
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 30, 2021
9e46624
fixes
awaelchli Jan 30, 2021
22d2ae8
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 30, 2021
901d392
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 31, 2021
e174b8d
fix lightning optimizer merge
awaelchli Jan 31, 2021
98660de
reset bugreportmodel
awaelchli Jan 31, 2021
4d95b6c
unwrapping
awaelchli Jan 31, 2021
b69d013
step routing forward
awaelchli Jan 31, 2021
cb6676d
model access
awaelchli Jan 31, 2021
a33d27f
unwrap
awaelchli Jan 31, 2021
f7486e2
opt
awaelchli Jan 31, 2021
117f16d
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Jan 31, 2021
3792b72
integrate distrib_type
awaelchli Jan 31, 2021
ef85b81
sync changes
awaelchli Jan 31, 2021
9d9a940
sync
awaelchli Feb 1, 2021
f017a39
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
a190a56
fixes
awaelchli Feb 1, 2021
73bb607
add forgotten generators
awaelchli Feb 1, 2021
c8c74f3
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
ae71997
add missing logic
awaelchli Feb 1, 2021
d89847b
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
0e686c3
update
awaelchli Feb 1, 2021
d6a43ea
import
awaelchli Feb 1, 2021
ceb8f75
missed imports
awaelchli Feb 1, 2021
fbb7c20
import fixes
awaelchli Feb 1, 2021
b610999
isort
awaelchli Feb 1, 2021
9b79924
mv f
awaelchli Feb 1, 2021
9afe54d
changelog
awaelchli Feb 1, 2021
3b63e82
Merge branch 'release/1.2-dev' into ref/update-plugins
awaelchli Feb 1, 2021
ca8cb68
format
awaelchli Feb 1, 2021
0633745
move helper to parallel plugin
awaelchli Feb 1, 2021
a622e0b
d
awaelchli Feb 1, 2021
18c682f
Merge branch 'ref/update-plugins' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
f275803
add world size
awaelchli Feb 1, 2021
4ae008b
clean up
awaelchli Feb 1, 2021
3b3918b
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 1, 2021
d4c6308
duplicate
awaelchli Feb 1, 2021
7eef4a0
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli Feb 2, 2021
9949164
activate ddp_sharded and tpu
awaelchli Feb 2, 2021
6d47357
set nvidia flags
awaelchli Feb 2, 2021
a6864ec
remove unused colab var
awaelchli Feb 2, 2021
b4b9724
use_tpu <-> on_tpu attrs
awaelchli Feb 2, 2021
81001e3
make some ddp_cpu and clusterplugin tests pass
awaelchli Feb 2, 2021
cea000d
Ref/accelerator connector (#5742)
justusschock Feb 2, 2021
933e2a1
plugins
awaelchli Feb 2, 2021
ad451d8
manual optimization
justusschock Feb 2, 2021
a30a3cf
update optimizer routing
justusschock Feb 2, 2021
a05b291
add rank to torchelastic
justusschock Feb 2, 2021
4388e73
fix memory mixed precision
awaelchli Feb 2, 2021
be9d029
setstate on trainer for pickling in ddp spawn
awaelchli Feb 2, 2021
a90a160
add predict method
awaelchli Feb 2, 2021
767bee0
add back commented accelerator code
awaelchli Feb 2, 2021
f771a7f
adapt test for sync_batch_norm to new plugin
awaelchli Feb 3, 2021
1a3b04e
fix deprecated tests
awaelchli Feb 3, 2021
a1f4938
fix ddp cpu choice when no num_processes are given
awaelchli Feb 3, 2021
38bc8b7
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli Feb 3, 2021
ce6b6de
yapf format
awaelchli Feb 3, 2021
3b7c20b
skip a memory test that cannot pass anymore
awaelchli Feb 3, 2021
1d26c9b
update on comments
tchaton Feb 3, 2021
f538c75
fix pickle error in spawn plugin
awaelchli Feb 3, 2021
b44d82e
x
awaelchli Feb 3, 2021
3820e77
avoid
awaelchli Feb 3, 2021
08ae327
x
awaelchli Feb 3, 2021
7d0e094
avoid tons of warnings from importing deprecated modules
awaelchli Feb 3, 2021
1028011
fix cyclic import in docs build
awaelchli Feb 3, 2021
11bd0d6
add support for sharded
justusschock Feb 4, 2021
6bf0b60
update typing
justusschock Feb 4, 2021
f94082b
add sharded and sharded_spawn to distributed types
justusschock Feb 4, 2021
7939b99
make unwrap model default
justusschock Feb 4, 2021
9131ffb
refactor LightningShardedDataParallel similar to LightningDistributed…
justusschock Feb 4, 2021
ed7425c
update sharded spawn to reflect changes
justusschock Feb 4, 2021
209a164
update sharded to reflect changes
justusschock Feb 4, 2021
837a070
Merge 1.1.5 changes
awaelchli Feb 4, 2021
136b321
fix merge
awaelchli Feb 4, 2021
ffcb535
fix merge
awaelchli Feb 4, 2021
1edfa73
yapf isort
awaelchli Feb 4, 2021
a689b81
merge 1.1.6
awaelchli Feb 4, 2021
330b14c
fix merge
awaelchli Feb 4, 2021
ef258d5
yapf isort
awaelchli Feb 4, 2021
c85000d
fix indentation in test
awaelchli Feb 4, 2021
5f3a35e
copy over reinit scheduler implementation from dev1.2
awaelchli Feb 4, 2021
fa1c9b7
fix apex tracking calls with dev_debugger
awaelchli Feb 5, 2021
e330a11
reduce diff to dev1.2, clean up
awaelchli Feb 5, 2021
994ac82
fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu
awaelchli Feb 5, 2021
1a78601
sort plugin tests legacy/new
awaelchli Feb 6, 2021
4b76448
fix error handling for amp on cpu
awaelchli Feb 6, 2021
bfd54ab
Merge branch 'release/1.2-dev' into patch117
awaelchli Feb 6, 2021
0574d22
fix merge
awaelchli Feb 6, 2021
6ef6637
Merge branch 'patch117' into accelerator-refactor-sharded
awaelchli Feb 6, 2021
9feda39
[Feat] Resolve manual_backward (#5837)
tchaton Feb 6, 2021
7bb9d9f
fix tests/accelerator tests on cpu
awaelchli Feb 6, 2021
13ae1ff
[BugFix] Resolve manual optimization (#5852)
tchaton Feb 6, 2021
fc3b4db
Merge formatting changes from 1.2 branch
awaelchli Feb 6, 2021
b437642
Remove copy trainer parameters to happen earlier within the loop and …
SeanNaren Feb 7, 2021
8c6aa83
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
Feb 7, 2021
beb980a
resovle a bug
Feb 7, 2021
7a0fd27
Accelerator refactor sharded rpc (#5854)
justusschock Feb 7, 2021
0d0ced5
resolve bug
Feb 7, 2021
1f3ab76
fix assert in rpc test
awaelchli Feb 7, 2021
f1b1121
resolve a test
Feb 7, 2021
cd31fa1
fix docs compilation
awaelchli Feb 8, 2021
f48793e
accelerator refactor - fix for sharded parity test (#5866)
awaelchli Feb 8, 2021
81ff6ea
Remove DDP2 as this does not apply
Feb 8, 2021
20deb46
Add missing pre optimizer hook to ensure lambda closure is called
Feb 8, 2021
be4d1a2
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
Feb 8, 2021
0ac5fc4
fix apex docstring
awaelchli Feb 8, 2021
07fdd95
[accelerator][BugFix] Resolve some test for 1 gpu (#5863)
tchaton Feb 8, 2021
384b791
yapf isort
awaelchli Feb 8, 2021
b1a84b8
resolve flake8
tchaton Feb 8, 2021
a157a29
fix apex doctests
awaelchli Feb 8, 2021
08cfc65
fix apex doctests 2
awaelchli Feb 8, 2021
7888bfd
resolve docs
tchaton Feb 8, 2021
b5b4243
update drone
tchaton Feb 8, 2021
93ceb4c
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
tchaton Feb 8, 2021
d001bcf
clean env
Feb 8, 2021
ad47f47
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
tchaton Feb 8, 2021
60bfb1a
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
tchaton Feb 8, 2021
0608a41
update
Feb 8, 2021
f0120b5
update
Feb 8, 2021
bf8874e
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
Feb 8, 2021
baf7d7f
update
tchaton Feb 8, 2021
9360aad
update
tchaton Feb 8, 2021
b814cdc
merge
justusschock Feb 9, 2021
0d3ea37
Merge branch 'accelerator-refactor-sharded' of github.com:PytorchLigh…
justusschock Feb 9, 2021
f1f90c2
Fix RPC related tests, clean out old API, update for new accelerator …
SeanNaren Feb 9, 2021
6d05881
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
justusschock Feb 10, 2021
d86fdff
Update test_remove_1-4.py
justusschock Feb 10, 2021
5fbc1cf
Expose properties for tpu cores/gpus/num_gpus
Feb 10, 2021
aa9aea0
Add root GPU property
Feb 10, 2021
c35baf1
Move properties to properties.py
Feb 10, 2021
a9c6e21
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli Feb 10, 2021
8f3947b
move tests that were previously in drone
awaelchli Feb 10, 2021
50ecc4a
Fix root GPU property (#5908)
SeanNaren Feb 10, 2021
c7d0075
fix best model path transfer when no checkpoint callback available
awaelchli Feb 10, 2021
3f61d15
Merge remote-tracking branch 'original/accelerator-refactor-sharded' …
awaelchli Feb 10, 2021
061ea46
Fix setup hook order [wip] (#5858)
SeanNaren Feb 10, 2021
1fe1f91
rename ddp sequential -> rpc sequential for special test
awaelchli Feb 10, 2021
3683f5a
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli Feb 10, 2021
1f01b81
revert
awaelchli Feb 10, 2021
135c236
fix stupid merge problem
awaelchli Feb 10, 2021
222653d
Use property in connector for sampler (#5913)
SeanNaren Feb 10, 2021
f4311cd
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli Feb 11, 2021
b210dee
merge the import conflicts
awaelchli Feb 11, 2021
236009e
fix spawning of processes in slurm
awaelchli Feb 11, 2021
aace276
[wip] Fix some bugs for TPU [skip ci] (#5878)
tchaton Feb 11, 2021
68273f5
resolve some tests
Feb 11, 2021
ca77fa4
update
Feb 11, 2021
c35edfd
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
justusschock Feb 11, 2021
8cacef7
fix imports
justusschock Feb 11, 2021
f7bbe48
update
Feb 11, 2021
30d9800
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
Feb 11, 2021
25f7f13
resolve flake8
tchaton Feb 11, 2021
fa28c41
update azure pipeline
tchaton Feb 11, 2021
51c27e6
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
tchaton Feb 11, 2021
b888d68
skip a sharded test on cpu that requires a gpu
awaelchli Feb 11, 2021
01ca4cd
resolve tpus
Feb 11, 2021
181d143
Merge branch 'master' into accelerator-refactor-sharded
justusschock Feb 11, 2021
946a1e9
resolve bug
Feb 11, 2021
2ad1a6e
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
Feb 11, 2021
6e0aff0
resolve flake8
tchaton Feb 11, 2021
a931791
update
Feb 11, 2021
319d034
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
Feb 11, 2021
4117bec
updat utils
Feb 11, 2021
8d000f7
Merge branch 'master' into accelerator-refactor-sharded
tchaton Feb 11, 2021
0b1ba67
revert permission change on files
awaelchli Feb 11, 2021
cc385b4
suggestions from carlos
awaelchli Feb 11, 2021
e9eb318
remove unrelated formatting changes
awaelchli Feb 11, 2021
7c08400
remove incomplete comment
awaelchli Feb 11, 2021
7c3d184
Update pytorch_lightning/accelerators/__init__.py
awaelchli Feb 11, 2021
503426e
remove unrelated formatting change
awaelchli Feb 11, 2021
c0fbf7a
add types
awaelchli Feb 11, 2021
23a9a10
warn 1.7 ddp manual backward only if ddp kwarg unset
awaelchli Feb 11, 2021
a70ee4a
yapf + isort
awaelchli Feb 11, 2021
b0621c4
pep8 unused imports
awaelchli Feb 11, 2021
18bfe70
Merge branch 'master' into accelerator-refactor-sharded
awaelchli Feb 11, 2021
7b0515d
fix cyclic import in docs
awaelchli Feb 12, 2021
d966057
Apply suggestions from code review
Borda Feb 12, 2021
f636d9d
typer in accelerator.py
Borda Feb 12, 2021
5579ea7
typo
tchaton Feb 12, 2021
f5df88b
Apply suggestions from code review
Borda Feb 12, 2021
233694e
formatting
Borda Feb 12, 2021
a47644a
update on comments
tchaton Feb 12, 2021
80dacb6
update typo
tchaton Feb 12, 2021
99573eb
Update pytorch_lightning/trainer/properties.py
tchaton Feb 12, 2021
ab859d7
update
tchaton Feb 12, 2021
0a633cb
Merge branch 'accelerator-refactor-sharded' into feat/5769_manual_opt…
tchaton Feb 12, 2021
4fb36da
update on comments
tchaton Feb 12, 2021
a578ac9
Merge branch 'master' into feat/5769_manual_optimization
awaelchli Feb 13, 2021
00055ac
Merge branch 'master' into feat/5769_manual_optimization
tchaton Feb 13, 2021
a9cdc4e
resolve some comments
tchaton Feb 13, 2021
c219416
Merge branch 'feat/5769_manual_optimization' of https://github.com/Py…
tchaton Feb 13, 2021
5760e12
update on comments
tchaton Feb 13, 2021
09d1f24
resolve test
tchaton Feb 13, 2021
ca71e62
add toggle_model
tchaton Feb 13, 2021
9519a31
update
tchaton Feb 13, 2021
68f5082
update on comments
tchaton Feb 13, 2021
d831931
update doc
tchaton Feb 13, 2021
559972f
typo
tchaton Feb 13, 2021
b5a1e55
update
tchaton Feb 13, 2021
00b9b99
typo
tchaton Feb 13, 2021
c2e79f8
remove space
tchaton Feb 13, 2021
79e6e8e
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 13, 2021
9893e4c
update
tchaton Feb 13, 2021
14e5499
Merge branch 'feat/5769_manual_optimization' of https://github.com/Py…
tchaton Feb 13, 2021
d7d7ec9
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 13, 2021
26a592f
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 13, 2021
652164c
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 13, 2021
d0f5875
update on comments
tchaton Feb 13, 2021
f880878
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 13, 2021
2e2aed9
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 13, 2021
e9ca4ab
update on comments
tchaton Feb 13, 2021
f5dfab0
Merge branch 'feat/5769_manual_optimization' of https://github.com/Py…
tchaton Feb 13, 2021
2454723
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 14, 2021
32795e5
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 14, 2021
6a44f22
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
e78efc4
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
bcd0388
update
tchaton Feb 15, 2021
8084243
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
315201a
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
86b8d98
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
9e3c333
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
684098f
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
5d27b18
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
5dd1c9b
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
84ec28a
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 15, 2021
faa96e9
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 16, 2021
a4a0985
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 16, 2021
e4074aa
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 16, 2021
e70fefe
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 16, 2021
869a46d
Merge branch 'master' into feat/5769_manual_optimization
mergify[bot] Feb 16, 2021
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -190,6 +190,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Refactored `EpochResultStore` ([#5522](https://github.com/PyTorchLightning/pytorch-lightning/pull/5522))


- `LightningOptimizer` for manual optimization is more flexible and exposes `toggle_model` ([#5771](https://github.com/PyTorchLightning/pytorch-lightning/pull/5771))



### Deprecated

- Function `stat_scores_multiple_classes` is deprecated in favor of `stat_scores` ([#4839](https://github.com/PyTorchLightning/pytorch-lightning/pull/4839))
142 changes: 98 additions & 44 deletions docs/source/common/optimizers.rst
@@ -21,46 +21,117 @@ Manual optimization
For advanced research topics like reinforcement learning, sparse coding, or GAN research, it may be desirable
to manually manage the optimization process. To do so, do the following:

* Override your LightningModule ``automatic_optimization`` property to return ``False`` (see the sketch below)
* Drop or ignore the ``optimizer_idx`` argument
* Use ``self.manual_backward(loss)`` instead of ``loss.backward()``.
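
For illustration, a minimal module that opts into manual optimization could look like the sketch below. It is not part of this diff; the layer, the data shape, and the loss are placeholder assumptions.

.. code-block:: python

    import torch
    import pytorch_lightning as pl


    class ManualOptModel(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        @property
        def automatic_optimization(self) -> bool:
            # disable Lightning's automatic optimization loop
            return False

        def training_step(self, batch, batch_idx):
            opt = self.optimizers()
            loss = self.layer(batch).sum()  # placeholder loss
            opt.zero_grad()
            self.manual_backward(loss)
            opt.step()
            return loss

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)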

.. note:: This is only recommended for experts who need ultimate flexibility. Lightning will handle only the precision and accelerator logic. The user is responsible for ``zero_grad``, gradient accumulation (``accumulate_grad_batches``), model toggling, etc.

.. warning:: Before 1.2, ``optimizer.step`` called ``zero_grad`` internally. From 1.2 onward, this is left to the user.

.. tip:: To perform ``accumulate_grad_batches`` with one optimizer, you can do it as follows.

.. tip:: ``self.optimizers()`` returns ``LightningOptimizer`` objects. You can access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step, Lightning won't be able to support accelerators and precision for you.
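
As a small sketch of the two access patterns mentioned in the tip above (single-optimizer case; ``use_pl_optimizer`` defaults to ``True``):

.. code-block:: python

    def training_step(self, batch, batch_idx):
        # wrapped optimizer: steps through it get precision/accelerator handling
        opt = self.optimizers(use_pl_optimizer=True)

        # the underlying torch.optim.Optimizer is still reachable,
        # but stepping it directly bypasses that handling
        raw_opt = opt.optimizer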


.. code-block:: python

    def training_step(self, batch, batch_idx, optimizer_idx):
        opt = self.optimizers()

        loss = self.compute_loss(batch)
        self.manual_backward(loss)
        opt.step()

        # accumulate gradient batches
        if batch_idx % 2 == 0:
            opt.zero_grad()


.. tip:: It is good practice to provide the optimizer with a ``closure`` function that performs a ``forward`` and ``backward`` pass of your model. It is optional for most optimizers, but it makes your code compatible if you switch to an optimizer that requires a closure.

Here is the same example as above using a ``closure``.

.. code-block:: python

    def training_step(self, batch, batch_idx, optimizer_idx):
        opt = self.optimizers()

        def forward_and_backward():
            loss = self.compute_loss(batch)
            self.manual_backward(loss)

        opt.step(closure=forward_and_backward)

        # accumulate gradient batches
        if batch_idx % 2 == 0:
            opt.zero_grad()


.. code-block:: python

    # Scenario for a GAN.

    def training_step(...):
        opt_gen, opt_dis = self.optimizers()

        # compute generator loss
        loss_gen = self.compute_generator_loss(...)

        # zero_grad needs to be called before backward
        opt_gen.zero_grad()
        self.manual_backward(loss_gen)
        opt_gen.step()

        # compute discriminator loss
        loss_dis = self.compute_discriminator_loss(...)

        # zero_grad needs to be called before backward
        opt_dis.zero_grad()
        self.manual_backward(loss_dis)
        opt_dis.step()


.. note:: ``LightningOptimizer`` provides a ``toggle_model`` function as a context manager for advanced users. It can be useful when performing gradient accumulation with several optimizers or training in a distributed setting.

Here is an explanation of what it does:

Consider the current optimizer as A and all the other optimizers as B.
Toggling means that all parameters from B that are not shared with A will have their ``requires_grad`` attribute set to ``False``. Their original state will be restored when exiting the context manager.

When performing gradient accumulation, there is no need to perform grad synchronization during the accumulation phase.
Setting ``sync_grad`` to ``False`` will block this synchronization and improve your training speed.

Here is an example of how to use it:

.. code-block:: python

    # Scenario for a GAN with gradient accumulation every 2 batches and optimized for multiple gpus.

    def training_step(self, batch, batch_idx, ...):
        opt_gen, opt_dis = self.optimizers()

        accumulated_grad_batches = batch_idx % 2 == 0

        # compute generator loss
        def closure_gen():
            loss_gen = self.compute_generator_loss(...)
            self.manual_backward(loss_gen)
            if accumulated_grad_batches:
                opt_gen.zero_grad()

        with opt_gen.toggle_model(sync_grad=accumulated_grad_batches):
            opt_gen.step(closure=closure_gen)

        def closure_dis():
            loss_dis = self.compute_discriminator_loss(...)
            self.manual_backward(loss_dis)
            if accumulated_grad_batches:
                opt_dis.zero_grad()

        with opt_dis.toggle_model(sync_grad=accumulated_grad_batches):
            opt_dis.step(closure=closure_dis)

------

@@ -166,7 +237,7 @@ returned as a dict which can contain the following keywords:
* ``strict`` (optional): if set to ``True`` will enforce that value specified in ``monitor`` is available while trying
to call ``scheduler.step()``, and stop training if not found. If ``False`` will only give a warning and continue training
(without calling the scheduler).
* ``name`` (optional): if using the :class:`~pytorch_lightning.callbacks.LearningRateMonitor` callback to monitor the
learning rate progress, this keyword can be used to specify a specific name the learning rate should be logged as.
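
A minimal sketch of how these keywords might be combined in ``configure_optimizers`` (the optimizer, scheduler, and the ``"val_loss"`` metric name are illustrative assumptions, not part of this diff):

.. code-block:: python

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
        lr_scheduler = {
            "scheduler": scheduler,   # the scheduler instance itself
            "monitor": "val_loss",    # metric watched by ReduceLROnPlateau
            "strict": True,           # stop training if `monitor` is not found
            "name": "lr-SGD",         # name used by the LearningRateMonitor callback
        }
        return [optimizer], [lr_scheduler]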

.. testcode::
@@ -248,23 +319,6 @@ For example, here step optimizer A every 2 batches and optimizer B every 4 batch
if batch_nb % 4 == 0 :
optimizer.step(closure=closure)

.. note:: When using ``Trainer(enable_pl_optimizer=True)``, ``.step`` accepts a boolean ``make_optimizer_step`` which can be used as follows.

.. testcode::

def optimizer_zero_grad(self, current_epoch, batch_idx, optimizer, opt_idx):
optimizer.zero_grad()

# Alternating schedule for optimizer steps (ie: GANs)
def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx, closure, on_tpu=False, using_native_amp=False, using_lbfgs=False):
# update generator opt every 2 steps
if optimizer_idx == 0:
optimizer.step(closure=closure, make_optimizer_step=(batch_nb % 2) == 0)

# update discriminator opt every 4 steps
if optimizer_idx == 1:
optimizer.step(closure=closure, make_optimizer_step=(batch_nb % 4) == 0)

Here we add a learning-rate warm up

.. testcode::
10 changes: 4 additions & 6 deletions pytorch_lightning/accelerators/accelerator.py
@@ -238,7 +238,7 @@ def backward(
self,
closure_loss: torch.Tensor,
optimizer: Optimizer,
- opt_idx: int,
+ optimizer_idx: int,
should_accumulate: bool,
*args,
**kwargs,
Expand All @@ -247,17 +247,15 @@ def backward(

Args:
closure_loss: a tensor holding the loss value to backpropagate
- optimizer: the optimizer to do the step later on.
- opt_idx: the index of the optimizer
should_accumulate: whether to accumulate gradients
"""
- self.training_type_plugin.pre_backward(closure_loss, should_accumulate, optimizer, opt_idx)
+ self.training_type_plugin.pre_backward(closure_loss, should_accumulate, optimizer, optimizer_idx)

output = self.precision_plugin.backward(
- self.lightning_module, closure_loss, optimizer, opt_idx, should_accumulate, *args, **kwargs
+ self.lightning_module, closure_loss, optimizer, optimizer_idx, should_accumulate, *args, **kwargs
)

- self.training_type_plugin.post_backward(closure_loss, should_accumulate, optimizer, opt_idx)
+ self.training_type_plugin.post_backward(closure_loss, should_accumulate, optimizer, optimizer_idx)

return output

10 changes: 8 additions & 2 deletions pytorch_lightning/core/lightning.py
@@ -1186,7 +1186,7 @@ def configure_optimizers(self):
"""
rank_zero_warn("`configure_optimizers` must be implemented to be used with the Lightning Trainer")

- def manual_backward(self, loss: Tensor, optimizer: Optimizer, *args, **kwargs) -> None:
+ def manual_backward(self, loss: Tensor, optimizer: Optional[Optimizer] = None, *args, **kwargs) -> None:
"""
Call this directly from your training_step when doing optimizations manually.
By using this, Lightning can ensure that all the proper scaling (e.g. when using 16-bit precision) has been done for you.
@@ -1207,12 +1207,18 @@ def training_step(...):
self.manual_backward(loss, opt_a)
opt_a.step()
"""
+ if optimizer is not None:
+     rank_zero_warn(
+         "`optimizer` argument to `manual_backward` is deprecated in v1.2 and will be removed in v1.4",
+         DeprecationWarning
+     )

# make sure we're using manual opt
self._verify_is_manual_optimization('manual_backward')

# backward
self._running_manual_backward = True
- self.trainer.train_loop.backward(loss, optimizer, -1, *args, **kwargs)
+ self.trainer.train_loop.backward(loss, optimizer=None, opt_idx=None, *args, **kwargs)
self._running_manual_backward = False

def backward(self, loss: Tensor, optimizer: Optimizer, optimizer_idx: int, *args, **kwargs) -> None:
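
Taken together, the ``manual_backward`` change above means user code no longer passes the optimizer to the backward call. A before/after sketch of a manual-optimization ``training_step`` (illustrative, not part of the diff; ``compute_loss`` is a placeholder):

.. code-block:: python

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)  # placeholder loss

        # before v1.2 (now emits a DeprecationWarning, removal planned for v1.4):
        # self.manual_backward(loss, opt)

        # from v1.2 on:
        self.manual_backward(loss)

        opt.step()
        opt.zero_grad()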