Steps (#1051)
* training_end renamed to training_step_end

* fix lost model reference
williamFalcon committed Mar 5, 2020
1 parent 969e929 commit 29faea1
Showing 12 changed files with 1,391 additions and 636 deletions.
Binary file added docs/source/_images/lightning_module/pt_to_pl.png
22 changes: 5 additions & 17 deletions docs/source/experiment_reporting.rst
@@ -34,47 +34,35 @@ Log metrics

To plot metrics into whatever logger you passed in (TensorBoard, Comet, Neptune, etc.):

-1. Training_end, validation_end, test_end will all log anything in the "log" key of the return dict.
+1. training_epoch_end, validation_epoch_end, test_epoch_end will all log anything in the "log" key of the return dict.

.. code-block:: python

-    def training_end(self, outputs):
+    def training_epoch_end(self, outputs):
         loss = some_loss()
         ...
         logs = {'train_loss': loss}
         results = {'log': logs}
         return results

-    def validation_end(self, outputs):
+    def validation_epoch_end(self, outputs):
         loss = some_loss()
         ...
         logs = {'val_loss': loss}
         results = {'log': logs}
         return results

-    def test_end(self, outputs):
+    def test_epoch_end(self, outputs):
         loss = some_loss()
         ...
         logs = {'test_loss': loss}
         results = {'log': logs}
         return results
-2. Most of the time, you only need training_step and not training_end. You can also return logs from here:
-
-.. code-block:: python
-
-    def training_step(self, batch, batch_idx):
-        loss = some_loss()
-        ...
-        logs = {'train_loss': loss}
-        results = {'log': logs}
-        return results
-3. In addition, you can also use any arbitrary functionality from a particular logger from within your LightningModule.
+2. In addition, you can also use any arbitrary functionality from a particular logger from within your LightningModule.
For instance, here we log images using tensorboard.

.. code-block:: python
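
    # The diff view truncates the original example here. What follows is an
    # illustrative sketch, not the commit's code: it assumes the TensorBoard
    # logger, whose `self.logger.experiment` is a
    # `torch.utils.tensorboard.SummaryWriter`.
    import torchvision

    def training_step(self, batch, batch_idx):
        x, y = batch
        ...
        # log a small grid of input images; x[:8] is an arbitrary sample
        grid = torchvision.utils.make_grid(x[:8])
        self.logger.experiment.add_image('example_images', grid, self.global_step)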
2 changes: 1 addition & 1 deletion docs/source/hooks.rst
@@ -26,7 +26,7 @@ Training loop
- on_batch_start
- tbptt_split_batch
- training_step
-- training_end (optional)
+- training_step_end (optional)
- backward
- on_after_backward
- optimizer.step()
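
As a rough sketch of how these hooks fit together (illustrative pseudocode only, not Lightning's actual trainer loop; `train_dataloader`, `optimizer`, and `truncated_bptt_steps` are assumed names):

.. code-block:: python

    # illustrative pseudocode of the hook order listed above
    for batch_idx, batch in enumerate(train_dataloader):
        model.on_batch_start(batch)
        for split in model.tbptt_split_batch(batch, truncated_bptt_steps):
            out = model.training_step(split, batch_idx)
            out = model.training_step_end(out)  # optional hook
            loss = out['loss']
            model.backward(loss)        # backward hook
            model.on_after_backward()   # on_after_backward hook
            optimizer.step()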
45 changes: 40 additions & 5 deletions docs/source/multi_gpu.rst
@@ -165,12 +165,13 @@ you will only be operating on one of those pieces.
y_0 = batch
For most metrics, this doesn't really matter. However, if you want
full batch statistics or want to use the outputs of the training_step
-to do something like a softmax, you can use the `training_end` step.
+to add something to your computational graph (like softmax)
+using all batch parts you can use the `training_step_end` step.

.. code-block:: python

-    def training_end(self, outputs):
+    def training_step_end(self, outputs):
         # only use when on dp
         outputs = torch.cat(outputs, dim=1)
         softmax = torch.softmax(outputs, dim=1)
         out = softmax.mean()
@@ -195,9 +196,43 @@ In pseudocode, the full sequence is:
         out = gpu_model(batch_split)
         all_results.append(out)

     # calculate statistics for all parts of the batch
-    full_out = model.training_end(all_results)
+    # use the full batch for something like softmax
+    full_out = model.training_step_end(all_results)
+To illustrate why this is needed, let's look at DataParallel:

+.. code-block:: python
+
+    def training_step(self, batch, batch_idx):
+        x, y = batch
+        y_hat = self.forward(x)
+
+        # on dp or ddp2, if we did softmax now it would be wrong
+        # because batch is actually a piece of the full batch
+        return y_hat
+
+    def training_step_end(self, batch_parts_outputs):
+        # batch_parts_outputs has the outputs from each part of the batch
+        # do the softmax here, over the full batch
+        outputs = torch.cat(batch_parts_outputs, dim=1)
+        softmax = torch.softmax(outputs, dim=1)
+        out = softmax.mean()
+        return out
+
+If `training_step_end` is defined, it will be called regardless of TPU, dp, ddp, etc., which means
+it behaves the same no matter the backend.

+Validation and test steps also have the same option when using dp:
+
+.. code-block:: python
+
+    def validation_step_end(self, batch_parts_outputs):
+        ...
+
+    def test_step_end(self, batch_parts_outputs):
+        ...
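
Fleshing out the stubs above as a hedged sketch (it simply mirrors the training example above; this is not code from the commit):

.. code-block:: python

    def validation_step(self, batch, batch_idx):
        x, y = batch
        # under dp, this runs on a slice of the full batch
        return self.forward(x)

    def validation_step_end(self, batch_parts_outputs):
        # stitch the per-GPU outputs together and compute
        # full-batch statistics, as in training_step_end
        outputs = torch.cat(batch_parts_outputs, dim=1)
        return torch.softmax(outputs, dim=1).mean()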
Implement Your Own Distributed (DDP) training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
