rename Model steps #1051

Merged (37 commits, Mar 5, 2020)

Commits (37, all by williamFalcon, Mar 5, 2020):

4972608  training_end renamed to training_step_end
738692c  training_end renamed to training_step_end
84560fd  training_end renamed to training_step_end
a3dbe56  training_end renamed to training_step_end
165ca51  training_end to training_step_end
890364e  training_end to training_step_end
f76fdb8  training_end to training_step_end
39b66c3  training_end to training_step_end
e7e1ce9  fix lost model reference
9db4d1f  training_end to training_step_end
e13beac  training_end to training_step_end
684fc47  training_end to training_step_end
c508dfb  training_end to training_step_end
8b8679c  training_end to training_step_end
2afc704  training_end to training_step_end
697cb5d  training_end to training_step_end
4938b81  training_end to training_step_end
78e2435  training_end to training_step_end
8d980a9  training_end to training_step_end
bfa3fdd  training_end to training_step_end
77baa64  training_end to training_step_end
3f7d5e0  training_end to training_step_end
b13b348  training_end to training_step_end
e3ac274  training_end to training_step_end
0a27d95  training_end to training_step_end
a106c47  training_end to training_step_end
bc4db9f  training_end to training_step_end
8199a73  training_end to training_step_end
aa97340  training_end to training_step_end
8568674  training_end to training_step_end
b964b70  training_end to training_step_end
5a9e405  training_end to training_step_end
2184e52  training_end to training_step_end
c56b09c  training_end to training_step_end
f687043  training_end to training_step_end
a401841  training_end to training_step_end
7268839  training_end to training_step_end

22 changes: 5 additions & 17 deletions docs/source/experiment_reporting.rst
@@ -34,47 +34,35 @@ Log metrics

To plot metrics into whatever logger you passed in (TensorBoard, Comet, Neptune, etc.):

-1. Training_end, validation_end, test_end will all log anything in the "log" key of the return dict.
+1. training_epoch_end, validation_epoch_end, test_epoch_end will all log anything in the "log" key of the return dict.

Member (review comment): Perhaps these should say validation_epoch_end or validation_step_end?

.. code-block:: python

-    def training_end(self, outputs):
+    def training_epoch_end(self, outputs):
        loss = some_loss()
        ...

        logs = {'train_loss': loss}
        results = {'log': logs}
        return results

-    def validation_end(self, outputs):
+    def validation_epoch_end(self, outputs):
        loss = some_loss()
        ...

        logs = {'val_loss': loss}
        results = {'log': logs}
        return results

-    def test_end(self, outputs):
+    def test_epoch_end(self, outputs):
        loss = some_loss()
        ...

        logs = {'test_loss': loss}
        results = {'log': logs}
        return results

-2. Most of the time, you only need training_step and not training_end. You can also return logs from here:
-
-.. code-block:: python
-
-    def training_step(self, batch, batch_idx):
-        loss = some_loss()
-        ...
-
-        logs = {'train_loss': loss}
-        results = {'log': logs}
-        return results
-
-3. In addition, you can also use any arbitrary functionality from a particular logger from within your LightningModule.
+2. In addition, you can also use any arbitrary functionality from a particular logger from within your LightningModule.
For instance, here we log images using tensorboard.

.. code-block:: python
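
The original snippet is collapsed in this view. As a rough sketch of what such
direct logger access can look like, assuming the TensorBoard logger (whose
experiment attribute is a SummaryWriter); the module, layer sizes, and tag
names here are illustrative, not from this PR:

.. code-block:: python

    import pytorch_lightning as pl
    import torch
    import torch.nn.functional as F

    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(28 * 28, 10)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.cross_entropy(self.layer(x.view(x.size(0), -1)), y)

            # self.logger.experiment is the underlying experiment object;
            # with the TensorBoard logger it is a SummaryWriter, so any of
            # its methods (add_image, add_histogram, ...) can be called
            self.logger.experiment.add_image('example_images', x[0], batch_idx)

            return {'loss': loss, 'log': {'train_loss': loss}}
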
2 changes: 1 addition & 1 deletion docs/source/hooks.rst
@@ -26,7 +26,7 @@ Training loop
- on_batch_start
- tbptt_split_batch
- training_step
-- training_end (optional)
+- training_step_end (optional)
- backward
- on_after_backward
- optimizer.step()
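
For reference, a minimal sketch of where the renamed hook sits in this loop;
the model, loss, and hook bodies are illustrative, not part of this PR:

.. code-block:: python

    import pytorch_lightning as pl
    import torch
    import torch.nn.functional as F

    class HookOrderModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def training_step(self, batch, batch_idx):
            # called once per batch (once per batch piece under dp/ddp2)
            x, y = batch
            return {'loss': F.cross_entropy(self.layer(x), y)}

        def training_step_end(self, step_output):
            # called right after training_step and before backward;
            # under dp/ddp2 it receives the gathered per-piece outputs
            return step_output

        def on_after_backward(self):
            # called after the backward pass, before optimizer.step()
            pass
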
45 changes: 40 additions & 5 deletions docs/source/multi_gpu.rst
@@ -165,12 +165,13 @@ you will only be operating on one of those pieces.
y_0 = batch

For most metrics, this doesn't really matter. However, if you want
-full batch statistics or want to use the outputs of the training_step
-to do something like a softmax, you can use the `training_end` step.
+to add something to your computational graph (like softmax)
+using all batch parts, you can use the `training_step_end` step.

.. code-block:: python

-    def training_end(self, outputs):
+    def training_step_end(self, outputs):
        # only use when on dp
        outputs = torch.cat(outputs, dim=1)
        softmax = torch.softmax(outputs, dim=1)
        out = softmax.mean()
@@ -195,9 +196,43 @@ In pseudocode, the full sequence is:
out = gpu_model(batch_split)
all_results.append(out)

-# calculate statistics for all parts of the batch
-full out = model.training_end(all_results)
+# use the full batch for something like softmax
+full_out = model.training_step_end(all_results)

+To illustrate why this is needed, let's look at DataParallel:

+.. code-block:: python
+
+    def training_step(self, batch, batch_idx):
+        x, y = batch
+        y_hat = self.forward(x)
+
+        # on dp or ddp2, if we did softmax now it would be wrong
+        # because batch is actually a piece of the full batch
+        return y_hat
+
+    def training_step_end(self, batch_parts_outputs):
+        # batch_parts_outputs has the outputs of each part of the batch
+
+        # do the softmax here, over the full batch
+        outputs = torch.cat(batch_parts_outputs, dim=1)
+        softmax = torch.softmax(outputs, dim=1)
+        out = softmax.mean()
+
+        return out

+If `training_step_end` is defined, it will be called regardless of TPU, dp, ddp, etc., which means
+it behaves the same no matter the backend.
+
+The validation and test steps have the same option when using dp:

+.. code-block:: python
+
+    def validation_step_end(self, batch_parts_outputs):
+        ...
+
+    def test_step_end(self, batch_parts_outputs):
+        ...
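
As a rough sketch of how the validation variant can be used under dp to get a
full-batch metric; the model, metric, and dict field names are illustrative,
not part of this PR (this assumes dp gathers dict values by concatenating the
per-piece tensors):

.. code-block:: python

    import pytorch_lightning as pl
    import torch

    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 4)

        def forward(self, x):
            return self.layer(x)

        def validation_step(self, batch, batch_idx):
            # under dp this sees only one piece of the batch
            x, y = batch
            return {'logits': self.forward(x), 'y': y}

        def validation_step_end(self, batch_parts_outputs):
            # outputs gathered from every batch piece; computing the
            # metric here makes it match single-GPU behavior
            logits = batch_parts_outputs['logits']
            y = batch_parts_outputs['y']
            acc = (logits.argmax(dim=1) == y).float().mean()
            return {'val_acc': acc}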

Implement Your Own Distributed (DDP) training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^