
trainer.progress_bar_dict values are delayed by one epoch #4863

Closed
carmocca opened this issue Nov 26, 2020 · 3 comments · Fixed by #4913

carmocca (Contributor) commented on Nov 26, 2020

🐛 Bug

trainer.progress_bar_dict does not contain what was self.log-ed in that epoch's training_step; it still holds the value from the previous epoch.
This also means that there is no value during epoch 0.

To Reproduce (and to test)

import torch

from pytorch_lightning import Trainer
# BoringModel lives in Lightning's test helpers; the exact import path may
# differ between versions (assumed here to be tests.base.boring_model)
from tests.base.boring_model import BoringModel


def test_progress_bar_dict_contains_values_on_train_epoch_end(tmpdir):
    class TestModel(BoringModel):
        def training_step(self, *args):
            self.log("foo", torch.tensor(self.current_epoch), on_step=False, on_epoch=True, prog_bar=True)
            return super().training_step(*args)

        def on_train_epoch_end(self, *_):
            self.epoch_end_called = True
            assert self.trainer.progress_bar_dict["foo"] == self.current_epoch

    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=2,
        limit_train_batches=1,
        limit_val_batches=0,
        checkpoint_callback=False,
        logger=False,
        weights_summary=None,
        progress_bar_refresh_rate=0,
    )
    model = TestModel()
    trainer.fit(model)
    assert model.epoch_end_called

Expected behavior

trainer.progress_bar_dict should contain the values logged in that epoch's training_step.

Environment

The test fails on master

The commit that introduced this bug is 9c8701f. The previous one works as expected.

I haven't dug into the changes much since it is a large commit. Maybe the author can help: @tchaton

carmocca added the bug and help wanted labels on Nov 26, 2020
carmocca (Contributor, Author) commented on Nov 27, 2020

Okay, the cause of the issue is that trainer.call_hook('on_train_epoch_end', epoch_output) runs before trainer.logger_connector.on_train_epoch_end(), and progress_bar_dict is only updated in the latter.

However, changing the order here
https://github.com/PyTorchLightning/pytorch-lightning/blob/217650320e376f4dadd1c7b8c034ec55dee60a23/pytorch_lightning/trainer/training_loop.py#L816-L820
fixes the test above, but then
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/trainer/logging/test_logger_connector.py#L52-L215
fails with a KeyError.

I will keep debugging...
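
To make the ordering concrete, here is a tiny standalone sketch of the pattern described above (toy classes only, not Lightning internals; LoggerConnector, run_epoch, and the hook below are illustrative stand-ins):

# Minimal illustration of the ordering bug: the user hook fires before the
# connector copies the freshly logged metrics into progress_bar_dict.
class LoggerConnector:
    def __init__(self):
        self.progress_bar_dict = {}
        self._pending = {}

    def log(self, key, value):
        self._pending[key] = value

    def on_train_epoch_end(self):
        # logged metrics only become visible to the progress bar here
        self.progress_bar_dict.update(self._pending)


def run_epoch(connector, epoch, hook):
    connector.log("foo", epoch)          # what training_step logs via self.log(...)
    hook(connector, epoch)               # user hook fires first (the current, buggy order)
    connector.on_train_epoch_end()       # progress_bar_dict is refreshed too late


def on_train_epoch_end_hook(connector, epoch):
    # sees the previous epoch's value, and nothing at all during epoch 0
    print(epoch, connector.progress_bar_dict.get("foo"))


connector = LoggerConnector()
for epoch in range(3):
    run_epoch(connector, epoch, on_train_epoch_end_hook)
# prints: "0 None", "1 0", "2 1" -- the one-epoch delay from the issue title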

djovanoski commented

I think the problem is that call_hook resets model._results, and self._cache_logged_metrics() is then called on the now-empty Result.

Also, in training_step(...) in training_loop.py there is a call_hook call which resets the Result; after that, the results are collected from model._results, which is empty.
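
A standalone toy sketch of the reset-then-cache sequence being described (illustrative only, not the actual call_hook / Result implementation; the cached_metrics list stands in for _cache_logged_metrics()):

class Result(dict):
    """Dict-like stand-in for Lightning's Result object."""


class Model:
    def __init__(self):
        self._results = Result()

    def on_train_epoch_end(self):
        pass  # a hook that does not log anything itself


cached_metrics = []


def call_hook(model, hook_name):
    model._results = Result()                    # reset: anything logged earlier is dropped
    getattr(model, hook_name)()
    cached_metrics.append(dict(model._results))  # the _cache_logged_metrics() analogue sees an empty Result


model = Model()
model._results["foo"] = 1                        # value logged during the epoch
call_hook(model, "on_train_epoch_end")
print(cached_metrics)                            # [{}] -- the logged value never reaches the cache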

carmocca changed the title from "Missing values in trainer.progress_bar_dict after epoch 0" to "trainer.progress_bar_dict values are delayed by one epoch" on Nov 28, 2020
tchaton self-assigned this on Nov 28, 2020
tchaton added the priority: 0 label on Nov 28, 2020
tchaton added this to the 1.1 milestone on Nov 28, 2020
tchaton (Contributor) commented on Nov 28, 2020

Hey @carmocca,

Thanks for reporting this bug. Yes, the order of those two calls needs to be swapped. I will resolve this next week.

Best regards,
T.C
