auto_scale_batch_size won't reset current_epoch #3260

maxjeblick · 2020-08-29T22:09:45Z

🐛 Bug

When auto_scale_batch_size is enabled, the batch size finder first runs trial training with varying batch sizes. When the actual training then begins, trainer.current_epoch equals 1 instead of 0.

To Reproduce

Either observe the progress bar or use a simple callback to track the epoch number, once with auto_scale_batch_size enabled and once with auto_scale_batch_size disabled.

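from pytorch_lightning import Callback

class PrintCallback(Callback):
    """Records trainer.current_epoch at the start of every training epoch."""

    def __init__(self):
        # Epoch numbers seen so far, in order.
        self.observed_epochs = []

    def on_train_epoch_start(self, trainer, pl_module):
        # Expected to start at 0; starts at 1 when auto_scale_batch_size is enabled.
        print(f'Current Epoch: {trainer.current_epoch}')
        self.observed_epochs.append(trainer.current_epoch)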

rohitgr7 · 2020-08-30T12:50:23Z

Since the batch size finder runs training with various batch sizes, did you find where it sets current_epoch to 0 before trying the next batch_size?

maxjeblick · 2020-08-31T11:43:44Z

The problem is in model checkpointing. The checkpoint stores 'epoch': self.current_epoch + 1. The batch size finder saves such a checkpoint before it starts and loads it again once it has finished. Since the epoch is not increased during batch size scaling, the restored trainer.current_epoch ends up at 1 instead of 0.
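The off-by-one can be sketched without Lightning. This is a toy model of the behaviour described here, not Lightning's actual code: ToyTrainer, dump_checkpoint, restore, scale_batch_size and the restore_epoch flag are all hypothetical names. The restore_epoch branch only illustrates the idea behind "add current_epoch to dumped_params" and is an assumption about how such a fix could look.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not Lightning's real class."""

    def __init__(self):
        self.current_epoch = 0

    def dump_checkpoint(self):
        # Mirrors the entry quoted above: the stored epoch is one past the
        # current one, which suits resuming after a finished epoch.
        return {"epoch": self.current_epoch + 1}

    def restore(self, checkpoint):
        self.current_epoch = checkpoint["epoch"]


def scale_batch_size(trainer, restore_epoch=False):
    """Toy batch size finder: save state, try batch sizes, load state back."""
    saved_epoch = trainer.current_epoch  # only used when restore_epoch=True
    checkpoint = trainer.dump_checkpoint()
    for batch_size in (2, 4, 8):
        pass  # trial steps would run here; the epoch counter is not advanced
    trainer.restore(checkpoint)
    if restore_epoch:
        # Assumed fix: remember current_epoch separately and put it back.
        trainer.current_epoch = saved_epoch
    return trainer.current_epoch


print(scale_batch_size(ToyTrainer()))                      # 1
print(scale_batch_size(ToyTrainer(), restore_epoch=True))  # 0
```

Without the extra restore step, the save/load round trip alone moves the counter from 0 to 1, even though no epoch was completed.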

edenlightning · 2020-09-02T18:48:34Z

Currently blocked until trainer.tune is added.

maxjeblick added the bug and help wanted labels Aug 29, 2020

maxjeblick mentioned this issue Aug 29, 2020

add current_epoch to dumped_params #3261

Merged

edenlightning added this to the 0.9.x milestone Sep 1, 2020

Borda added the checkpointing label Sep 4, 2020

edenlightning added the priority: 0 label Sep 16, 2020

edenlightning modified the milestones: 0.9.x, 1.0 Oct 4, 2020

edenlightning removed the help wanted label Oct 5, 2020

Borda closed this as completed in #3261 Oct 6, 2020
