
auto_scale_batch_size won't reset current_epoch #3260

Closed
maxjeblick opened this issue Aug 29, 2020 · 3 comments · Fixed by #3261
Labels
bug (Something isn't working), checkpointing (Related to checkpointing), priority: 0 (High priority task)

Comments

@maxjeblick
Contributor

🐛 Bug

When auto_scale_batch_size is enabled, the batch size finder first runs training steps with several different batch sizes. When the actual training then begins, trainer.current_epoch equals 1 instead of 0.
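To picture what "training with varying batch sizes" means, here is a minimal sketch of a doubling ("power"-style) search. It is not Lightning's implementation: `power_scale` and the `fits_in_memory` predicate are hypothetical stand-ins for running a few real training steps and catching an out-of-memory error.

```python
def power_scale(initial_size, fits_in_memory, max_trials=25):
    """Double the batch size until a trial fails; return the largest size that fit."""
    size, best = initial_size, None
    for _ in range(max_trials):
        # fits_in_memory stands in for running a few training steps at this
        # batch size; returning False plays the role of an out-of-memory error.
        if not fits_in_memory(size):
            break
        best = size
        size *= 2
    # No trial completes an epoch, so the search itself never advances current_epoch.
    return best


print(power_scale(2, lambda bs: bs <= 64))  # 64
```

Because none of the trials finishes an epoch, the search alone cannot explain why current_epoch starts at 1.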

To Reproduce

Either observe the progress bar or use a simple callback to track the epoch number, once with auto_scale_batch_size enabled and once with auto_scale_batch_size disabled.

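from pytorch_lightning import Callback


class PrintCallback(Callback):
    """Record trainer.current_epoch at the start of every training epoch."""

    def __init__(self):
        super().__init__()
        self.observed_epochs = []

    def on_train_epoch_start(self, trainer, pl_module):
        # With auto_scale_batch_size enabled, the first value printed is 1 instead of 0
        print(f'Current Epoch: {trainer.current_epoch}')
        self.observed_epochs.append(trainer.current_epoch)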

@maxjeblick maxjeblick added bug Something isn't working help wanted Open to be worked on labels Aug 29, 2020
@rohitgr7
Contributor

Since the batch size finder runs training with various batch sizes, did you find where it sets current_epoch back to 0 before trying the next batch size?

@maxjeblick
Contributor Author

maxjeblick commented Aug 31, 2020

The problem lies in model checkpointing. The checkpoint stores 'epoch': self.current_epoch + 1, and that checkpoint is reloaded after the batch size finder has completed. Since the epoch is never increased during batch size scaling, reloading the checkpoint leaves current_epoch at 1 even though no epoch has run.
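The off-by-one can be reproduced with a simplified model of the save/restore round trip. This is a sketch under stated assumptions: the trainer state is a plain dict and `dump_checkpoint`, `restore_checkpoint` and both `run_batch_size_finder*` functions are hypothetical names, not Lightning's API. Only the `current_epoch + 1` rule is taken from the comment above, and the guarded variant is one possible workaround, not necessarily what #3261 does.

```python
# Hypothetical, simplified model of the checkpoint round trip around the finder.

def dump_checkpoint(trainer_state):
    # Mirrors the quoted rule: the epoch is stored as current_epoch + 1
    return {'epoch': trainer_state['current_epoch'] + 1}


def restore_checkpoint(trainer_state, checkpoint):
    trainer_state['current_epoch'] = checkpoint['epoch']


def run_batch_size_finder(trainer_state):
    checkpoint = dump_checkpoint(trainer_state)  # saved before the trials
    # ... batch size trials would run here; none of them completes an epoch ...
    restore_checkpoint(trainer_state, checkpoint)  # reloaded afterwards


def run_batch_size_finder_keeping_epoch(trainer_state):
    # Possible guard (an assumption): remember the epoch and put it back after restoring.
    saved_epoch = trainer_state['current_epoch']
    run_batch_size_finder(trainer_state)
    trainer_state['current_epoch'] = saved_epoch


state = {'current_epoch': 0}
run_batch_size_finder(state)
print(state['current_epoch'])  # 1, although no epoch ran

state = {'current_epoch': 0}
run_batch_size_finder_keeping_epoch(state)
print(state['current_epoch'])  # 0
```

The "+ 1" makes sense for a checkpoint written after a finished epoch (it records the next epoch to run), which is why restoring it after the finder, where no epoch finished, shifts the count.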

@edenlightning edenlightning added this to the 0.9.x milestone Sep 1, 2020
@edenlightning
Contributor

Currently blocked until trainer.tune is added.

@Borda Borda added the checkpointing Related to checkpointing label Sep 4, 2020
@edenlightning edenlightning added the priority: 0 High priority task label Sep 16, 2020
@edenlightning edenlightning modified the milestones: 0.9.x, 1.0 Oct 4, 2020
@edenlightning edenlightning removed the help wanted Open to be worked on label Oct 5, 2020

4 participants