Duplicate epochs when calling .fit() twice #5007
Comments
The epoch number is generated here: which assumes that When using So the solution would be to increase the epoch number at the end of
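For readers without the linked snippets, here is a minimal sketch of how a duplicate epoch can arise when a fit method is called twice. It is a toy model, not Lightning's code: `ToyTrainer`, `bump_on_exit`, `seen`, and `run_twice` are hypothetical names, and the assumption is that the loop resumes from a stored `current_epoch` that still points at the last epoch that ran.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not the Lightning implementation."""

    def __init__(self, max_epochs, bump_on_exit=False):
        self.max_epochs = max_epochs
        self.current_epoch = 0
        self.bump_on_exit = bump_on_exit  # hypothetical flag enabling the proposed fix
        self.seen = []  # epochs that were actually run

    def fit(self):
        ran = False
        # Assumption: the loop starts from the stored epoch counter.
        for epoch in range(self.current_epoch, self.max_epochs):
            self.current_epoch = epoch
            self.seen.append(epoch)
            ran = True
        # Proposed fix: advance the counter once fitting has finished,
        # so it points at the next epoch to run rather than the last one run.
        if self.bump_on_exit and ran:
            self.current_epoch += 1


def run_twice(bump_on_exit):
    trainer = ToyTrainer(max_epochs=2, bump_on_exit=bump_on_exit)
    trainer.fit()
    trainer.max_epochs = 4
    trainer.fit()
    return trainer.seen


print(run_twice(False))  # [0, 1, 1, 2, 3] -> epoch 1 is repeated
print(run_twice(True))   # [0, 1, 2, 3]
```

In this toy version the counter is advanced only if at least one epoch ran, so an extra `fit()` call with nothing left to do does not skip ahead. Whether the real fix should live in the trainer or in the loop is the question discussed in the comments that follow.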
@carmocca or should we reset these parameters in the trainer
I'd say it's more natural to do it
Any opinion on what would be the recommended way of resetting the Is something along those lines safe?
Yes, it should be safe.
@carmocca what is left TODO here?
Everything, the bug is not fixed 🙂 There is a reproduction test at the top. We just need to make up our minds about the best solution. Context here: #5007 (comment)
Status update: WIP - tackling other related issues first. Need this for fault-tolerance
Status update: Blocked by merging #8477 and enabling restoring the ckpt progress tracking state by default.
I think the current_epoch can no longer be set in the trainer but must be set in the fit_loop itself.
Correct. You'll need to do
🐛 Bug
To Reproduce
Expected behavior
Assertion does not fail
Environment
Current master
cc @tchaton @Borda