🚀 Feature
Currently, the Trainer terminates at max_epochs (default 1000) even if max_time is specified. Even when min_epochs is set to a value greater than 1000, training still stops at 1000 epochs. The Trainer should either overwrite the max_epochs value in this case, or the documentation should state that min_epochs only takes effect when early stopping is enabled. It could also throw a warning when min_epochs is set while early stopping is disabled.
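A minimal sketch of the setup this describes (the argument values and the model variable are illustrative, not taken from an actual run):

```python
# With max_epochs left at its default (1000), training stops at epoch 1000
# even though max_time and min_epochs both ask for more.
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_time="02:00:00:00",  # stop after 2 days (DD:HH:MM:SS)
    min_epochs=2000,         # greater than the default max_epochs of 1000
    # max_epochs not set -> defaults to 1000, which currently ends training first
)
# trainer.fit(model)  # `model` would be any LightningModule
```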
Motivation
To continue training for X amount of time (e.g. 7 days) when you don't know the maximum number of epochs in advance and no early-stopping criterion is specified.
Pitch
I think overwriting max_epochs with the value of min_epochs, or letting max_time take precedence, is reasonable when they would allow training beyond max_epochs. I wanted to train a big model for 2 days, but after 1 day it was stopped despite the max_time and min_epochs parameters being set.
If you care about the total duration of training, or you are interested in the asymptotic behaviour of your model over an infinite time horizon, it makes sense to stop training only when max_time is reached, regardless of max_epochs.
Alternatives
The alternative is to set a very large max_epochs to ensure training won't stop before max_time is reached.
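For reference, that workaround looks roughly like this (the exact numbers are illustrative):

```python
# Pick an effectively unreachable max_epochs so that max_time becomes the
# binding stopping condition.
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=10_000_000,   # large enough to never be hit in practice
    max_time="07:00:00:00",  # stop after 7 days (DD:HH:MM:SS)
)
```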
kevinNejad changed the title from "Trainer overwrite max_epoch when max_time is specified" to "Trainer should overwrite max_epoch when max_time is specified" on Jun 30, 2021
This also raises the question of what the default should be if no stopping condition is passed to the trainer (e.g. no max steps / max epochs / max time): should this be an infinite while loop that requires an external signal to kill the job? Is it conceivable that someone wants to train forever, or for an absurdly long time where they know neither the number of epochs nor the time it takes to converge? Should we allow this only if there's an early stopping callback attached to the trainer?
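For illustration, the "only with an early stopping callback" variant would look something like this (the monitored metric and patience are placeholders):

```python
# Sketch: the stopping condition comes from a callback rather than from
# max_epochs / max_steps / max_time, which are all left unset here.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

trainer = pl.Trainer(
    callbacks=[EarlyStopping(monitor="val_loss", patience=10)],
    # no max_epochs / max_steps / max_time: the open question above is what
    # the default behaviour should be in this situation
)
```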
Even if min_epochs is set to be greater than 1000, the Trainer should either overwrite the max_epochs value, or the documentation should state that min_epochs only takes effect when early stopping is enabled. It could also throw a warning when min_epochs is set while early stopping is disabled.
Would this be needed if we addressed the max_time issue? Overwriting the arguments doesn't feel right to me. If a user specifies min_epochs > max_epochs, we should raise a misconfiguration exception.
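A rough sketch of that check (an illustrative helper, not the actual Trainer code):

```python
# Reject contradictory limits up front instead of silently ignoring one of them.
from pytorch_lightning.utilities.exceptions import MisconfigurationException

def _validate_epoch_bounds(min_epochs, max_epochs):
    """Illustrative helper: raise if min_epochs exceeds max_epochs."""
    if min_epochs is not None and max_epochs is not None and min_epochs > max_epochs:
        raise MisconfigurationException(
            f"`min_epochs={min_epochs}` must not be greater than `max_epochs={max_epochs}`."
        )
```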