Proper support for Pytorch SequentialLR Scheduler #10759
Comments
Related to this, but a separate issue: AFAIK there is no support for custom LR schedulers unless they inherit from _LRScheduler AND require no extra arguments when .step() is called. In vanilla PyTorch this is "solved" because the user calls .step() directly. Perhaps the newly introduced customizable loops are a solution for this? |
I would rather go with "Manual Optimization" before going for custom loops. In manual optimization mode, the user calls the scheduler step manually, hence you can use a custom one and pass args to its step method. How does that sound for a (temporary?) workaround? |
Great idea, I forgot about that. I guess that solves the custom scheduler issue. But I still think support for SequentialLR is necessary. I think there is a debate in PyTorch about whether the scheduler class needs a rewrite, so perhaps one should wait for that? |
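For illustration, here is a minimal plain-PyTorch sketch of what "calling the scheduler step manually" buys you: a warm-up scheduler is stepped for the first few epochs, then ReduceLROnPlateau takes over and receives the metric. The model, the learning rates, `warmup_epochs`, and the constant `val_loss` are placeholder assumptions chosen for the example. Nothing here is taken from the Lightning code base, and the same pattern would need adapting inside a manual-optimization LightningModule.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, ReduceLROnPlateau

# Placeholder model and optimizer (assumptions for the sketch).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

warmup_epochs = 5  # hypothetical warm-up length
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs)
plateau = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=0)

history = []
for epoch in range(8):
    # ... one epoch of training and validation would go here ...
    optimizer.step()  # no gradients were computed, so parameters stay unchanged
    val_loss = 1.0    # placeholder metric that never improves

    if epoch < warmup_epochs:
        warmup.step()           # warm-up phase: step() takes no metric
    else:
        plateau.step(val_loss)  # plateau phase: step() needs the metric

    history.append(optimizer.param_groups[0]["lr"])
```

Because the loop decides which scheduler to step, each `step()` call can take whatever arguments that scheduler needs. That is exactly what a fixed, argument-free `step()` interface cannot express.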
For the main issue regarding
Do you have any scheduler that is inherited from |
Well, I wanted to use WarmStart (a custom scheduler inherited from _LRScheduler) together with ReduceLROnPlateau. My first try was SequentialLR, but as outlined above it cannot be used with ReduceLROnPlateau. So I simply inherited from ReduceLROnPlateau and modified its step function to include the logic from WarmStart. See here for the code. That now works with PyTorch Lightning, as it is simply a ReduceLROnPlateau in disguise. It would be cleaner to use SequentialLR, but that is now a PyTorch issue rather than a PL one. Maybe a separate issue can be opened there so that SequentialLR can accept arbitrary arguments in step? |
yeah, I'd suggest opening an issue in PyTorch and linking it here. We won't close this issue. Once they support it, we will make adjustments here, if required, to make it compatible. |
@marcm-ml hello, would you please share the code again, as the link is now dead? I am trying my hardest to incorporate warm-up with ReduceLROnPlateau but have had no success so far. |
Sure, but no guarantees that it works with any recent PyTorch or PyTorch Lightning version. I haven't touched it in years.

import math

import numpy as np
from torch.optim.lr_scheduler import ReduceLROnPlateau


class WarmStartReduceOnPlateau(ReduceLROnPlateau):
    def __init__(self,
                 optimizer,
                 warm_start: float,
                 warm_stop: float,
                 warm_patience: int = 0,
                 warm_duration: int = 25,
                 warm_type: str = "linear",
                 mode: str = "min",
                 patience: int = 10,
                 cooldown=0,
                 factor=0.1,
                 threshold=1e-4,
                 threshold_mode='rel',
                 min_lr=0,
                 eps=1e-8,
                 verbose=False):
        """
        Workaround class as SequentialLR with ReduceLROnPlateau is not working in pytorch lightning currently.
        Otherwise simply use WarmStart class together with any of the other pytorch schedulers.

        See Also
            https://github.com/PyTorchLightning/pytorch-lightning/issues/10759
        """
        assert warm_type in ("linear", "smooth")
        assert warm_duration > 0
        assert warm_patience >= 0
        self.warm_start = warm_start
        self.warm_stop = warm_stop
        self.warm_patience = warm_patience
        self.warm_duration = warm_duration
        self.warm_type = warm_type
        self.warm_ended = False
        self._last_lr = warm_start
        super().__init__(
            optimizer,
            mode=mode,
            factor=factor,
            patience=patience,
            threshold=threshold,
            threshold_mode=threshold_mode,
            cooldown=cooldown,
            min_lr=min_lr,
            eps=eps,
            verbose=verbose
        )

    def step(self, metrics, epoch=None):
        current = float(metrics)
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch

        # Check if out of warm-up patience period and if warm-up should end
        if self.last_epoch > self.warm_patience and not self.warm_ended:
            self._warm_lr(self.last_epoch)

        # Check if out of warm-up phase
        if self.last_epoch > self.warm_patience + self.warm_duration:
            if self.is_better(current, self.best):
                self.best = current
                self.num_bad_epochs = 0
            else:
                self.num_bad_epochs += 1

            if self.in_cooldown:
                self.cooldown_counter -= 1
                self.num_bad_epochs = 0  # ignore any bad epochs in cooldown

            if self.num_bad_epochs > self.patience:
                # Indicate to warm up that LRReduce should happen; prevent LR override
                if self.verbose and not self.warm_ended:
                    print(f"Ending warm-up phase after {epoch} epochs. "
                          f"Switching over to ReduceLROnPlateau")
                self.warm_ended = True
                self._reduce_lr(epoch)
                self.cooldown_counter = self.cooldown
                self.num_bad_epochs = 0

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]

    def _warm_lr(self, epoch):
        for i, param_group in enumerate(self.optimizer.param_groups):
            old_lr = float(param_group['lr'])
            slope = (self.warm_stop - self.warm_start)
            # Fraction of the warm-up duration that has elapsed
            x = (epoch - self.warm_patience) / self.warm_duration
            lower_bound = min(self.warm_start, self.warm_stop)
            upper_bound = max(self.warm_start, self.warm_stop)
            if self.warm_type == "linear":
                new_lr = slope * x + self.warm_start
            else:
                new_lr = slope * math.tanh(x) + self.warm_start
            # Clip before storing so the stored and reported value stays within the warm-up bounds
            new_lr = float(np.clip(new_lr, lower_bound, upper_bound))
            param_group['lr'] = new_lr
            if self.verbose and not np.isclose(old_lr, new_lr):
                print('Epoch {:5d}: warming-up learning rate'
                      ' of group {} to {:.4e}.'.format(epoch, i, new_lr)) |
@marcm-ml it's working fantastically. Thank you so much. |
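To make the warm-up curve in `_warm_lr` easier to inspect, the same formula can be pulled out into a standalone, torch-free function. This is only a sketch mirroring the arithmetic of the class above. The function name `warm_lr` and its defaults are chosen for illustration and are not part of any library.

```python
import math


def warm_lr(epoch, warm_start, warm_stop,
            warm_patience=0, warm_duration=25, warm_type="linear"):
    """Warm-up learning rate at `epoch`, mirroring the arithmetic of _warm_lr above."""
    span = warm_stop - warm_start
    # Fraction of the warm-up duration that has elapsed (may exceed 1).
    x = (epoch - warm_patience) / warm_duration
    if warm_type == "linear":
        lr = span * x + warm_start
    else:  # "smooth"
        lr = span * math.tanh(x) + warm_start
    lower, upper = min(warm_start, warm_stop), max(warm_start, warm_stop)
    return min(max(lr, lower), upper)  # clip to [lower, upper]


# Tabulate both variants for a hypothetical 1e-5 -> 1e-3 warm-up.
for epoch in (1, 5, 15, 25, 50):
    linear = warm_lr(epoch, 1e-5, 1e-3)
    smooth = warm_lr(epoch, 1e-5, 1e-3, warm_type="smooth")
    print(f"epoch {epoch:2d}: linear={linear:.3e} smooth={smooth:.3e}")
```

One consequence worth knowing: the "linear" variant reaches `warm_stop` exactly at the end of `warm_duration`, while the "smooth" variant has only covered a fraction tanh(1) ≈ 0.76 of the range by then and approaches `warm_stop` asymptotically afterwards.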
🐛 Bug
Currently there is a bug when a ReduceLROnPlateau is used inside SequentialLR, because Trainer._configure_schedulers has no proper support for this scheduler. An exception is raised since the monitor metric is not properly passed to the ReduceLROnPlateau scheduler in TrainingEpochLoop._update_learning_rates.
Note: Currently, there is a bug in SequentialLR missing an optimizer attribute, see pytorch/pytorch#67406 and #10278. But that should not interfere here.
To Reproduce
Run any Lightning model with a Trainer and a scheduler setup like:
Expected behavior
The monitor value should be passed to the underlying ReduceLROnPlateau scheduler.
This is definitely tricky to achieve, as the current approach assumes a fixed scheduler setup for the entire training run: it allows multiple schedulers, but if scheduler1 changes midway, it only works if it is not a ReduceLROnPlateau.
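The expected behavior could be sketched roughly as follows: a sequential wrapper that looks at the currently active scheduler and forwards the monitored value only when that scheduler's `step` accepts a `metrics` argument. This is a hypothetical, duck-typed illustration, not the SequentialLR or Lightning implementation. `MetricAwareSequential`, `StubWarmup`, and `StubPlateau` are invented names, the stubs stand in for real schedulers so the sketch runs without PyTorch, and details such as resetting the next scheduler at a milestone are left out.

```python
import inspect
from bisect import bisect_right


class MetricAwareSequential:
    """Hypothetical wrapper: steps one scheduler at a time, forwarding a metric if needed."""

    def __init__(self, schedulers, milestones):
        if len(milestones) != len(schedulers) - 1:
            raise ValueError("need exactly one milestone fewer than schedulers")
        self.schedulers = list(schedulers)
        self.milestones = list(milestones)
        self.last_epoch = 0

    def step(self, metrics=None):
        self.last_epoch += 1
        # Pick the scheduler responsible for the current epoch.
        active = self.schedulers[bisect_right(self.milestones, self.last_epoch)]
        if "metrics" in inspect.signature(active.step).parameters:
            if metrics is None:
                raise ValueError("active scheduler requires a monitored metric")
            active.step(metrics)
        else:
            active.step()


class StubWarmup:
    """Stand-in for a scheduler whose step() takes no arguments."""

    def __init__(self):
        self.calls = 0

    def step(self):
        self.calls += 1


class StubPlateau:
    """Stand-in for a plateau-style scheduler whose step() needs a metric."""

    def __init__(self):
        self.seen = []

    def step(self, metrics):
        self.seen.append(float(metrics))


warm, plat = StubWarmup(), StubPlateau()
scheduler = MetricAwareSequential([warm, plat], milestones=[2])
for val_loss in (0.9, 0.8, 0.7, 0.6):
    scheduler.step(val_loss)  # the metric is ignored while the warm-up stub is active
```

Inspecting the signature keeps the wrapper generic, at the cost of relying on the parameter being named `metrics`. An explicit per-scheduler flag would be a more robust alternative.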
Environment
Installation method (conda, pip, source): pip
Additional context
cc @tchaton