Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper support for Pytorch SequentialLR Scheduler #10759

Open
marcm-ml opened this issue Nov 25, 2021 · 9 comments
Open

Proper support for Pytorch SequentialLR Scheduler #10759

marcm-ml opened this issue Nov 25, 2021 · 9 comments
Labels
3rd party Related to a 3rd-party bug Something isn't working lr scheduler
Milestone

Comments

@marcm-ml
Copy link
Contributor

marcm-ml commented Nov 25, 2021

🐛 Bug

Currently there is a bug when a ReduceLROnPlateau is used inside SequentialLR due to no proper support for this scheduler in Trainer._configure_schedulers. An exception is raised since the monitor metric is not properly passed to the ReduceLROnPlateau scheduler in TrainingEpochLoop._update_learning_rates

Note: Currently, there is a bug in SequentialLR missing an optimizer attribute, see pytorch/pytorch#67406 and #10278. But that should not interfere here.

To Reproduce

run any lightning model with trainer with scheduler setup like:

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=0.01)
    s1= torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    s2= torch.optim.lr_scheduler.ConstantLR(optimizer)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer,
        schedulers=[s1, s2],
        milestones=[2]
    )
    scheduler.optimizer = optimizer  # dirty fix for bug in SequentialLR
    return {"optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": "loss"}

Expected behavior

The monitor value should be passed to the underlying ReduceLROnPlateau scheduler.

This is defenitely tricky to achieve as the current way is assuming a fixed scheduler setup for the entire training time, e.g. allows for multiple scheduler but if scheduler1 is changing midway it only works if it is not ReduceLROnPlateau.

Environment

  • PyTorch Lightning Version (e.g., 1.5.0): 1.5.3
  • PyTorch Version (e.g., 1.10): 1.10
  • Python version (e.g., 3.9): 3.9
  • OS (e.g., Linux): Windows
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: -
  • How you installed PyTorch (conda, pip, source): pip
  • If compiling from source, the output of torch.__config__.show(): -
  • Any other relevant information: -

Additional context

cc @tchaton

@marcm-ml marcm-ml added the bug Something isn't working label Nov 25, 2021
@marcm-ml marcm-ml changed the title Proper support for SequentialLR Proper support for Pytorch SequentialLR Scheduler Nov 25, 2021
@marcm-ml
Copy link
Contributor Author

marcm-ml commented Nov 25, 2021

Related to this but for another issue: There is no support for custom LRScheduler afaik unless they are inherited from _LRScheduler AND require no extra arguments when .step() is called.

In vanilla pytorch this is "solved" as the user is calling .step() directly.

Perhaps the newly introduced customizable loops is a solution for this?

@tchaton tchaton added lr scheduler priority: 1 Medium priority task labels Nov 25, 2021
@tchaton tchaton self-assigned this Nov 25, 2021
@awaelchli
Copy link
Contributor

Perhaps the newly introduced customizable loops is a solution for this?

I would rather go with "Manual Optimization" before going for custom loops.

In manual optimization mode, the user calls the scheduler step manually, hence you can use a custom one and pass args to its step method. How does that sound for a (temporary?) workaround?

@marcm-ml
Copy link
Contributor Author

marcm-ml commented Nov 25, 2021

great idea, i forgot about that. I guess that solves the custom scheduler issue. But I still think the support for SequentialLR is necessary. I think there is a debate on Pytorch whether the schedule class needs a rewrite, so perhaps one waits for that?

@rohitgr7
Copy link
Contributor

For the main issue regarding ReduceLROnPlateau, SequentialLR doesn't support an additional argument inside .step so we really can't do anything from our side.
https://github.com/pytorch/pytorch/blob/5fdcc20d8d96a6b42387f57c2ce331516ad94228/torch/optim/lr_scheduler.py#L628

Related to this but for another issue: There is no support for custom LRScheduler afaik unless they are inherited from _LRScheduler AND require no extra arguments when .step() is called.

Do you have any scheduler that is inherited from _LRScheduler and requires an additional argument inside the .step method? I have a PR for it: #10249, that will solve the issue but seems to be blocked under the decision whether to support outside schedulers or not which are not inherited from PT _LRScheduler.

@marcm-ml
Copy link
Contributor Author

marcm-ml commented Nov 26, 2021

Well, I wanted to use WarmStart (custom scheduler inherited from _LRScheduler) together with ReduceLROnPlateu. First try was using SequentialLR but as outlined above is impossible to use with ReduceLROnPlateu. Then I simply inherited from ReduceLROnPlateu and modified its step function to include the logic from WarmStart. See here for the code. That works now with PytorchLightning as it is simply a ReduceLROnPlateu in disguise. Although it would be cleaner to use SequentialLR but is now a pytorch issue rather than PL. Maybe a separate issue can be opened there such that SequentialLR can accept arbitrary arguments in step?

@rohitgr7
Copy link
Contributor

yeah, I'd suggest opening an issue in PyTorch and linking it here. We won't close this issue. Once they support it, we will make adjustments here, if required, to make it compatible.

@abbas695
Copy link

abbas695 commented May 18, 2024

@marcm-ml hello , would you please share the code again as the link is now dead . i am trying my hardest to incorporate warm up with reducelronplateu but no success so far

@marcm-ml
Copy link
Contributor Author

marcm-ml commented May 18, 2024

Sure but not guarantees that it works with any of the recent PyTorch or PytorhLightning versions. I haven't touched that in years.

import math
import warnings

import numpy as np
from torch.optim.lr_scheduler import ReduceLROnPlateau, _LRScheduler


class WarmStartReduceOnPlateau(ReduceLROnPlateau):
    def __init__(self,
                 optimizer,
                 warm_start: float,
                 warm_stop: float,
                 warm_patience: int = 0,
                 warm_duration: int = 25,
                 warm_type: str = "linear",
                 mode: str = "min",
                 patience: int = 10,
                 cooldown=0,
                 factor=0.1,
                 threshold=1e-4,
                 threshold_mode='rel',
                 min_lr=0,
                 eps=1e-8,
                 verbose=False):
        """
        Workaround class as SequentialLR with ReduceLROnPlateau is not working in pytorch lightning currently.
        Otherwise simply use WarmStart class together with any of the other pytorch schedulers.

        See Also
        https://github.com/PyTorchLightning/pytorch-lightning/issues/10759
        """
        assert warm_type in ("linear", "smooth")
        assert warm_duration > 0
        assert warm_patience >= 0

        self.warm_start = warm_start
        self.warm_stop = warm_stop
        self.warm_patience = warm_patience
        self.warm_duration = warm_duration
        self.warm_type = warm_type
        self.warm_ended = False
        self._last_lr = warm_start

        super().__init__(
            optimizer,
            mode=mode,
            factor=factor,
            patience=patience,
            threshold=threshold,
            threshold_mode=threshold_mode,
            cooldown=cooldown,
            min_lr=min_lr,
            eps=eps,
            verbose=verbose
        )

    def step(self, metrics, epoch=None):
        current = float(metrics)
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch

        # Check if out of warm-up patience period and if warm-up should end
        if self.last_epoch > self.warm_patience and not self.warm_ended:
            self._warm_lr(self.last_epoch)

        # Check if out of warm-up phase
        if self.last_epoch > self.warm_patience + self.warm_duration:
            if self.is_better(current, self.best):
                self.best = current
                self.num_bad_epochs = 0
            else:
                self.num_bad_epochs += 1

            if self.in_cooldown:
                self.cooldown_counter -= 1
                self.num_bad_epochs = 0  # ignore any bad epochs in cooldown

            if self.num_bad_epochs > self.patience:
                # Indicate to warm up that LRReduce should happen; prevent LR override
                if self.verbose and not self.warm_ended:
                    print(f"Ending warm-up phase after {epoch} epochs. "
                          f"Switching over to ReduceLROnPlateau")
                self.warm_ended = True

                self._reduce_lr(epoch)
                self.cooldown_counter = self.cooldown
                self.num_bad_epochs = 0

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]

    def _warm_lr(self, epoch):
        for i, param_group in enumerate(self.optimizer.param_groups):
            old_lr = float(param_group['lr'])
            slope = (self.warm_stop - self.warm_start)
            x = (epoch - self.warm_patience) / self.warm_duration
            lower_bound = min(self.warm_start, self.warm_stop)
            upper_bound = max(self.warm_start, self.warm_stop)
            if self.warm_type == "linear":
                new_lr = slope * x + self.warm_start
            else:
                new_lr = slope * math.tanh(x) + self.warm_start
            param_group['lr'] = np.clip(new_lr, lower_bound, upper_bound)
            if self.verbose and not np.isclose(old_lr, new_lr):
                print('Epoch {:5d}: warming-up learning rate'
                      ' of group {} to {:.4e}.'.format(epoch, i, new_lr))

@abbas695
Copy link

@marcm-ml it's working fantastically . thank you so much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3rd party Related to a 3rd-party bug Something isn't working lr scheduler
Projects
None yet
Development

No branches or pull requests

6 participants