
X-UMX: lr_scheduler_step() missing 1 required positional argument: 'metric' #664

Closed
DavidDiazGuerra opened this issue May 9, 2023 · 4 comments


@DavidDiazGuerra
Contributor

Hello,

I'm trying to use the recipe for training the X-UMX model on the musdb18 dataset, but the following exception is raised after the validation of the first epoch finishes:

Traceback (most recent call last):
  File "/opt/tuni/worktmp/repos/asteroid/egs/musdb18/X-UMX/train.py", line 498, in <module>
    main(arg_dic, plain_args)
  File "/opt/tuni/worktmp/repos/asteroid/egs/musdb18/X-UMX/train.py", line 465, in main
    trainer.fit(system)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
    return function(*args, **kwargs)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
    results = self._run_stage()
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
    self.fit_loop.run()
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
    self.on_advance_end()
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 377, in on_advance_end
    self.epoch_loop.update_lr_schedulers("epoch", update_plateau_schedulers=not self.restarting)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 300, in update_lr_schedulers
    self._update_learning_rates(interval=interval, update_plateau_schedulers=update_plateau_schedulers)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 350, in _update_learning_rates
    call._call_lightning_module_hook(
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 142, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
TypeError: lr_scheduler_step() missing 1 required positional argument: 'metric'

I am using PyTorch 2.0.0 and Lightning 2.0.2. In order to make the model work with PyTorch 2.0.0, I've made some minor changes to the X-UMX model (see #662), and I've reduced the batch size to make the model fit on my local GPU, but I don't think I've made any changes that should affect the Lightning training.

Could this be related to using recent versions of PyTorch and Lightning? I have no experience working with Lightning.

Best regards,
David

@DavidDiazGuerra
Contributor Author

Hi again,

The problem disappeared after downgrading Lightning to 1.9.4. I guess there are some changes in version 2.0 that break the X-UMX training routine of Asteroid.
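
For anyone hitting this in the meantime, the workaround is just:

```
pip install pytorch-lightning==1.9.4
```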

Best,
David

@LeonieBorne
Contributor

Hello 👋

I had the same problem. It seems that in Lightning 2.0 the hook is defined as def lr_scheduler_step(self, scheduler, metric), instead of def lr_scheduler_step(self, scheduler, optimizer_idx, metric) as in previous versions (see the custom learning rate schedulers documentation).
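
For illustration, an override matching the 2.0 signature would look something like this minimal sketch (the class name is a placeholder, not Asteroid's actual System class):

```python
import pytorch_lightning as pl

class MySystem(pl.LightningModule):
    # Lightning >= 2.0 signature: the optimizer_idx argument is gone.
    def lr_scheduler_step(self, scheduler, metric):
        if metric is None:
            scheduler.step()
        else:
            # e.g. ReduceLROnPlateau expects the monitored value
            scheduler.step(metric)
```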

@mpariente Should we change this and update the requirements to pytorch-lightning>=2.0.0?

@mpariente
Collaborator

I'm not sure this code has an easy backward-compatible fix. So we should:

  • First change requirements to limit lightning to 1.x versions, and release asteroid 0.6.1,
  • Then fix the issue and upgrade to pytorch-lightning>=2.0.0 with asteroid 0.7.0.
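
For the first step, the pin would be something along these lines in the requirements (illustrative; the existing lower bound stays whatever asteroid already requires):

```
pytorch-lightning<2.0.0
```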

@DavidDiazGuerra
Copy link
Contributor Author

I'm closing this issue since it has been fixed with #682 and the 0.7.0 release.
