Describe the bug
Currently, it is not possible to use the 'train_time_interval' param from PyTorch Lightning for checkpointing. When trying to specify it, it throws the following error.
File "/home/abert/.local/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 303, in on_train_batch_end
skip_time = prev_time_check is None or (now - prev_time_check) < train_time_interval.total_seconds()
AttributeError: 'str' object has no attribute 'total_seconds'
And that's because PyTorch Lightning's 'train_time_interval' takes a timedelta object. Since we can't instantiate an object in the YAML, and we can't set it afterwards either (OmegaConf throws an error saying it is not a primitive type, see below), we can't use this feature.
Traceback (most recent call last):
File "/mnt/c/Users/berta/Documents/Linagora/NeMo/examples/asr/speech_to_text_finetune.py", line 192, in main
cfg.exp_manager.checkpoint_callback_params.train_time_interval = timedelta(seconds=30)
omegaconf.errors.UnsupportedValueType: Value 'timedelta' is not a supported primitive type
full_key: exp_manager.checkpoint_callback_params.train_time_interval
object_type=dict
I made a quick and dirty fix on my side by building a timedelta object just before it is handed to PyTorch Lightning.
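Roughly, the workaround looks like the sketch below (the helper name is made up; in the real code the conversion happens wherever the checkpoint params are turned into the PyTorch Lightning callback):

# Illustrative sketch: turn a numeric train_time_interval (seconds) into the
# timedelta that PyTorch Lightning's ModelCheckpoint expects.
from datetime import timedelta
from omegaconf import DictConfig, OmegaConf

def checkpoint_params_with_timedelta(checkpoint_callback_params: DictConfig) -> dict:
    # Convert to a plain dict so it can hold a non-primitive value
    params = OmegaConf.to_container(checkpoint_callback_params, resolve=True)
    interval = params.get("train_time_interval")
    if interval is not None and not isinstance(interval, timedelta):
        params["train_time_interval"] = timedelta(seconds=float(interval))
    return params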
Steps/Code to reproduce bug
Running NeMo/examples/asr/speech_to_text_finetune.py with the following changes made to the YAML config file:
exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    # in case of multiple validation sets, first one is used
    monitor: "train_loss"
    mode: "min"
    save_top_k: 5
    always_save_nemo: True # saves the checkpoints as nemo files along with PTL checkpoints
    train_time_interval: 60
    every_n_epochs: null
Expected behavior
We should be able to use this parameter, for example by specifying a number of seconds.
You could pass an object through the config as shown above. As Hydra doesn't support objects natively, the current workaround is to use Any in the type annotation. Added support for it here: #10559
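In the YAML, that presumably amounts to something like this (a sketch of the idea; the exact snippet the comment refers to is not reproduced here):

checkpoint_callback_params:
  # hypothetical: express the interval as an object node instead of a plain number
  train_time_interval:
    _target_: datetime.timedelta
    seconds: 60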
Hi,
Thanks a lot for the response, I wasn't aware I could do that!
Hi,
Sorry for the very late comment. I tried your solution today (see below) and it doesn't work without changes (or maybe I'm doing something wrong?).
The value of "train_time_interval" is a dict because the timedelta object is not instantiated. I tried adding "cfg = instantiate(cfg)" in the "exp_manager" function (after line 407: "cfg = OmegaConf.merge(schema, cfg)") and it does work, but I don't know whether it impacts or breaks anything else. I can open a PR if this is a real problem.
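Here is a small standalone illustration of what I mean (not the actual exp_manager code, just the idea):

# Minimal example: hydra.utils.instantiate recursively builds _target_ nodes,
# so the train_time_interval dict becomes a real timedelta object.
from datetime import timedelta
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "checkpoint_callback_params": {
        "monitor": "train_loss",
        "train_time_interval": {"_target_": "datetime.timedelta", "seconds": 60},
    }
})

cfg = instantiate(cfg)  # the equivalent of the extra line added in exp_manager
assert isinstance(cfg.checkpoint_callback_params.train_time_interval, timedelta)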