'str' object has no attribute 'total_seconds' when trying to use 'train_time_interval' in 'checkpoint_callback_params' #9863

Closed
AudranBert opened this issue Jul 24, 2024 · 5 comments
Labels: bug, stale

AudranBert commented Jul 24, 2024

Describe the bug

Currently, it is not possible to use the 'train_time_interval' param from PyTorch Lightning for checkpointing. Specifying it throws the following error:

  File "/home/abert/.local/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 303, in on_train_batch_end
    skip_time = prev_time_check is None or (now - prev_time_check) < train_time_interval.total_seconds()
AttributeError: 'str' object has no attribute 'total_seconds'

And that's because PyTorch Lightning's 'train_time_interval' takes a timedelta object. Since we can't instantiate an object in the YAML, and we can't set it later either (it throws an error saying it is not a primitive type, see below), we can't use this feature.

Traceback (most recent call last):
  File "/mnt/c/Users/berta/Documents/Linagora/NeMo/examples/asr/speech_to_text_finetune.py", line 192, in main
    cfg.exp_manager.checkpoint_callback_params.train_time_interval = timedelta(seconds=30)
omegaconf.errors.UnsupportedValueType: Value 'timedelta' is not a supported primitive type
    full_key: exp_manager.checkpoint_callback_params.train_time_interval
    object_type=dict
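
A minimal standalone reproduction of that OmegaConf limitation, independent of NeMo:

from datetime import timedelta

from omegaconf import OmegaConf

cfg = OmegaConf.create({"train_time_interval": None})
# OmegaConf only accepts primitive values (int, float, str, bool, None, ...),
# so assigning an arbitrary object such as a timedelta fails:
cfg.train_time_interval = timedelta(seconds=30)  # raises omegaconf.errors.UnsupportedValueType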

I made a quick and dirty fix on my side by building a timedelta object just before passing it to PyTorch Lightning.
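
Roughly, the fix looks like this (a sketch; the helper name and call site are illustrative, not NeMo's actual code):

from datetime import timedelta

def as_train_time_interval(value):
    # Coerce a YAML-friendly number of seconds into the timedelta that
    # PyTorch Lightning's ModelCheckpoint expects; pass other values through.
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    return value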

Steps/Code to reproduce bug

Running NeMo/examples/asr/speech_to_text_finetune.py with the following changes to the YAML config file:

exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    # in case of multiple validation sets, first one is used
    monitor: "train_loss"
    mode: "min"
    save_top_k: 5
    always_save_nemo: True # saves the checkpoints as nemo files along with PTL checkpoints
    train_time_interval: 60
    every_n_epochs: null

Expected behavior

We should be able to use this parameter, for example by specifying a number of seconds.

AudranBert added the bug label on Jul 24, 2024

github-actions bot commented Aug 31, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Aug 31, 2024

github-actions bot commented Sep 8, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this as not planned on Sep 8, 2024

nithinraok (Collaborator) commented

exp_manager:
  checkpoint_callback_params:
    train_time_interval:
      _target_: datetime.timedelta
      seconds: 60

You could pass an object through the config as shown above. As Hydra doesn't support objects natively, the current workaround is to use Any in the type annotation. Added support for it here: #10559
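
For illustration, a sketch of the kind of schema change this relies on (class and field names are assumed from the discussion, not necessarily NeMo's exact code):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class CallbackParams:
    # Typed as Any so the YAML value can be a plain number or a
    # {_target_: datetime.timedelta, seconds: ...} dict without failing
    # OmegaConf's primitive-type validation.
    train_time_interval: Optional[Any] = None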

AudranBert (Author) commented


Hi,
Thanks a lot for the response, I wasn't aware I could do that!

AudranBert commented Oct 7, 2024

Hi,
Sorry for the very late reply. I tried your solution (see below) today and it doesn't work without changes (or maybe I'm doing something wrong?).

exp_manager:
  checkpoint_callback_params:
    train_time_interval:
      _target_: datetime.timedelta
      seconds: 60

The value of "train_time_interval" is a dict because the timedelta object is not instantiated. I tried adding "cfg = instantiate(cfg)" in the "exp_manager" function (after line 407: "cfg = OmegaConf.merge(schema, cfg)") and it does work, but I don't know whether it impacts or breaks anything else. I can open a PR if there is a real problem.
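
For context, a self-contained sketch of what that instantiation step does (assuming Hydra's instantiate; NeMo's actual exp_manager code may differ):

from datetime import timedelta

from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {"train_time_interval": {"_target_": "datetime.timedelta", "seconds": 60}}
)
# instantiate() recursively resolves _target_ nodes, so the dict above
# becomes a real timedelta(seconds=60), which is what ModelCheckpoint expects.
cfg = instantiate(cfg)
assert cfg.train_time_interval == timedelta(seconds=60)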
