'str' object has no attribute 'total_seconds' when trying to use 'train_time_interval' in 'checkpoint_callback_params' #9863

Closed
AudranBert opened this issue Jul 24, 2024 · 5 comments
Labels: bug, stale

AudranBert commented Jul 24, 2024

Describe the bug

Currently, it is not possible to use the 'train_time_interval' param from PyTorch Lightning for checkpointing. Specifying it throws the following error:

  File "/home/abert/.local/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 303, in on_train_batch_end
    skip_time = prev_time_check is None or (now - prev_time_check) < train_time_interval.total_seconds()
AttributeError: 'str' object has no attribute 'total_seconds'

And that's because PyTorch Lightning's 'train_time_interval' takes a timedelta object. Since we can't instantiate an object in the YAML, and we can't set it later either (it throws an error saying it is not a primitive type, see below), we can't use this feature.

Traceback (most recent call last):
  File "/mnt/c/Users/berta/Documents/Linagora/NeMo/examples/asr/speech_to_text_finetune.py", line 192, in main
    cfg.exp_manager.checkpoint_callback_params.train_time_interval = timedelta(seconds=30)
omegaconf.errors.UnsupportedValueType: Value 'timedelta' is not a supported primitive type
    full_key: exp_manager.checkpoint_callback_params.train_time_interval
    object_type=dict
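
A minimal standalone reproduction of that OmegaConf limitation, independent of NeMo:

from datetime import timedelta

from omegaconf import OmegaConf

cfg = OmegaConf.create({"train_time_interval": None})
# OmegaConf only accepts primitive values (int, float, str, bool, None, ...),
# so assigning an arbitrary object such as a timedelta fails:
cfg.train_time_interval = timedelta(seconds=30)  # raises omegaconf.errors.UnsupportedValueType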

I made a quick and dirty fix on my side by building a timedelta object just before passing it to PyTorch Lightning.
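
Roughly, the fix looks like this (a sketch; the helper name and call site are illustrative, not NeMo's actual code):

from datetime import timedelta

def as_train_time_interval(value):
    # Coerce a YAML-friendly number of seconds into the timedelta that
    # PyTorch Lightning's ModelCheckpoint expects; pass other values through.
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    return value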

Steps/Code to reproduce bug

Running NeMo/examples/asr/speech_to_text_finetune.py with the following changes to the YAML config file:

exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    # in case of multiple validation sets, first one is used
    monitor: "train_loss"
    mode: "min"
    save_top_k: 5
    always_save_nemo: True # saves the checkpoints as nemo files along with PTL checkpoints
    train_time_interval: 60
    every_n_epochs: null

Expected behavior

We should be able to use this parameter, for example by specifying a number of seconds.

AudranBert added the bug label on Jul 24, 2024

github-actions bot commented Aug 31, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Aug 31, 2024

github-actions bot commented Sep 8, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this as not planned on Sep 8, 2024

nithinraok (Collaborator) commented

exp_manager:
  checkpoint_callback_params:
    train_time_interval:
      _target_: datetime.timedelta
      seconds: 60

You could pass an object through the config as shown above. As Hydra doesn't support objects natively, the current workaround is to use Any in the type annotation. Added support for it here: #10559
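
For illustration, a sketch of the kind of schema change this relies on (class and field names are assumed from the discussion, not necessarily NeMo's exact code):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class CallbackParams:
    # Typed as Any so the YAML value can be a plain number or a
    # {_target_: datetime.timedelta, seconds: ...} dict without failing
    # OmegaConf's primitive-type validation.
    train_time_interval: Optional[Any] = None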

AudranBert (Author) commented


Hi,
Thanks a lot for the response, I wasn't aware I could do that!

AudranBert commented Oct 7, 2024

Hi,
Sorry for the very late reply. I tried your solution (see below) today and it doesn't work without changes (or maybe I'm doing something wrong?).

exp_manager:
  checkpoint_callback_params:
    train_time_interval:
      _target_: datetime.timedelta
      seconds: 60

The value of "train_time_interval" is a dict because the timedelta object is not instantiated. I tried adding "cfg = instantiate(cfg)" in the "exp_manager" function (after line 407: "cfg = OmegaConf.merge(schema, cfg)") and it does work, but I don't know whether it impacts or breaks anything else. I can open a PR if there is a real problem.
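
For context, a self-contained sketch of what that instantiation step does (assuming Hydra's instantiate; NeMo's actual exp_manager code may differ):

from datetime import timedelta

from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {"train_time_interval": {"_target_": "datetime.timedelta", "seconds": 60}}
)
# instantiate() recursively resolves _target_ nodes, so the dict above
# becomes a real timedelta(seconds=60), which is what ModelCheckpoint expects.
cfg = instantiate(cfg)
assert cfg.train_time_interval == timedelta(seconds=60)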
