checkpointing not saving model weights if calling `accelerator.prepare_model` instead of `accelerator.prepare` #555

csarron · 2022-07-22T16:42:33Z

System Info

- `Accelerate` version: 0.11.0
- Platform: Linux-4.15.0-161-generic-x86_64-with-glibc2.10
- Python version: 3.8.12
- Numpy version: 1.23.1
- PyTorch version (GPU?): 1.12.0+cu113 (True)
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: no
        - use_cpu: False
        - num_processes: 4
        - machine_rank: 0
        - num_machines: 1
        - main_process_ip: None
        - main_process_port: None
        - main_training_function: main
        - deepspeed_config: {}
        - fsdp_config: {}

Information

The official example scripts
My own modified scripts

Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction

If we prepare the model separately (i.e. `model = accelerator.prepare_model(model)`) instead of preparing everything at once (`model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(model, optimizer, train_dataloader, eval_dataloader, lr_scheduler)`), the accelerator does not save the model weights when `accelerator.save_state` is called.

Possible cause: inside the `prepare(self, *args)` method, the accelerator uses `_prepare_one` to append the model to `self._models`, but `prepare_model(self, model)` does not append the model (see here). However, `accelerator.save_state` depends on `self._models` to save the model weights, so a model prepared only through `prepare_model` is never written to the checkpoint.
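To make the suspected mechanism concrete, here is a minimal pure-Python sketch of the bookkeeping described above. It is not Accelerate's source code: `ToyAccelerator` and `PatchedToyAccelerator` are hypothetical stand-ins, a "model" is just a dict of weights, and the patched variant shows only one possible way to fix the registration, which may differ from the actual fix.

```python
# Toy illustration only. These classes are hypothetical and merely mimic the
# registration behavior described in this issue; they are not Accelerate code.


class ToyAccelerator:
    def __init__(self):
        # Objects that save_state() will include in a checkpoint.
        self._models = []

    def prepare_model(self, model):
        # Reported behavior: the model is returned but never registered.
        return model

    def _prepare_one(self, obj):
        # Registration happens here, so only prepare() triggers it.
        self._models.append(obj)
        return self.prepare_model(obj)

    def prepare(self, *args):
        return tuple(self._prepare_one(obj) for obj in args)

    def save_state(self):
        # Stand-in for writing a checkpoint: returns copies of registered weights.
        return [dict(m) for m in self._models]


class PatchedToyAccelerator(ToyAccelerator):
    def prepare_model(self, model):
        # One possible fix (an assumption): register inside prepare_model itself.
        self._models.append(model)
        return model

    def _prepare_one(self, obj):
        # Delegate registration to prepare_model to avoid registering twice.
        return self.prepare_model(obj)


weights = {"w": 1.0}

buggy = ToyAccelerator()
buggy.prepare_model(weights)
saved_buggy = buggy.save_state()  # nothing was registered, so nothing is saved

patched = PatchedToyAccelerator()
patched.prepare_model(weights)
saved_patched = patched.save_state()  # the model is registered and saved

patched_all = PatchedToyAccelerator()
patched_all.prepare(weights)
saved_patched_all = patched_all.save_state()  # still registered exactly once
```

In the toy version, calling `prepare_model` alone leaves `_models` empty, which matches the symptom reported here: the checkpoint is written without any model weights.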

Expected behavior

`accelerator.save_state` should also save the model weights when the model is prepared separately with `prepare_model`. This is possibly a bug in `accelerator.prepare_model(model)`.


* fix: saving model weights (checkpointing not saving model weights if calling `accelerator.prepare_model` instead of `accelerator.prepare`; resolves issue #555)
* fix: saving model weights for optimizer and scheduler

csarron added the bug label Jul 22, 2022

csarron changed the title ~~checkpointing not saving model states~~ checkpointing not saving model weights if calling accelerator.prepare_model instead of accelerator.prepare Jul 22, 2022

csarron mentioned this issue Jul 22, 2022

fix: saving model weights #556

Merged

csarron closed this as completed Jul 26, 2022
