
Error using ckpt when resuming #37

Open
alexKup88 opened this issue Nov 29, 2022 · 1 comment

Comments

@alexKup88

Thanks for sharing. I am getting this error:

```
Traceback (most recent call last):
  File "train_styleswin.py", line 409, in <module>
    generator.load_state_dict(ckpt["g"])
  File "/mnt/anaconda3/envs/StyleSwin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
	Unexpected key(s) in state_dict: "layers.4.blocks.0.attn_mask2", "layers.4.blocks.0.norm1.style.weight", "layers.4.blocks.0.norm1.style.bias", "layers.4.blocks.0.qkv.weight", "layers.4.blocks.0.qkv.bias", "layers.4.blocks.0.proj.weight", "layers.4 (and so on)
```

This is the command I am running:

```
python -m torch.distributed.launch --nproc_per_node=2 train_styleswin.py --batch 4 --path /mnt/DATASETS/FFHQ --checkpoint_path /mnt/PROCESSEDdata/StyleSwin/Train --sample_path /mnt/PROCESSEDdata/StyleSwin/Train --size 32 --G_channel_multiplier 2 --bcr --D_lr 0.0002 --D_sn --ttur --eval_gt_path /mnt/DATASETS/FFHQ --lr_decay --lr_decay_start_steps 775000 --iter 1000000 --ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt --use_checkpoint
```

I tried with and without the `--use_checkpoint` flag, and also with the 256 checkpoint; both give the same error.

Best

@ForeverFancy
Contributor

Hi, `--use_checkpoint` enables PyTorch's gradient checkpointing technique to save GPU memory, while `--ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt` specifies the checkpoint to resume from. It seems that you are training a model with `--size 32` but loading a higher-resolution checkpoint (e.g. 512 or 1024), which causes this error: the higher-resolution generator has extra transformer stages, so its state_dict contains keys (such as `layers.4.*`) that the size-32 generator does not define. A checkpoint can only be resumed by a model of the same resolution; when training at any other resolution, you should train from scratch without the pre-trained checkpoint.
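To illustrate (this is not StyleSwin code, just a minimal sketch with made-up key names and an assumed one-stage-per-resolution-doubling layout): a generator built for a higher resolution has more `layers.N` stages, so comparing the two key sets reproduces the "Unexpected key(s)" situation.

```python
import math

def stage_keys(size, start=8):
    # Hypothetical layout: one transformer stage per resolution,
    # doubling from `start` up to `size` (8 -> 16 -> 32 -> ...).
    n_stages = int(math.log2(size // start)) + 1
    return {f"layers.{i}.blocks.0.qkv.weight" for i in range(n_stages)}

keys_32 = stage_keys(32)      # keys of a model built with --size 32
keys_1024 = stage_keys(1024)  # keys saved in a 1024-px checkpoint

# Keys present in the checkpoint but absent from the small model are
# exactly what load_state_dict reports as "Unexpected key(s)".
unexpected = keys_1024 - keys_32
print(sorted(unexpected))
```

Running this shows the higher-numbered `layers.N` keys that the size-32 model cannot accept, which is why the checkpoint only loads into a model of the matching resolution.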
