RuntimeError: Error(s) in loading state_dict for GPT2Model: #76

Open
Afeihan opened this issue Aug 23, 2022 · 1 comment
Afeihan commented Aug 23, 2022

On CentOS, with dependencies such as apex and deepspeed installed.
The working directory is the project root.
The pretrained checkpoint is stored (relative to that root) at: 80000/80000/mp_rank_00_model_states.pt

Command:

```
python generate_samples.py --model-parallel-size 2 --num-layers 32 --hidden-size 2560 --load ./80000 --num-attention-heads 32 --seq-length 1024 --max-position-embeddings 1024 --fp16 --cache-dir cache --out-seq-length 512 --temperature 0.9 --top_k 0 --top_p 0 --tokenizer-path bpe_3w_new/ --vocab-size 30000 --input-text example.txt
```
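A quick sanity check (a sketch with a hypothetical helper name, assuming Megatron-style tensor partitioning; not code from CPM-Generate) shows that every checkpoint shape in the error log is exactly the full shape divided by a model-parallel size of 2, even though the log reports a model-parallel size of 1:

```python
# Sketch: expected per-rank weight shapes under Megatron-style tensor
# model parallelism. Helper name is hypothetical, not from CPM-Generate.

def per_rank_shapes(hidden_size: int, vocab_size: int, mp_size: int) -> dict:
    """Shapes of the partitioned GPT-2 weights on one model-parallel rank."""
    return {
        # vocabulary is split across ranks
        "word_embeddings.weight": (vocab_size // mp_size, hidden_size),
        # fused Q/K/V projection is column-parallel: output dim is split
        "attention.query_key_value.weight": (3 * hidden_size // mp_size, hidden_size),
        # attention output projection is row-parallel: input dim is split
        "attention.dense.weight": (hidden_size, hidden_size // mp_size),
        # MLP up-projection (h -> 4h) is column-parallel
        "mlp.dense_h_to_4h.weight": (4 * hidden_size // mp_size, hidden_size),
        # MLP down-projection (4h -> h) is row-parallel
        "mlp.dense_4h_to_h.weight": (hidden_size, 4 * hidden_size // mp_size),
    }

# The checkpoint shapes in the error log match mp_size=2 ...
ckpt = per_rank_shapes(hidden_size=2560, vocab_size=30000, mp_size=2)
assert ckpt["word_embeddings.weight"] == (15000, 2560)
assert ckpt["attention.query_key_value.weight"] == (3840, 2560)
assert ckpt["attention.dense.weight"] == (2560, 1280)

# ... while the freshly built model used mp_size=1 (full shapes):
model = per_rank_shapes(hidden_size=2560, vocab_size=30000, mp_size=1)
assert model["attention.query_key_value.weight"] == (7680, 2560)
```

So the checkpoint is one of two model-parallel shards, but the script initialized model parallelism with size 1 (note `using world size: 1 and model-parallel size: 1` in the log): only one process was launched instead of two distributed ranks.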
The error output is as follows:
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
/home/troila/anaconda3/envs/test/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
building CPM model ...
number of parameters on model parallel rank 0: 2597073920
global rank 0 is loading checkpoint ./80000/80000/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "generate_samples.py", line 384, in <module>
main()
File "generate_samples.py", line 374, in main
model = setup_model(args)
File "generate_samples.py", line 345, in setup_model
args.iteration = load_checkpoint_model(model, args)
File "/home/hanlifei/CPM-Generate/utils.py", line 290, in load_checkpoint_model
model.load_state_dict(sd['module'])
File "/home/hanlifei/CPM-Generate/model/distributed.py", line 90, in load_state_dict
self.module.load_state_dict(state_dict, strict=strict)
File "/home/hanlifei/CPM-Generate/fp16/fp16.py", line 71, in load_state_dict
self.module.load_state_dict(state_dict, strict=strict)
File "/home/troila/anaconda3/envs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([15000, 2560]) from checkpoint, the shape in current model is torch.Size([30000, 2560]).
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.0.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.1.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.1.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.1.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.1.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
[… the same six size mismatches repeat for transformer.layers.2 through transformer.layers.27, where the pasted log is cut off. For every layer (checkpoint shape vs. current model shape): attention.query_key_value.weight [3840, 2560] vs. [7680, 2560]; attention.query_key_value.bias [3840] vs. [7680]; attention.dense.weight [2560, 1280] vs. [2560, 2560]; mlp.dense_h_to_4h.weight [5120, 2560] vs. [10240, 2560]; mlp.dense_h_to_4h.bias [5120] vs. [10240]; mlp.dense_4h_to_h.weight [2560, 5120] vs. [2560, 10240].]
size mismatch for transformer.layers.27.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.27.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.27.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.27.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.28.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.28.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.28.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.28.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.28.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.28.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.29.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.29.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.29.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.29.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.29.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.29.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.30.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.30.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.30.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.30.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.30.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.30.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.31.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.31.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.31.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.31.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.31.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.31.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
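Note that every mismatched tensor in the log above is exactly half the expected size along one dimension: column-parallel layers (`query_key_value`, `dense_h_to_4h`) are halved along dim 0, row-parallel layers (`attention.dense`, `dense_4h_to_h`) along dim 1. This pattern is consistent with a 2-way model-parallel checkpoint shard being loaded into a model built for a single partition — and indeed the startup log prints `using world size: 1 and model-parallel size: 1` even though the command passed `--model-parallel-size 2`. A minimal sketch of the shape arithmetic (the helper `merged_shape` is hypothetical, for illustration only):

```python
def merged_shape(shard_shape, partition_dim, mp_size=2):
    """Shape obtained by concatenating mp_size equal shards along partition_dim."""
    full = list(shard_shape)
    full[partition_dim] *= mp_size
    return tuple(full)

# query_key_value.weight: column-parallel, sharded along dim 0
assert merged_shape((3840, 2560), 0) == (7680, 2560)   # matches the expected model shape
# attention.dense.weight: row-parallel, sharded along dim 1
assert merged_shape((2560, 1280), 1) == (2560, 2560)
# mlp.dense_h_to_4h.bias: column-parallel, sharded along dim 0
assert merged_shape((5120,), 0) == (10240,)
```

In other words, `mp_rank_00_model_states.pt` holds only one of two partitions; loading it alone cannot produce the full-size tensors. The usual remedies are to launch with two model-parallel processes (so each rank loads its own `mp_rank_0X` shard) or to merge the shards into a single checkpoint before loading.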

@judynlp

judynlp commented Mar 13, 2023

Has this problem been solved? I'm running into the same issue.
