
Error message "size mismatch for relation_k_emb.weight" when trying to load a trained model using t5-small #21

Open
kanseaveg opened this issue Jun 28, 2023 · 0 comments


I am running RASAT on two consumer-grade graphics cards. The pre-trained model I am using is t5-small. Training completed successfully with the following command:

CUDA_VISIBLE_DEVICES="0,1" python3 -m torch.distributed.launch --nnodes=1 --nproc_per_node=2 seq2seq/run_seq2seq.py configs/spider/train_spider_rasat_small.json

tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
208210 ***** eval metrics *****
208211   epoch                   =    3071.95
208212   eval_exact_match        =     0.5348
208213   eval_exec               =     0.5387
208214   eval_loss               =     0.7128
208215   eval_runtime            = 0:02:24.19
208216   eval_samples            =       1034
208217   eval_samples_per_second =      7.171
208218 100% 65/65 [02:22<00:00,  2.20s/it]<__array_function__ internals>:5: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different l

However, when evaluating, I set the evaluation model path to "./experiment/train_spider_rasat_small", as specified in the training configuration file.

I then encountered an error when executing the evaluation command:
python3 seq2seq/eval_run_seq2seq.py configs/spider/eval_spider_rasat_4160.json
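As a side note, the relation-embedding shapes actually stored in a checkpoint can be inspected with `torch.load` before running evaluation. A self-contained sketch (the file name `pytorch_model.bin` under the experiment directory is my assumption; a stand-in checkpoint is written to a temp file here so the snippet runs anywhere):

```python
import os
import tempfile

import torch

# Sketch: check which relation-embedding shapes a saved checkpoint actually
# contains. The real weights would live under
# ./experiment/train_spider_rasat_small (file name pytorch_model.bin is an
# assumption); a stand-in checkpoint is used so the snippet is runnable.
ckpt = {
    "relation_k_emb.weight": torch.zeros(49, 64),  # relation types at training time
    "relation_v_emb.weight": torch.zeros(49, 64),
}
path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save(ckpt, path)

state = torch.load(path, map_location="cpu")
for name, tensor in state.items():
    if "relation" in name:
        print(name, tuple(tensor.shape))
```

Running this against the real checkpoint should report [49, 64] relation embeddings, matching the "from checkpoint" side of the error below.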

The error message is:

Dataset name: spider
Mode: dev
Databases has been preprocessed. Use cache.
Dataset has been preprocessed. Use cache.
Dataset: spider
Mode: dev
Match Questions...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1034/1034 [00:01<00:00, 606.60it/s]Question match errors: 0/1034
Match Table, Columns, DB Contents...
1034it [00:01, 614.75it/s]
DB match errors: 0/1034
Generate Relations...
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1034/1034 [00:10<00:00, 95.10it/s]Edge match errors: 0/2340638
06/28/2023 20:30:11 - WARNING - datasets.arrow_dataset -   Loading cached processed dataset at ./transformers_cache/spider/spider/1.0.0/a9000e8b37ea883ad113d628d95c9067385cc1105e2641a44bfa3090483dbb9b/cache-21e2b8bdcac7ddca.arrow
===================================================
Num of relations uesd in RASAT is :  45
===================================================
Use relation model.
./experiment/train_spider_rasat_small
Traceback (most recent call last):
  File "seq2seq/eval_run_seq2seq.py", line 320, in <module>
    main()
  File "seq2seq/eval_run_seq2seq.py", line 208, in main
    model = nn.DataParallel(model_cls_wrapper(T5ForConditionalGeneration).from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1453, in from_pretrained
    model, missing_keys, unexpected_keys, mismatched_keys, error_msgs = cls._load_state_dict_into_model(
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1607, in _load_state_dict_into_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
        size mismatch for relation_k_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
        size mismatch for relation_v_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
        size mismatch for encoder.relation_k_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
        size mismatch for encoder.relation_v_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).

wandb: Waiting for W&B process to finish, PID 310089... (failed 1). Press ctrl-c to abort syncing.

Could you please take a look and see where this error comes from? Thank you.
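For context, the failure can be reproduced in isolation: the checkpoint holds relation embeddings for 49 relation types, while the evaluation script builds the model for 46 (the log above reports 45 relations; my guess is 46 = 45 plus a padding slot). A minimal sketch using a hypothetical module that only mirrors the embedding names from the error message, not the actual RASAT model:

```python
import torch.nn as nn

# Hypothetical stand-in: only the relation-embedding names from the error
# message are mirrored here; this is not the real RASAT architecture.
class RelationModel(nn.Module):
    def __init__(self, num_relations: int, head_dim: int = 64):
        super().__init__()
        self.relation_k_emb = nn.Embedding(num_relations, head_dim)
        self.relation_v_emb = nn.Embedding(num_relations, head_dim)

trained = RelationModel(num_relations=49)  # shapes saved in the checkpoint
current = RelationModel(num_relations=46)  # shapes built at evaluation time

try:
    current.load_state_dict(trained.state_dict())  # strict=True by default
    error = None
except RuntimeError as e:
    error = str(e)

# The same "size mismatch for relation_k_emb.weight" failure as the traceback
print(error.splitlines()[1].strip() if error else "loaded cleanly")
```

If the shapes line up this way, the relation configuration in the evaluation config differs from the one used during training, rather than the checkpoint being corrupted.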
