
Translate Error #65

Closed
qtxue opened this issue Apr 25, 2019 · 9 comments

qtxue commented Apr 25, 2019

Hello, thanks for your code.
I ran train.sh as described in the README; the command is:

CUDA_VISIBLE_DEVICES=0 nohup > nohup_3.log 2>&1 python3 train.py \
--exp_name test_enfr_mlm \
--dump_path ./dumped2/ \
--data_path ./data/processed/en-fr/ \
--lgs 'en-fr' \
--clm_steps '' \
--mlm_steps 'en,fr' \
--emb_dim 512 \
--n_layers 4 \
--n_heads 8 \
--dropout 0.1 \
--attention_dropout 0.1 \
--gelu_activation true \
--batch_size 32 \
--bptt 256 \
--optimizer adam,lr=0.0001 \
--epoch_size 200000 \
--validation_metrics _valid_mlm_ppl \
--stopping_criterion _valid_mlm_ppl,3 &

Then I wanted to use the saved model (training had not finished yet; the model is saved while the program is running) to translate some sentences. The command is:

head -n 10 /home/qtxue/dqxu/data/para/dev/newstest2014-fren-src.fr.60000 | \
CUDA_VISIBLE_DEVICES=4 python3 translate.py --exp_name translate \
--src_lang fr --tgt_lang en \
--model_path /home/qtxue/best-valid_mlm_ppl.pth --output_path /home/qtxue/output.en

The following error appears:
INFO - 04/25/19 10:48:43 - 0:00:00 - ============ Initialized logger ============
INFO - 04/25/19 10:48:43 - 0:00:00 - batch_size: 32
command: python translate.py --exp_name translate --src_lang fr --tgt_lang en --model_path '/home/qtxue/checkpoint.pth' --output_path '/home/qtxue/output.en' --exp_id "19njy282kc"
dump_path: ./dumped/translate/19njy282kc
exp_id: 19njy282kc
exp_name: translate
fp16: False
model_path: /home/qtxue/checkpoint.pth
output_path: /home/qtxue/output.en
src_lang: fr
tgt_lang: en
INFO - 04/25/19 10:48:43 - 0:00:00 - The experiment will be stored in ./dumped/translate/19njy282kc

INFO - 04/25/19 10:48:43 - 0:00:00 - Running command: python translate.py --exp_name translate --src_lang fr --tgt_lang en --model_path '/home/qtxue/checkpoint.pth' --output_path '/home/qtxue/output.en'

INFO - 04/25/19 10:48:48 - 0:00:05 - Supported languages: en, fr
Traceback (most recent call last):
File "translate.py", line 150, in
main(params)
File "translate.py", line 80, in main
encoder.load_state_dict(reloaded['encoder'])
KeyError: 'encoder'
Is there something I need to modify, or is something wrong with my setup?


qtxue commented Apr 25, 2019

sorry, I made a naive mistake.


glample commented Apr 25, 2019

What was the mistake? Did you figure it out?


odel-odel commented May 6, 2019

Hi,
I get the same error. Can you help me solve this?
In the command I loaded "best-valid_en-fr_mt_bleu.pth" as the model file.

Traceback (most recent call last):
File "translate.py", line 150, in
main(params)
File "translate.py", line 80, in main
encoder.load_state_dict(reloaded['encoder'])
File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict: "pred_layer.proj.weight", "pred_layer.proj.bias".


glample commented May 6, 2019

This means that you are trying to reload a component that requires an output layer, while the reloaded model does not have one. Can you try to set with_output=False for the encoder here: https://github.com/facebookresearch/XLM/blob/master/src/model/__init__.py#L127 and see if this helps?
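Roughly, the change would look like this (a sketch only; the exact code around that line in build_model may differ, and the variable names here are approximate):

# inside build_model() in src/model/__init__.py, encoder-decoder case:
# build the encoder WITHOUT the output projection layer, since the
# reloaded encoder weights were saved without one; the decoder keeps
# its output layer because it is needed to generate tokens.
encoder = TransformerModel(params, dico, is_encoder=True, with_output=False)
decoder = TransformerModel(params, dico, is_encoder=False, with_output=True)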


odel-odel commented May 7, 2019

Hi,
I have already set this parameter to False in the __init__ file...
In translate.py there is another command that loads the model.

I will paste the full error:

Traceback (most recent call last):
File "translate.py", line 153, in
main(params)
File "translate.py", line 80, in main
encoder.load_state_dict(reloaded['encoder'])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict: "position_embeddings.weight", "lang_embeddings.weight", "embeddings.weight", "layer_norm_emb.bias", "layer_norm_emb.weight", "attentions.0.q_lin.bias", "attentions.0.q_lin.weight", "attentions.0.k_lin.bias", "attentions.0.k_lin.weight", "attentions.0.v_lin.bias", "attentions.0.v_lin.weight", "attentions.0.out_lin.bias", "attentions.0.out_lin.weight", "attentions.1.q_lin.bias", "attentions.1.q_lin.weight", "attentions.1.k_lin.bias", "attentions.1.k_lin.weight", "attentions.1.v_lin.bias", "attentions.1.v_lin.weight", "attentions.1.out_lin.bias", "attentions.1.out_lin.weight", "attentions.2.q_lin.bias", "attentions.2.q_lin.weight", "attentions.2.k_lin.bias", "attentions.2.k_lin.weight", "attentions.2.v_lin.bias", "attentions.2.v_lin.weight", "attentions.2.out_lin.bias", "attentions.2.out_lin.weight", "attentions.3.q_lin.bias", "attentions.3.q_lin.weight", "attentions.3.k_lin.bias", "attentions.3.k_lin.weight", "attentions.3.v_lin.bias", "attentions.3.v_lin.weight", "attentions.3.out_lin.bias", "attentions.3.out_lin.weight", "attentions.4.q_lin.bias", "attentions.4.q_lin.weight", "attentions.4.k_lin.bias", "attentions.4.k_lin.weight", "attentions.4.v_lin.bias", "attentions.4.v_lin.weight", "attentions.4.out_lin.bias", "attentions.4.out_lin.weight", "attentions.5.q_lin.bias", "attentions.5.q_lin.weight", "attentions.5.k_lin.bias", "attentions.5.k_lin.weight", "attentions.5.v_lin.bias", "attentions.5.v_lin.weight", "attentions.5.out_lin.bias", "attentions.5.out_lin.weight", "layer_norm1.0.bias", "layer_norm1.0.weight", "layer_norm1.1.bias", "layer_norm1.1.weight", "layer_norm1.2.bias", "layer_norm1.2.weight", "layer_norm1.3.bias", "layer_norm1.3.weight", "layer_norm1.4.bias", "layer_norm1.4.weight", "layer_norm1.5.bias", "layer_norm1.5.weight", "ffns.0.lin1.bias", "ffns.0.lin1.weight", "ffns.0.lin2.bias", "ffns.0.lin2.weight", "ffns.1.lin1.bias", "ffns.1.lin1.weight", "ffns.1.lin2.bias", "ffns.1.lin2.weight", "ffns.2.lin1.bias", "ffns.2.lin1.weight", "ffns.2.lin2.bias", "ffns.2.lin2.weight", "ffns.3.lin1.bias", "ffns.3.lin1.weight", "ffns.3.lin2.bias", "ffns.3.lin2.weight", "ffns.4.lin1.bias", "ffns.4.lin1.weight", "ffns.4.lin2.bias", "ffns.4.lin2.weight", "ffns.5.lin1.bias", "ffns.5.lin1.weight", "ffns.5.lin2.bias", "ffns.5.lin2.weight", "layer_norm2.0.bias", "layer_norm2.0.weight", "layer_norm2.1.bias", "layer_norm2.1.weight", "layer_norm2.2.bias", "layer_norm2.2.weight", "layer_norm2.3.bias", "layer_norm2.3.weight", "layer_norm2.4.bias", "layer_norm2.4.weight", "layer_norm2.5.bias", "layer_norm2.5.weight", "pred_layer.proj.bias", "pred_layer.proj.weight".
Unexpected key(s) in state_dict: "module.position_embeddings.weight", "module.lang_embeddings.weight", "module.embeddings.weight", "module.layer_norm_emb.weight", "module.layer_norm_emb.bias", "module.attentions.0.q_lin.weight", "module.attentions.0.q_lin.bias", "module.attentions.0.k_lin.weight", "module.attentions.0.k_lin.bias", "module.attentions.0.v_lin.weight", "module.attentions.0.v_lin.bias", "module.attentions.0.out_lin.weight", "module.attentions.0.out_lin.bias", "module.attentions.1.q_lin.weight", "module.attentions.1.q_lin.bias", "module.attentions.1.k_lin.weight", "module.attentions.1.k_lin.bias", "module.attentions.1.v_lin.weight", "module.attentions.1.v_lin.bias", "module.attentions.1.out_lin.weight", "module.attentions.1.out_lin.bias", "module.attentions.2.q_lin.weight", "module.attentions.2.q_lin.bias", "module.attentions.2.k_lin.weight", "module.attentions.2.k_lin.bias", "module.attentions.2.v_lin.weight", "module.attentions.2.v_lin.bias", "module.attentions.2.out_lin.weight", "module.attentions.2.out_lin.bias", "module.attentions.3.q_lin.weight", "module.attentions.3.q_lin.bias", "module.attentions.3.k_lin.weight", "module.attentions.3.k_lin.bias", "module.attentions.3.v_lin.weight", "module.attentions.3.v_lin.bias", "module.attentions.3.out_lin.weight", "module.attentions.3.out_lin.bias", "module.attentions.4.q_lin.weight", "module.attentions.4.q_lin.bias", "module.attentions.4.k_lin.weight", "module.attentions.4.k_lin.bias", "module.attentions.4.v_lin.weight", "module.attentions.4.v_lin.bias", "module.attentions.4.out_lin.weight", "module.attentions.4.out_lin.bias", "module.attentions.5.q_lin.weight", "module.attentions.5.q_lin.bias", "module.attentions.5.k_lin.weight", "module.attentions.5.k_lin.bias", "module.attentions.5.v_lin.weight", "module.attentions.5.v_lin.bias", "module.attentions.5.out_lin.weight", "module.attentions.5.out_lin.bias", "module.layer_norm1.0.weight", "module.layer_norm1.0.bias", "module.layer_norm1.1.weight", "module.layer_norm1.1.bias", "module.layer_norm1.2.weight", "module.layer_norm1.2.bias", "module.layer_norm1.3.weight", "module.layer_norm1.3.bias", "module.layer_norm1.4.weight", "module.layer_norm1.4.bias", "module.layer_norm1.5.weight", "module.layer_norm1.5.bias", "module.ffns.0.lin1.weight", "module.ffns.0.lin1.bias", "module.ffns.0.lin2.weight", "module.ffns.0.lin2.bias", "module.ffns.1.lin1.weight", "module.ffns.1.lin1.bias", "module.ffns.1.lin2.weight", "module.ffns.1.lin2.bias", "module.ffns.2.lin1.weight", "module.ffns.2.lin1.bias", "module.ffns.2.lin2.weight", "module.ffns.2.lin2.bias", "module.ffns.3.lin1.weight", "module.ffns.3.lin1.bias", "module.ffns.3.lin2.weight", "module.ffns.3.lin2.bias", "module.ffns.4.lin1.weight", "module.ffns.4.lin1.bias", "module.ffns.4.lin2.weight", "module.ffns.4.lin2.bias", "module.ffns.5.lin1.weight", "module.ffns.5.lin1.bias", "module.ffns.5.lin2.weight", "module.ffns.5.lin2.bias", "module.layer_norm2.0.weight", "module.layer_norm2.0.bias", "module.layer_norm2.1.weight", "module.layer_norm2.1.bias", "module.layer_norm2.2.weight", "module.layer_norm2.2.bias", "module.layer_norm2.3.weight", "module.layer_norm2.3.bias", "module.layer_norm2.4.weight", "module.layer_norm2.4.bias", "module.layer_norm2.5.weight", "module.layer_norm2.5.bias".


odel-odel commented May 7, 2019

I tried changing line 80 in translate.py to:

if all([k.startswith('module.') for k in reloaded['encoder'].keys()]):
    enc_reload = {k[len('module.'):]: v for k, v in reloaded['encoder'].items()}
    encoder.load_state_dict(enc_reload)
else:
    encoder.load_state_dict(reloaded['encoder'])

And now I get a shorter error again:
Traceback (most recent call last):
File "translate.py", line 156, in
main(params)
File "translate.py", line 85, in main
encoder.load_state_dict(enc_reload)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict: "pred_layer.proj.bias", "pred_layer.proj.weight".


glample commented May 7, 2019

Indeed, I think the problem is that you are trying to reload a checkpoint (the "module." prefixes are everywhere). This part of the code takes care of it: https://github.com/facebookresearch/XLM/blob/master/src/model/__init__.py#L155-L156 but it is separate from the translate script. What you did is something similar and should take care of it.

Did you also set with_output=False here? https://github.com/facebookresearch/XLM/blob/master/translate.py#L78
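For reference, after both adjustments the model-loading part of translate.py would look roughly like this (a sketch, not the exact script; strip_module_prefix is just a hypothetical helper name, and the surrounding code may differ slightly):

# build the encoder without an output layer, the decoder with one
encoder = TransformerModel(model_params, dico, is_encoder=True, with_output=False).cuda().eval()
decoder = TransformerModel(model_params, dico, is_encoder=False, with_output=True).cuda().eval()

# checkpoints saved during multi-GPU training wrap every parameter name
# in a DataParallel "module." prefix, so strip it before load_state_dict
def strip_module_prefix(state_dict):
    if all(k.startswith('module.') for k in state_dict.keys()):
        return {k[len('module.'):]: v for k, v in state_dict.items()}
    return state_dict

encoder.load_state_dict(strip_module_prefix(reloaded['encoder']))
decoder.load_state_dict(strip_module_prefix(reloaded['decoder']))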

@odel-odel

Now it works, thanks!!!

glample closed this as completed May 8, 2019

tuyu95 commented Jul 25, 2019

> sorry, I made a naive mistake.

I am running into the same problem as you: I use the pretrained model and then get the same error. Would you tell me your solution?
