This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

AssertionError in training unsupervised MT #201

Open
lihongzheng-nlp opened this issue Sep 16, 2019 · 7 comments

@lihongzheng-nlp

Following the instructions, I trained unsupervised MT as follows:
python train.py --exp_name unsupMT_zh-en --dump_path ./dumped/ --reload_model 'best-valid_mlm_ppl.pth,best-valid_mlm_ppl.pth' --data_path path/to/data --lgs 'zh-en' --ae_steps 'zh,en' --word_dropout 0.1 --word_blank 0.1 --word_shuffle 3 --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 512 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --batch_size 32 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 --epoch_size 200000 --stopping_criterion 'valid_zh-en_mt_bleu,10' --validation_metrics 'valid_zh-en_mt_bleu'

When I add the parameter --eval_bleu true, the following AssertionError is raised:
Traceback (most recent call last):
  File "train.py", line 323, in <module>
    check_data_params(params)
  File "XLM/src/data/loader.py", line 320, in check_data_params
    assert params.eval_bleu is False or len(params.mt_steps + params.bt_steps) > 0
AssertionError
But without --eval_bleu true it ran successfully until the end of epoch 0, when it gave another AssertionError:
WARNING - 09/16/19 15:22:29 - 0:11:28 - Metric "valid_zh-en_mt_bleu" not found in scores!
Traceback (most recent call last):
  File "train.py", line 327, in <module>
    main(params)
  File "train.py", line 306, in main
    trainer.end_epoch(scores)
  File "XLM/src/trainer.py", line 598, in end_epoch
    assert metric in scores, metric
AssertionError: valid_zh-en_mt_bleu

So what's the problem? How can I run successfully with the parameter --eval_bleu true, without hitting the AssertionError or the warning Metric "valid_zh-en_mt_bleu" not found in scores? Thank you very much!

@glample
Contributor

glample commented Sep 16, 2019

The problem is that you are not training your model to do MT here, only auto-encoding with --ae_steps 'zh,en'. You need to add --mt_steps 'zh-en,zh-en' to do MT if you have parallel datasets, and --bt_steps 'zh-en-zh,en-zh-en' for the back-translation.

Please check https://github.com/facebookresearch/XLM#train-on-unsupervised-mt-from-a-pretrained-model

AssertionError: valid_zh-en_mt_bleu means that you told the script to save the best model based on the valid_zh-en_mt_bleu metric: --validation_metrics 'valid_zh-en_mt_bleu'. But if you don't evaluate BLEU (--eval_bleu false), this metric will not exist at the end of each epoch, so the trainer cannot find it and raises an error.
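The two assertions above boil down to a simple consistency rule between the flags. A minimal sketch (not the actual XLM code; `check_params` and its argument names are hypothetical) of the rule:

```python
# Sketch of the two consistency checks described above, assuming a
# simplified view of XLM's argparse parameters.

def check_params(eval_bleu, mt_steps, bt_steps, validation_metrics, scores):
    # 1) --eval_bleu true only makes sense if some MT step is trained,
    #    i.e. mt_steps or bt_steps is non-empty:
    assert eval_bleu is False or len(mt_steps + bt_steps) > 0
    # 2) every validation metric must actually be produced at evaluation
    #    time, otherwise end_epoch() cannot find it:
    for metric in validation_metrics:
        assert metric in scores, metric

# AE-only run: no mt/bt steps, so eval_bleu must stay False.
check_params(False, [], [], [], {})

# With back-translation, BLEU is evaluated and the metric exists:
check_params(True, [], [("zh", "en", "zh")],
             ["valid_zh-en_mt_bleu"],
             {"valid_zh-en_mt_bleu": 12.3})
```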

@lihongzheng-nlp
Author

lihongzheng-nlp commented Sep 17, 2019

@glample Thanks for your help! After adding both --mt_steps 'zh-en,zh-en' and --bt_steps 'zh-en-zh,en-zh-en', it raised the following error:
  File "XLM/src/data/loader.py", line 267, in check_data_params
    assert len(params.mt_steps) == len(set(params.mt_steps))
AssertionError
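This assertion is a duplicate-step check: each direction may appear at most once in --mt_steps, and 'zh-en,zh-en' lists the same pair twice. A minimal sketch of the rule (the helper `has_duplicates` is a hypothetical name, not XLM's API):

```python
# Each training step may appear only once; a list with repeats has a
# smaller set of unique elements than its length.

def has_duplicates(steps):
    return len(steps) != len(set(steps))

# '--mt_steps zh-en,zh-en' parses to the same pair twice -> rejected:
assert has_duplicates([("zh", "en"), ("zh", "en")])

# '--mt_steps zh-en,en-zh' trains both directions without repetition:
assert not has_duplicates([("zh", "en"), ("en", "zh")])
```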

When replacing them with --mt_steps 'zh-en' --bt_steps 'zh-en-zh', or with just --mt_steps 'zh-en', it raised the following error:
INFO - 09/17/19 09:52:56 - 0:00:02 - ============ Parallel data (en-zh)
Traceback (most recent call last):
  File "train.py", line 327, in <module>
    main(params)
  File "train.py", line 230, in main
    data = load_data(params)
  File "/XLM/src/data/loader.py", line 343, in load_data
    load_para_data(params, data)
  File "/XLM/src/data/loader.py", line 192, in load_para_data
    src_data = load_binarized(src_path, params)
  File "/XLM/src/data/loader.py", line 66, in load_binarized
    assert os.path.isfile(path), path
AssertionError: data/zh-en-0814/fast_bpe/train.en-zh.en.pth

But I don't have the file train.en-zh.en.pth under the data directory. I processed the data following the official instructions; the processed data includes the following files:
├── codes
├── test.en.pth -> test.zh-en.en.pth
├── test.zh-en.en
├── test.zh-en.en.pth
├── test.zh-en.zh
├── test.zh-en.zh.pth
├── test.zh.pth -> test.zh-en.zh.pth
├── train.en
├── train.en.pth
├── train.zh
├── train.zh.pth
├── valid.en.pth -> val.zh-en.en.pth
├── valid.zh-en.en
├── valid.zh-en.zh
├── valid.zh.pth -> val.zh-en.zh.pth
├── val.zh-en.en.pth
├── val.zh-en.zh.pth
├── vocab.en
├── vocab.zh
└── vocab.zh-en

I was wondering whether the parameter --bt_steps is always necessary during training. Could you give me some more detailed guidance on the above errors so I can train the model successfully? I'm just a little confused by so many parameters. Thank you!
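For context on the missing train.en-zh.en.pth: XLM's loader sorts the language pair alphabetically when building parallel-data filenames, which is why it asks for en-zh even though --lgs is 'zh-en'. A hedged sketch of that naming convention (the `para_path` helper is a hypothetical illustration, but its output matches the path in the error above):

```python
import os

# Hypothetical sketch of how parallel-data filenames are derived:
# sort the language pair alphabetically, then use the pattern
# <split>.<l1>-<l2>.<lang>.pth.

def para_path(data_path, split, src, tgt, lang):
    l1, l2 = sorted([src, tgt])  # ('zh', 'en') -> ('en', 'zh')
    return os.path.join(data_path, f"{split}.{l1}-{l2}.{lang}.pth")

# The mt_step 'zh-en' therefore looks for:
print(para_path("data/zh-en-0814/fast_bpe", "train", "zh", "en", "en"))
# -> data/zh-en-0814/fast_bpe/train.en-zh.en.pth
```

So the monolingual files (train.en.pth, train.zh.pth) in the listing above are enough for ae_steps and bt_steps, but mt_steps additionally needs binarized parallel files named with the sorted pair.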

@aconneau

aconneau commented Sep 20, 2019

--mt_steps 'zh-en,zh-en' -> replace with --mt_steps 'zh-en,en-zh' to train both directions. Back-translation (bt_steps) is an essential part of the UnsupMT algorithm, so yes, it is necessary. Please refer to the related papers for its impact on the quality of UnsupMT.

When you say you downloaded the data following the official instructions, I think you're confusing the XNLI preprocessing with the preprocessing for UnsupervisedMT; they are in two different sections of the README. We didn't provide en-zh preprocessing scripts for UnsupervisedMT. If you're doing it yourself, please adapt the files in your path to match the files you would get from, for example, "./get-data-nmt.sh --src en --tgt ro". If you don't have parallel datasets and are doing UnsupMT, remove the "mt_steps".

@lihongzheng-nlp
Author

@aconneau Thanks for your detailed information. I'm sure I downloaded the data for UnsupervisedMT with ./get-data-nmt.sh --src en --tgt fr --reload_codes codes_enfr --reload_vocab vocab_enfr. Maybe I need to go through the whole pipeline from scratch.
By the way, I have been following your excellent work. Thank you!

@JxuHenry

JxuHenry commented Nov 1, 2019

@VictorLi2017 Sir, have you trained the zh-en model? What BLEU did you get?

@conquerSelf

(Quoting @aconneau's reply above.)

Hello! I followed your steps to train unsupervised MT after finishing XLM pre-training, but I hit the following AssertionError:
Traceback (most recent call last):
  File "train.py", line 337, in <module>
    main(params)
  File "train.py", line 290, in main
    trainer.mt_step(lang, lang, params.lambda_ae)
  File "/data/zsj/cgj/xlm_umt/XLM/src/trainer.py", line 851, in mt_step
    enc1 = self.encoder('fwd', x=x1, lengths=len1, langs=langs1, causal=False)
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/zsj/anaconda3/envs/cgj1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/zsj/cgj/xlm_umt/XLM/src/model/transformer.py", line 326, in forward
    return self.fwd(**kwargs)
  File "/data/zsj/cgj/xlm_umt/XLM/src/model/transformer.py", line 346, in fwd
    assert lengths.size(0) == bs
AssertionError
I have tried several ways to solve it but failed. Can you help me solve this problem? Thank you so much!
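One common cause of this particular assertion (assert lengths.size(0) == bs) is wrapping the model in nn.DataParallel, which scatters every input tensor along dim 0: XLM's transformer takes x of shape (slen, bs), so x gets split along the sequence axis while lengths (shape (bs,)) gets split along the batch axis, and the shapes no longer agree on each replica. A torch-free sketch of the invariant (check_batch is a hypothetical helper for illustration, not XLM code):

```python
# The transformer's fwd() expects x of shape (slen, bs) and lengths of
# shape (bs,): one length entry per sentence in the batch.

def check_batch(x_shape, lengths_shape):
    slen, bs = x_shape
    assert lengths_shape[0] == bs, "lengths must have one entry per sentence"

# On a single GPU the shapes agree (e.g. bptt 256, batch_size 32):
check_batch((256, 32), (32,))

# After a naive dim-0 split across 2 GPUs, a replica would see
# x of (128, 32) but lengths of (16,), and the assertion would fire:
# check_batch((128, 32), (16,))  # AssertionError
```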

@hcd7434

hcd7434 commented Nov 25, 2020

(Quoting the original post above.)

What was the MLM ppl when you trained the zh-en pre-trained model, and is the BLEU you obtained from supervised NMT or from UNMT?
