
the objective is normal? #8

Open
ZhenYangIACAS opened this issue Nov 7, 2017 · 11 comments

Comments

@ZhenYangIACAS

I ran the code on my dataset, and the objective I got is 32.5354% after 67 iterations. Is this normal? How should I fine-tune the parameters?

@artetxem
Owner

artetxem commented Nov 7, 2017

That depends entirely on your dataset. It seems a bit low compared to what I usually get, but it could be reasonable in your case. The only way to know is to somehow evaluate your embeddings, although manually checking the nearest neighbors of a few words is enough to see that the system is learning something.
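As a concrete illustration of that sanity check, here is a minimal NumPy sketch of a cosine nearest-neighbor lookup over an embedding matrix. This is not code from this repository; the toy vocabulary and vectors are made up for the example:

```python
import numpy as np

def nearest_neighbors(query, words, matrix, k=5):
    """Return the k words whose vectors are closest (by cosine) to `query`'s vector."""
    idx = words.index(query)
    normed = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = normed @ normed[idx]          # cosine similarity to every word
    order = np.argsort(-sims)            # most similar first
    return [words[i] for i in order if i != idx][:k]

# Toy vocabulary: "cat" and "dog" point in similar directions, "car" does not.
words = ["cat", "dog", "car"]
matrix = np.array([[1.0, 0.1], [0.9, 0.2], [-0.1, 1.0]])
print(nearest_neighbors("cat", words, matrix, k=1))  # → ['dog']
```

If the neighbors of common words look like random noise, the embeddings (or the mapping) have not learned anything useful yet.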

The mapping method itself does not have any hyperparameter, so there is nothing to explore there. However, you may want to tune the hyperparameters of the embeddings themselves, try different normalization options, or play with the training corpus and dictionary, which could all make a considerable difference.

@ZhenYangIACAS
Author

I manually built a dictionary containing several word pairs for the translation test. The coverage is 100% but the accuracy is 0. Why is the accuracy 0?

@artetxem
Owner

artetxem commented Nov 7, 2017

I obviously can't tell unless you give more details. What was your training setup (language pair, corpus, embeddings, dictionary...)? What commands did you run to learn the mapping and evaluate it?

@ZhenYangIACAS
Author

The language pair is English to Chinese, and the corpus contains 200w (2,000,000) sentences. The dictionary only contains five word pairs. I ran the command "python3 eval_translation.py train.en.txt.remBlank.tok.bpe.lf.50.mono.vectors.normalized.mapped train.zh.seg.txt.remBlank.bpe.lf.50.mono.vectors.normalized.mapped -d test_dic"

@ZhenYangIACAS
Author

ZhenYangIACAS commented Nov 7, 2017

The test_dict is:
word 词语
I 我
you 他
hello 你好
hi 你好
thanks 谢谢
word 词
I 我们
And the mapped embeddings were obtained following the example in the README.
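For context on how a dictionary like the one above gets scored: each line is a whitespace-separated source/target pair, and repeated source words (e.g. the two lines for "word" and "I") act as alternative gold translations. Below is a simplified sketch of nearest-neighbor translation accuracy in that spirit; it is not the actual eval_translation.py script (which also handles coverage, batching, etc.), and all names and toy data here are illustrative:

```python
import numpy as np
from collections import defaultdict

def translation_accuracy(pairs, src_words, src_emb, trg_words, trg_emb):
    """Fraction of source words whose nearest target-space neighbor (by cosine)
    is among ANY of their gold translations; repeated source entries are merged."""
    gold = defaultdict(set)
    for s, t in pairs:
        gold[s].add(t)
    src_idx = {w: i for i, w in enumerate(src_words)}
    sn = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tn = trg_emb / np.linalg.norm(trg_emb, axis=1, keepdims=True)
    correct = 0
    for s, translations in gold.items():
        sims = tn @ sn[src_idx[s]]                     # cosine vs. every target word
        if trg_words[int(np.argmax(sims))] in translations:
            correct += 1
    return correct / len(gold)

# Toy check: matched 2-D embeddings on both sides, so every source word
# should retrieve its gold pair and accuracy should be 1.0.
pairs = [("hello", "你好"), ("thanks", "谢谢"), ("hi", "你好")]
src = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
trg = np.array([[1.0, 0.0], [0.0, 1.0]])
print(translation_accuracy(pairs, ["hello", "thanks", "hi"], src,
                           ["你好", "谢谢"], trg))  # → 1.0
```

With mapped spaces that are not actually aligned, the nearest neighbor is essentially arbitrary, which is one way a tiny dictionary can score exactly 0.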

@artetxem
Owner

artetxem commented Nov 7, 2017

So the embeddings were trained on only 200 sentences? That's way too little to get anything reasonable. A training dictionary of only 5 word pairs seems too small as well. In our paper we report positive results starting at 25 word pairs.

@ZhenYangIACAS
Author

@artetxem No, the embeddings were trained on 200w (2,000,000) sentences. I have expanded the dictionary to 25 words, and the accuracy is still 0. Maybe my test dictionary is still too small?

@artetxem
Owner

artetxem commented Nov 9, 2017

Your test dictionary is indeed very small, and it might be that you also need a larger training dictionary for English-Chinese. I would also recommend you try the numeral-based initialization; I would expect it to be more robust, assuming that there are Arabic numerals in the Chinese training corpus. Also, how did you train your embeddings? What is your vocabulary size?
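The numeral-based initialization builds its seed dictionary from tokens written identically as Arabic digits in both vocabularies, which is why it needs numerals in the Chinese corpus. A minimal sketch of that idea (a hypothetical helper for illustration, not the repository code):

```python
import re

def numeral_dictionary(src_vocab, trg_vocab):
    """Seed dictionary from tokens made only of Arabic digits that appear
    in both vocabularies (e.g. '1984' pairs with '1984')."""
    is_num = re.compile(r"^[0-9]+$")
    trg_set = set(trg_vocab)
    return [(w, w) for w in src_vocab if is_num.match(w) and w in trg_set]

print(numeral_dictionary(["the", "1984", "25", "cat"],
                         ["的", "1984", "猫", "25"]))
# → [('1984', '1984'), ('25', '25')]
```

If the Chinese side uses only Chinese numerals (一, 二, ...), no such pairs exist and this initialization has nothing to anchor on.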

@ZhenYangIACAS
Author

@artetxem Yes, I am using the numeral-based initialization, and the vocabulary size for our model is 30000. I will test it with a bigger test dictionary. Thank you.

@liujiqiang999

@ZhenYangIACAS Hi, have you solved the problem?

@IT-coach-666

@ZhenYangIACAS @JiqiangLiu Example command line (for unsupervised en2zh training, you need to pass the command-line argument --unsupervised_vocab 8000 to get reasonably good results):
python map_embeddings.py --unsupervised --unsupervised_vocab 8000 ./jy_data/model_en.vec ./jy_data/model_zh_j.vec ./jy_data/model_en_mapped2.vec ./jy_data/model_zh_j_mapped2.vec --cuda
