This code shows how to use the multilingual sentence embeddings for cross-lingual NLI, using the XNLI corpus.
We train an NLI classifier on the English MultiNLI corpus, optimizing the meta-parameters on the English XNLI development corpus. We then apply that classifier to the test sets of all 14 transfer languages. The development sets of the transfer languages are not used.
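The protocol above can be sketched as follows. This is only a minimal illustration and not the code run by `xnli.sh`: the names `zero_shot_transfer`, `train_fn`, `configs` and the data layout are hypothetical, and any classifier trained on precomputed sentence embeddings could be plugged in. The point it shows is that model selection only ever sees English data, while the selected model is then scored unchanged on every test language.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical data layout: each example is (feature vector, label id).
Example = Tuple[Sequence[float], int]
Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(1 for features, label in data if model(features) == label)
    return correct / len(data)


def zero_shot_transfer(
    train_fn: Callable[[List[Example], object], Model],
    configs: Iterable[object],
    train_en: List[Example],
    dev_en: List[Example],
    tests: Dict[str, List[Example]],
) -> Dict[str, float]:
    """Train on English, select on English dev, evaluate on all languages."""
    best_model, best_dev_acc = None, -1.0
    for cfg in configs:
        # Train one candidate per meta-parameter setting (English data only).
        model = train_fn(train_en, cfg)
        dev_acc = accuracy(model, dev_en)
        # Keep the candidate with the best English development accuracy.
        if dev_acc > best_dev_acc:
            best_model, best_dev_acc = model, dev_acc
    if best_model is None:
        raise ValueError("configs must contain at least one setting")
    # The selected model is applied as-is to each language's test set.
    return {lang: accuracy(best_model, data) for lang, data in tests.items()}
```

Because the transfer languages never enter the selection loop, the per-language test accuracies measure pure zero-shot transfer from English.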
Just run `bash ./xnli.sh`,
which installs the XNLI and MultiNLI corpora,
calculates the multilingual sentence embeddings,
trains the classifier and displays the results.
The XNLI corpus is available here.
You should get the following results for zero-shot cross-lingual transfer. They differ slightly from those published in the initial version of the paper [2] due to the switch to PyTorch 1.0, variations in random number generation, a new optimization of the meta-parameters, etc.
en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
74.65 | 72.26 | 73.15 | 72.48 | 72.73 | 73.35 | 71.08 | 69.84 | 70.48 | 71.94 | 69.20 | 71.38 | 65.95 | 62.14 | 61.82 |
All numbers are accuracies on the test set.
Details on the corpus are given in this paper:
[1] Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk and Veselin Stoyanov, XNLI: Cross-lingual Sentence Understanding through Inference, EMNLP, 2018.
Detailed system description:
[2] Mikel Artetxe and Holger Schwenk, Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, arXiv, Dec 26 2018.