Data and code for Understanding Linearity of Cross-Lingual Word Embedding Mappings (TMLR 2022)
The cross-lingual word analogy corpus (xANLG) is in the /data folder.
get_emb.py
: Retrieve the vectors for the xANLG lexicons from pre-trained word embeddings, then apply the pre-processing steps. One language pair is processed at a time.
LRCos
: Please use the Vecto library directly.
validate_analogy.py
: Run the parallelogram validation algorithm introduced in §4.1.3.
linear_map.py
: Find the linear mapping using Generic Procrustes Analysis.
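As a rough illustration of fitting a linear map between two embedding spaces, here is a minimal sketch of the classic orthogonal Procrustes solution via SVD. It is not the code in linear_map.py and may differ from the Generic Procrustes Analysis used there. The function name `orthogonal_procrustes_map` and the random toy matrices are assumptions made for this example only; real inputs would be row-aligned source and target vectors for one language pair.

```python
import numpy as np


def orthogonal_procrustes_map(X, Y):
    """Return the orthogonal matrix W that minimises ||X @ W - Y||_F.

    X, Y: arrays of shape (n_words, dim) whose rows are aligned,
    i.e. row i of X and row i of Y belong to a translation pair.
    (Hypothetical helper, not part of this repository.)
    """
    # The minimiser is U @ Vt, where U, S, Vt is the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy data: the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal matrix
Y = X @ Q

W = orthogonal_procrustes_map(X, Y)
residual = np.linalg.norm(X @ W - Y)  # Frobenius norm of the mapping error
```

On this toy data the recovered `W` should match `Q` up to floating-point error, so `residual` is close to zero. On real embeddings the residual indicates how far the pair of spaces is from being related by an orthogonal linear map.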
If you like our project or find it useful, please give us a ⭐ and cite us:
@article{xANLG,
title={Understanding Linearity of Cross-Lingual Word Embedding Mappings},
author={Xutan Peng and Mark Stevenson and Chenghua Lin and Chen Li},
journal={Transactions on Machine Learning Research},
year={2022},
url={https://openreview.net/forum?id=8HuyXvbvqX}
}