xANLG

Data and code for Understanding Linearity of Cross-Lingual Word Embedding Mappings (TMLR 2022)

Data

Please find the cross-lingual word analogy corpus (xANLG) in the /data folder.

Code

get_emb.py: Retrieve the vectors for the xANLG lexicons from pre-trained word embeddings, then apply the pre-processing steps. The script processes one language pair at a time.
LRCos: Please directly use the Vecto library.
validate_analogy.py: Perform the parallelogram validation algorithm introduced in §4.1.3.
linear_map.py: Find the linear mapping using Generic Procrustes Analysis.
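
To illustrate the kind of mapping linear_map.py looks for, here is a minimal sketch and not the repository's code. It assumes the standard orthogonal Procrustes solution computed with NumPy's SVD, which may differ from the procedure in linear_map.py. The function name `orthogonal_procrustes_map` and the random toy matrices are hypothetical and stand in for aligned source and target embeddings.

```python
import numpy as np


def orthogonal_procrustes_map(X, Y):
    """Return the orthogonal matrix W that minimises ||X @ W - Y||_F.

    X and Y are (n_words, dim) arrays whose rows are aligned word pairs.
    This helper is hypothetical and is not taken from linear_map.py.
    """
    # The minimiser is U @ Vt, where U, Vt come from the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy data: the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))                   # stand-in source embeddings
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthogonal matrix
Y = X @ Q                                          # stand-in target embeddings

W = orthogonal_procrustes_map(X, Y)
residual = np.linalg.norm(X @ W - Y)               # Frobenius norm of the error
print(f"residual: {residual:.2e}")
```

Because the toy target space is constructed as an exact rotation of the source space, the recovered `W` should match `Q` up to floating-point error. With real embeddings the residual is generally non-zero, and its size gives one rough indication of how well a linear mapping fits.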

About

If you like our project or find it useful, please give us a ⭐ and cite us:

@article{xANLG,
title={Understanding Linearity of Cross-Lingual Word Embedding Mappings},
author={Xutan Peng and Mark Stevenson and Chenghua Lin and Chen Li},
journal={Transactions on Machine Learning Research},
year={2022},
url={https://openreview.net/forum?id=8HuyXvbvqX}
}

