This is a reading list for Bilingual Lexicon Induction (BLI), also known as Word Translation, Bilingual Lexicon Extraction, or Bilingual Dictionary Induction. A large body of BLI work computes Cross-Lingual Word Embeddings (CLWEs) and retrieves translations by nearest-neighbour search in the shared space; other approaches learn a pairwise classifier over candidate word pairs; recent work prompts large language models and reports new state-of-the-art BLI performance. The list mainly covers 2018-2024 publications and is frequently updated. Pull requests and discussions are welcome!
Exploiting Similarities among Languages for Machine Translation (arXiv 2013)
Tomas Mikolov, Quoc V. Le, Ilya Sutskever
[Paper]
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation (NAACL 2015)
Chao Xing, Dong Wang, Chao Liu, Yiye Lin
[Paper]
Comments: Beginners could also refer to Procrustes on Wikipedia and our sample code ./SampleCode.py.
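For readers who want a quick feel for the orthogonal Procrustes mapping mentioned above, here is a minimal NumPy sketch. It is not the content of ./SampleCode.py; the function names `procrustes_map` and `retrieve`, and the toy data, are assumptions made for illustration only.

```python
import numpy as np

def procrustes_map(X_src, Y_tgt):
    """Orthogonal W minimising ||X_src @ W - Y_tgt||_F (rows are seed translation pairs)."""
    # Closed-form solution: SVD of the cross-covariance X^T Y, then W = U V^T.
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

def retrieve(X_query, W, Y_all):
    """Index of the nearest target word (cosine similarity) for each mapped query."""
    mapped = X_query @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    cand = Y_all / np.linalg.norm(Y_all, axis=1, keepdims=True)
    return np.argmax(mapped @ cand.T, axis=1)

# Toy data (hypothetical): the "source" space is an exact rotation of the
# "target" space, so a perfect orthogonal mapping exists.
rng = np.random.default_rng(0)
d = 5
Y = rng.normal(size=(20, d))                   # target-language embeddings
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
X = Y @ Q.T                                    # source-language embeddings

W = procrustes_map(X[:10], Y[:10])             # learn from the first 10 seed pairs
pred = retrieve(X[10:], W, Y)                  # translate the held-out 10 words
```

Real embedding spaces are only approximately related by a rotation, so retrieval on actual data is far from perfect; the toy setup here is deliberately idealised.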
Word Translation Without Parallel Data (ICLR 2018)
Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
[Paper]
[Code]
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion (EMNLP 2018)
Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, Edouard Grave
[Paper]
[Code]
A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings (ACL 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper]
[Code]
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations (AAAI 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper]
[Code]
Comments: VecMap supports unsupervised (its ACL 2018 paper), semi-supervised and supervised (its AAAI 2018 paper) BLI settings.
Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022)
Yaoyiran Li, Fangyu Liu, Nigel Collier, Anna Korhonen, Ivan Vulić
[Paper]
[Code]
Comments: New (2022) state-of-the-art method for semi-supervised and supervised BLI!
On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023)
Yaoyiran Li, Anna Korhonen, Ivan Vulić
[Paper]
[Code]
Comments: Prompt multilingual LLMs for BLI. Achieves new (2023) state-of-the-art BLI performance on many language pairs! A simple demo is provided in our sample code ./SampleCode.py.
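As a rough illustration of what few-shot prompting for BLI can look like, the sketch below assembles a prompt from a handful of seed translation pairs. The template wording and the helper name `build_bli_prompt` are hypothetical; they are not the templates used in the paper or in ./SampleCode.py, and no model is called here.

```python
def build_bli_prompt(src_lang, tgt_lang, examples, query):
    """Build a few-shot word-translation prompt (illustrative template only)."""
    # One sentence per in-context example, then an open slot for the query word.
    parts = [
        f"The {src_lang} word '{s}' in {tgt_lang} is: {t}."
        for s, t in examples
    ]
    parts.append(f"The {src_lang} word '{query}' in {tgt_lang} is:")
    return " ".join(parts)

# Hypothetical seed dictionary entries used as in-context examples.
prompt = build_bli_prompt(
    "German", "English",
    [("Hund", "dog"), ("Katze", "cat")],
    "Haus",
)
```

The resulting string would be passed to a multilingual LLM of your choice, and the generated continuation taken as the predicted translation.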
Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces (ACL 2019)
Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig
[Paper]
[Code]
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach (TACL 2019)
Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra
[Paper]
[Code]
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space (EMNLP 2020)
Tasnim Mohiuddin, M Saiful Bari, Shafiq Joty
[Paper]
[Code]
Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces (ACL 2020)
Goran Glavaš, Ivan Vulić
[Paper]
[Code]
Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction (Findings of ACL 2021)
Jinpeng Zhang, Baijun Ji, Nini Xiao, Xiangyu Duan, Min Zhang, Yangbin Shi, Weihua Luo
[Paper]
[Code]
It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT (BlackboxNLP Workshop 2020)
Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
[Paper]
[Code]
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization (ACL 2019)
Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
[Paper]
[Code]
Normalization of Language Embeddings for Cross-Lingual Alignment (ICLR 2022)
Prince Osei Aboagye, Yan Zheng, Junpeng Wang, Michael Yeh, Wei Zhang, Liang Wang, Hao Yang, Jeff M. Phillips
[Paper]
[Code]
Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework (ICLR 2020)
Zirui Wang+, Jiateng Xie+, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime Carbonell (+: equal contribution)
[Paper]
[Code]
Filtered Inner Product Projection for Crosslingual Embedding Alignment (ICLR 2021)
Vin Sachidananda, Ziyi Yang, Chenguang Zhu
[Paper]
[Code]
Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction (ACL 2020)
Mladen Karan, Ivan Vulić, Anna Korhonen, Goran Glavaš
[Paper]
[Code]
Visual Grounding in Video for Unsupervised Word Translation (CVPR 2020)
Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, Joao Carreira, Phil Blunsom, Andrew Zisserman
[Paper]
[Code]
A Relaxed Matching Procedure for Unsupervised BLI (ACL 2020)
Xu Zhao, Zihao Wang, Yong Zhang, Hao Wu
[Paper]
[Code]
A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction (ACL 2020)
Shuo Ren, Shujie Liu, Ming Zhou, Shuai Ma
[Paper]
Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing (NAACL 2019)
Tal Schuster, Ori Ram, Regina Barzilay, Amir Globerson
[Paper]
[Code]
Multilingual Alignment of Contextual Word Representations (ICLR 2020)
Steven Cao, Nikita Kitaev, Dan Klein
[Paper]
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment (ACL 2021)
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang
[Paper]
[Code]
Bilingual Lexicon Induction through Unsupervised Machine Translation (ACL 2019)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper]
[Code]
Unsupervised Alignment of Embeddings with Wasserstein Procrustes (AISTATS 2019)
Edouard Grave, Armand Joulin, Quentin Berthet
[Paper]
[Code]
Gromov-Wasserstein Alignment of Word Embedding Spaces (EMNLP 2018)
David Alvarez-Melis, Tommi Jaakkola
[Paper]
[Code]
Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation (NAACL 2021)
Xutan Peng, Chenghua Lin, Mark Stevenson
[Paper]
[Code]
A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction (COLING 2020)
Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Shujian Huang, Tong Xiao, Jingbo Zhu
[Paper]
Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions (arXiv 2021)
Haoran Xu, Philipp Koehn
[Paper]
[Code]
Learning a Reversible Embedding Mapping using Bi-Directional Manifold Alignment (Findings of ACL 2021)
Ashwinkumar Ganesan, Francis Ferraro, Tim Oates
[Paper]
[Code]
Interactive Refinement of Cross-Lingual Word Embeddings (EMNLP 2020)
Michelle Yuan+, Mozhi Zhang+, Benjamin Van Durme, Leah Findlater, Jordan Boyd-Graber (+: equal contribution)
[Paper]
[Code]
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022)
Yaoyiran Li, Fangyu Liu, Ivan Vulić+, Anna Korhonen+ (+: equal contribution)
[Paper]
[Code]
IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces (EMNLP 2022)
Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn
[Paper]
[Code]
Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction (TASLP 2023)
Hailong Cao, Liguo Li, Conghui Zhu, Muyun Yang, Tiejun Zhao
[Paper]
[Code]
CD-BLI: Confidence-Based Dual Refinement for Unsupervised Bilingual Lexicon Induction (NLPCC 2023)
Shenglong Yu, Wenya Guo, Ying Zhang, Xiaojie Yuan
[Paper]
RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction (EMNLP 2022)
Zhoujin Tian, Chaozhuo Li, Shuo Ren, Zhiqiang Zuo, Zengxuan Wen, Xinyue Hu, Xiao Han, Haizhen Huang, Denvy Deng, Qi Zhang, Xing Xie
[Paper]
[Code]
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting (IJCNLP-AACL 2023)
Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi
[Paper]
[Code]
A Structure-Aware Generative Adversarial Network for Bilingual Lexicon Induction (Findings of EMNLP 2023)
Bocheng Han, Qian Tao, Lusi Li, Zhihao Xiong
[Paper]
[Code]
Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport (EMNLP 2022)
Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn
[Paper]
[Code]
Hierarchical Mapping for Crosslingual Word Embedding Alignment (TACL 2020)
Ion Madrazo Azpiazu, Maria Soledad Pera
[Paper]
[Code]
Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024)
Yaoyiran Li, Anna Korhonen, Ivan Vulić
[Paper]
[Code]
How Lexical is Bilingual Lexicon Induction? (Findings of NAACL 2024)
Harsh Kohli, Helian Feng, Nicholas Dronen, Calvin McCarter, Sina Moeini, Ali Kebarighotbi
[Paper]
Enhancing Bilingual Lexicon Induction via Bi-directional Translation Pair Retrieving (AAAI 2024)
Qiuyu Ding, Hailong Cao, Tiejun Zhao
[Paper]
When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages (LREC-COLING 2024)
Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden
[Paper]
LexGen: Domain-aware Multilingual Lexicon Generation (arXiv 2024)
Karthika NJ, Ayush Maheshwari, Atul Kumar Singh, Preethi Jyothi, Ganesh Ramakrishnan, Krishnakant Bhatt
[Paper]
DM-BLI: Dynamic Multiple Subspaces Alignment for Unsupervised Bilingual Lexicon Induction (ACL 2024)
Ling Hu, Yuemei Xu
[Paper]
[Code]
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions (ACL 2019)
Goran Glavaš, Robert Litschko, Sebastian Ruder, Ivan Vulić
[Paper]
[Code]
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings? (EMNLP 2019)
Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen
[Paper]
[Code]
On the Limitations of Unsupervised Bilingual Dictionary Induction (ACL 2018)
Anders Søgaard, Sebastian Ruder, Ivan Vulić
[Paper]
Are All Good Word Vector Spaces Isomorphic? (EMNLP 2020)
Ivan Vulić, Sebastian Ruder, Anders Søgaard
[Paper]
[Code]
A Survey of Cross-Lingual Word Embedding Models (JAIR 2019)
Sebastian Ruder, Ivan Vulić, Anders Søgaard
[Paper]
Should All Cross-Lingual Embeddings Speak English? (ACL 2020)
Antonios Anastasopoulos, Graham Neubig
[Paper]
[Code]
Understanding Linearity of Cross-Lingual Word Embedding Mappings (TMLR 2022)
Xutan Peng, Mark Stevenson, Chenghua Lin, Chen Li
[Paper]
[Code]
A Comprehensive Analysis of Bilingual Lexicon Induction (CL 2017)
Ann Irvine, Chris Callison-Burch
[Paper]
Dictionary-based phrase-level prompting of large language models for machine translation (arXiv 2023)
Marjan Ghazvininejad, Hila Gonen, Luke Zettlemoyer
[Paper]
Improving Zero-Shot Cross-lingual Transfer for Multilingual Question Answering over Knowledge Graph (NAACL 2021)
Yucheng Zhou, Xiubo Geng, Tao Shen, Wenqiang Zhang, Daxin Jiang
[Paper]
Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks (EACL 2021)
Jimin Sun, Hwijeen Ahn, Chan Young Park, Yulia Tsvetkov, David R. Mortensen
[Paper]
Unsupervised Neural Machine Translation (ICLR 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
[Paper]
When Does Unsupervised Machine Translation Work? (WMT 2020)
Kelly Marchisio, Kevin Duh, Philipp Koehn
[Paper]
Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation (NAACL 2021)
Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser
[Paper]
CJE-TIG: Zero-shot cross-lingual text-to-image generation by Corpora-based Joint Encoding (Knowledge-Based Systems 2022)
Han Zhang, Suyi Yang, Hongqing Zhu
[Paper]
Comments: