Code and dataset for the paper Multi-modal Contrastive Representation Learning for Entity Alignment [arxiv] [acl], in Proceedings of COLING 2022 (oral).
The multi-modal version of the DBP15K dataset comes from the EVA repository. Download the folder `pkls` of DBP15K image features following the guidance in the EVA repository, and place the downloaded `pkls` folder in the `data` directory of this repository.
The word embeddings we use are `glove.6B`; you can download them from glove and unzip the archive into the `data/embedding` directory.
The original cross-KG datasets (FB15K-DB15K/YAGO15K) come from MMKB, in which the image embeddings are extracted from a pre-trained VGG16. We use the image embeddings provided by MMKB and convert the data into a format consistent with DBP15K. The converted datasets can be downloaded from BaiduDisk (password: stdt) and placed in the `data` directory.
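After the steps above, the `data` directory should look roughly like this (a sketch only; the exact file names, such as the GloVe dimensionality, depend on what you download):

```
data/
├── pkls/                     # DBP15K image features from the EVA repository
├── embedding/
│   └── glove.6B.300d.txt     # unzipped GloVe word vectors (name may vary)
├── FB15K_DB15K/              # converted MMKB dataset
└── FB15K_YAGO15K/            # converted MMKB dataset
```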
Here is an example of training MCLEA on DBP15K:
bash run_dbp15k.sh 0 42 zh_en
bash run_dbp15k.sh 0 42 ja_en
bash run_dbp15k.sh 0 42 fr_en
Here is an example of training MCLEA on FB15K-DB15K with different ratios of alignment seeds. Similarly, you can replace the parameter `FB15K_DB15K` with `FB15K_YAGO15K` to train on the FB15K-YAGO15K dataset.
bash run_mmkb.sh 0 42 FB15K_DB15K 0.2
bash run_mmkb.sh 0 42 FB15K_DB15K 0.5
bash run_mmkb.sh 0 42 FB15K_DB15K 0.8
If you use this model or code, please cite it as follows:
@inproceedings{lin2022multi,
title = {Multi-modal Contrastive Representation Learning for Entity Alignment},
author = {Lin, Zhenxi and Zhang, Ziheng and Wang, Meng and Shi, Yinghui and Wu, Xian and Zheng, Yefeng},
booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
url = {https://aclanthology.org/2022.coling-1.227},
year = {2022},
pages = {2572--2584},
}
Our code is modified from EVA, and we thank the authors for open-sourcing their work.