Here, we concentrate on a collection of research papers related to information extraction for multi-modal data.
- Review on Multi-modal Data Analytics
- Multi-modal Dataset
- Multi-modal Named Entity Recognition
- Multi-modal Relation Extraction
- Multi-modal Event Extraction
- Multi-modal Representation Learning
- Multi-modal Entity Alignment
- Multi-modal Entity Linking
- Multi-modal Grounding
- Multi-modal KG Construction
- Joint Understanding for Text and Image
- Multi-modal Knowledge Graphs for Recommender Systems
- Tutorials
- Yang Wang. Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion. arXiv 2020. [Paper]
- Aditya Mogadala, Marimuthu Kalimuthu, and Dietrich Klakow. Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods. arXiv 2019. [Paper]
- Daheng Wang, Tong Zhao, Wenhao Yu, Nitesh V. Chawla, and Meng Jiang. Deep Multimodal Complementarity Learning. TNNLS 2022. [Paper]
- Zheng C, Wu Z, Feng J, et al. MNRE: A Challenge Multimodal Dataset for Neural Relation Extraction with Visual Evidence in Social Media Posts[C]. ICME 2021: 1-6. [Paper]
- Guozheng Li, Peng Wang, Jiafeng Xie, et al. FEED: A Chinese Financial Event Extraction Dataset Constructed by Distant Supervision[C]. IJCKG 2021. [Paper]
- Liu X, Gao F, Zhang Q, et al. Graph Convolution for Multimodal Information Extraction from Visually Rich Documents[J]. arXiv preprint arXiv:1903.11279, 2019. [Paper]
- Shih-Fu Chang, LP Morency, Alexander Hauptmann, Alberto Del Bimbo, Cathal Gurrin, Hayley Hung, Heng Ji, and Alan Smeaton. Panel: Challenges for Multimedia/Multimodal Research in the Next Decade. ACM MM 2019. [Paper]
- Manling Li, Ying Lin, Ananya Subburathinam, et al. GAIA at SM-KBP 2019 - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System. TAC 2019. [Paper]
- Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, and Marjorie Freedman. GAIA: A Fine-grained Multimedia Knowledge Extraction System. ACL 2020. [Paper] (Best Demo Paper)
- Tong Xu, Peilun Zhou, and Enhong Chen. Uncertainty in Multimodal Semantic Understanding. In Communications of the China Association for Artificial Intelligence (in Chinese), 2020. [Paper]
- Carl Yang, Jieyu Zhang, Haonan Wang, Sha Li, Yu Shi, Myunghwan Kim, Matt Walker, and Jiawei Han. Relation Learning on Social Networks with Multi-Modal Graph Edge Variational Autoencoders[C]. WSDM 2020. [Paper] [Code]
- Tong Xu*, Peilun Zhou*, Linkang Hu, Xiangnan He, Yao Hu, and Enhong Chen. Socializing the Videos: A Multimodal Approach for Social Relation Recognition. In ACM Transactions on Multimedia Computing, Communications, and Applications, 2021. [Paper]
- Wan H, Zhang M, Du J, et al. FL-MSRE: A few-shot learning based approach to multimodal social relation extraction[C]. AAAI 2021, 35(15): 13916-13923. [Paper]
- Pingali S, Yadav S, Dutta P, et al. Multimodal graph-based transformer framework for biomedical relation extraction[J]. Findings of ACL 2021. [Paper]
- Zheng C, Feng J, Fu Z, et al. Multimodal relation extraction with efficient graph alignment[C]. ACM MM 2021: 5298-5306. [Paper]
- Chen X, Zhang N, Li L, et al. Good visual guidance makes a better extractor: Hierarchical visual prefix for multimodal entity and relation extraction[C]. NAACL 2022. [Paper]
- Xu B, Huang S, Du M, et al. Different data, different modalities! Reinforced data splitting for effective multimodal information extraction from social media posts[C]. COLING 2022: 1855-1864. [Paper]
- Revanth Gangi Reddy†, Xilin Rui†, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avi Sil, Shih-Fu Chang, Alexander Schwing, and Heng Ji. MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding[C]. AAAI 2022. [Paper] [Data]
- Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, and Tat-Seng Chua. Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling[C]. ACL 2023. [Paper] [Report]
- Zheng C, Feng J, Cai Y, et al. Rethinking Multimodal Entity and Relation Extraction from a Translation Point of View[C]. ACL 2023 (Volume 1: Long Papers): 6810-6824. [Paper]
- Hu X, Guo Z, Teng Z, et al. Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis[C]. ACL 2023. [Paper]
- Xiao Wang, Weikang Zhou, Can Zu, Han Xia, Tianze Chen, Yuansen Zhang, Rui Zheng, Junjie Ye, Qi Zhang, Tao Gui, Jihua Kang, Jingsheng Yang, Siyuan Li, and Chunsai Du. InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction[C]. arXiv 2023. [Paper] [Report]
Relation Extraction in 2018/2019
- Li M, Zareian A, Zeng Q, et al. Cross-media structured common space for multimedia event extraction[C]. ACL 2020. [Paper]
- Tong M, Wang S, Cao Y, et al. Image enhanced event detection in news articles[C]. AAAI 2020, 34(05): 9040-9047. [Paper]
- Manling Li*, Alireza Zareian*, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare R. Voss, Dan Napierski, and Marjorie Freedman. GAIA: A Fine-grained Multimedia Knowledge Extraction System[C]. ACL 2020. [Demo] [http://blender.cs.illinois.edu/software/gaia-ie/] [Video]
- Zhang L, Zhou D, He Y, et al. MERL: Multimodal event representation learning in heterogeneous embedding spaces[C]. AAAI 2021, 35(16): 14420-14427. [Paper]
- Brian Chen, Xudong Lin, Christopher Thomas, Manling Li, Shoya Yoshida, Lovish Chum, Heng Ji, and Shih-Fu Chang. Joint Multimedia Event Extraction from Video and Article[C]. EMNLP 2021 Findings.
- Li M, Xu R, Wang S, et al. CLIP-Event: Connecting text and images with event structures[C]. CVPR 2022: 16420-16429. [Paper]
- Jian Liu, Yufeng Chen, and Jinan Xu. Multimedia Event Extraction From News With a Unified Contrastive Learning Framework[C]. ACM MM 2022. [Paper]
- Huapeng Xu, Guilin Qi, Jingjing Li, Meng Wang, Kang Xu, and Huan Gao. Fine-grained Image Classification by Visual-Semantic Embedding. IJCAI 2018. [Paper]
- Pouya Pezeshkpour, Liyan Chen, and Sameer Singh. Embedding Multimodal Relational Data for Knowledge Base Completion. EMNLP 2018. [Paper] [Comprehension]
- Hatem Mousselly-Sergieh, Teresa Botschen, Iryna Gurevych, and Stefan Roth. A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning. *SEM 2018. [Paper]
- Ruobing Xie, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. Image-embodied Knowledge Representation Learning. IJCAI 2017. [Paper]
- Pouya Pezeshkpour, Liyan Chen, and Sameer Singh. Embedding Multimodal Relational Data. NIPS 2017. [Paper]
- Derong Xu, Tong Xu*, Shiwei Wu, Jingbo Zhou, and Enhong Chen. Relation-enhanced Negative Sampling for Multimodal Knowledge Graph Completion. ACM MM 2022. [Paper]
KDD 2020 Tutorial: Multi-modal Network Representation Learning
- Qian Li, Shu Guo, Yangyifei Luo, Cheng Ji, Lihong Wang, Jiawei Sheng, and Jianxin Li. Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment. WWW 2023. [Paper] [Report]
- Pengfei Luo, Tong Xu*, Shiwei Wu, Chen Zhu, Linli Xu, and Enhong Chen. Multi-Grained Multimodal Interaction Network for Entity Linking. KDD 2023. [Paper]
A curated list of Multimodal Entity Linking papers (updated through June 27, 2023)
- Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, and Heng Ji. Non-Sequential Graph Script Induction via Multimedia Grounding. ACL 2023. [Paper] [Code]
- Meng Wang, Guilin Qi, Haofen Wang, and Qiushuo Zheng. Richpedia: A Comprehensive Multi-modal Knowledge Graph. JIST 2019. [Paper]
- Ye Liu, Hui Li, Alberto Garcia-Duran, Mathias Niepert, Daniel Onoro-Rubio, and David S. Rosenblum. MMKG: Multi-Modal Knowledge Graphs. ESWC 2019. [Paper]
- Hongzhi Li, Joe Ellis, Heng Ji, and Shih-Fu Chang. Event Specific Multimodal Pattern Mining for Knowledge Base Construction. CSME 2018. [Paper]
- Sebastián Ferrada, Benjamin Bustos, and Aidan Hogan. IMGpedia: A Linked Dataset with Content-Based Analysis of Wikimedia Images. ISWC 2017. [Paper]
- Multimodal Biological Knowledge Graph Completion via Triple Co-attention Mechanism. ICDE 2023. [Paper]
Multi-modal Knowledge Graphs (from Zhihu, Prof. Guilin Qi, Southeast University)
Multimodal Knowledge Graphs: Construction, Inference, and Challenges (NLPCC 2020)
Haofen Wang (Tongji University): Innovation and Practice of Knowledge Graphs in the Era of Multi-modal Big Data | DataGrand Forum at the World Artificial Intelligence Conference (from DataGrand, Haofen Wang, Tongji University)
[积微成著] Series Talk: Multi-modal Knowledge Graph Construction, Inference, and Challenges (from PlantData Knowledge Graph in Practice)
Multi-modal Knowledge Graph (from GitHub)
- Rui Sun, Xuezhi Cao, Yan Zhao, Junchen Wan, Kun Zhou, Fuzheng Zhang, Zhongyuan Wang, and Kai Zheng. Multi-modal Knowledge Graphs for Recommender Systems. CIKM 2020. [Paper]
- Chuhan Wu, Fangzhao Wu, Tao Qi, Chao Zhang, Yongfeng Huang, and Tong Xu. MM-Rec: Visiolinguistic Model Empowered Multimodal News Recommendation. SIGIR 2022. [Paper]
- Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web. [ACL 2020]
- Manning, Ostendorf, Povey, Xiaodong He, and Ming Zhou discuss the opportunities and challenges of multimodal NLP (with video). [2020 BAAI Conference roundtable forum, AI's New Frontier: Frontier Trends in Multimodal Natural Language Processing]