Awesome Cross-/Multi-Lingual Cross-Modal Retrieval

Table of Contents

Datasets

Image-Text

  1. [ACL-16] Multi30K (multi-lingual version of Flickr30K)-[English|German|French|Czech]: Multi30K: Multilingual English-German Image Descriptions. [paper] [dataset]
  2. MSCOCO-[English|Chinese|Japanese]:
  • (English) [ARXIV-15] Microsoft COCO Captions: Data Collection and Evaluation Server. [paper] [dataset]
  • (Chinese) [TMM-19] COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval. [paper] [dataset]
  • (Japanese) [ACL-17] STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset. [paper] [dataset]
  3. CC3M (multi-lingual version) [dataset]
  4. Wukong (Chinese) [dataset]

Video-Text

  1. [ICCV-19] VATEX-[English|Chinese]: VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. [paper] [dataset]
  2. [ACM MM-22] MSRVTT-CN (multi-lingual version of MSRVTT)-[English|Chinese]: Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning. [paper] [dataset]

Note: This repository provides the English captions of Multi30K, MSCOCO, VATEX, and MSRVTT-CN, along with machine-translated captions in the other languages.



Papers and Code

2024

  • [Wang et al. TIP] Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval. [paper]
  • [Wang et al. AAAI] CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer. [paper]
  • [Wang et al. ACM MM] Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval.
  • [Cai et al. TKDE] Cross-Lingual Cross-Modal Retrieval with Noise-Robust Fine-Tuning. [paper]

2023

  • [Zeng et al. ACL] Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training. [paper] [code]
  • [Li et al. ACL] Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training. [paper]
  • [Rouditchenko et al. ICASSP] C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. [paper] [code]

2022

  • [Wang et al. ACM MM] Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning. [paper] [code]

2021

  • [Zhou et al. CVPR] UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training. [paper] [code]
  • [Ni et al. CVPR] M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training. [paper] [code]
  • [Huang et al. NAACL] Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models. [paper]
  • [Fei et al. NAACL] Cross-lingual Cross-modal Pretraining for Multimodal Retrieval. [paper]

2020

  • [Aggarwal et al. ARXIV] Towards Zero-shot Cross-lingual Image Retrieval. [paper]

2019

  • [Portaz et al. ARXIV] Image search using multilingual texts: a cross-modal learning approach between image and text. [paper]

Chinese Cross-modal Pre-training

  • [Gu et al. NeurIPS] Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark. [paper] [code]
  • [Xie et al. ARXIV] ZERO and R2D2: A Large-scale Chinese Cross-modal Benchmark and a Vision-Language Framework. [paper] [code]
