Text-Video Retrieval with Global-Local Semantic Consistent Learning

Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Heng Tao Shen

This is the code implementation of the paper "Text-Video Retrieval with Global-Local Semantic Consistent Learning". The checkpoints and features will be released soon.

🔥Updates

Release the pre-trained weights and datasets.
Release the training and evaluation code.

✨Overview

Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state of the art for text-video retrieval. The primary approaches map text-video pairs into a common embedding space and leverage cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms incur prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent semantics shared across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. Moreover, we propose an inter-consistency loss and an intra-diversity loss to ensure the similarity and diversity of these concepts across and within modalities, respectively.
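To make "parameter-free global interaction" concrete, here is a minimal sketch. It assumes (this is not taken from the GLSCL code) that the global step is a text-conditioned softmax pooling of frame embeddings followed by a cosine score; `global_score`, `mean_pool_score` and the fixed `temperature` are hypothetical names and values. Real code would work on batched CLIP features rather than Python lists.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)


def softmax(scores: List[float], temperature: float) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_score(text: Vector, frames: List[Vector], temperature: float = 0.1) -> float:
    """Coarse text-video similarity with no trainable parameters (assumed form).

    Frames that agree more with the sentence embedding get larger pooling
    weights; `temperature` is a fixed hyperparameter, not a learned one.
    """
    weights = softmax([cosine(text, f) for f in frames], temperature)
    video = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(len(text))]
    return cosine(text, video)


def mean_pool_score(text: Vector, frames: List[Vector]) -> float:
    """Baseline for comparison: plain average of frames, ignoring the text."""
    video = [sum(f[d] for f in frames) / len(frames) for d in range(len(text))]
    return cosine(text, video)


text = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # only the first frame matches the text
g = global_score(text, frames)     # close to 1.0
m = mean_pool_score(text, frames)  # 1/sqrt(5), about 0.447
```

The only knob is a fixed temperature, so the step adds no learnable weights and costs one dot product per frame, which is in the spirit of the efficiency argument above.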

Figure 1. Performance comparison of the retrieval results (R@1) and computational costs (FLOPs) for text-to-video retrieval models.
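Figure 1 reports R@1. As a reminder of how that metric is usually computed, the sketch below derives text-to-video Recall@K from a similarity matrix. It assumes one caption per video, so the ground-truth video for query i is video i; `recall_at_k` is a hypothetical helper, not a function from this repository.

```python
from typing import List


def recall_at_k(sim: List[List[float]], k: int = 1) -> float:
    """Text-to-video Recall@K in percent.

    sim[i][j] is the similarity between text query i and video j; the
    ground-truth video for query i is assumed to be video i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank of the ground truth = number of videos scored strictly higher
        # (ties therefore count in the query's favour).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


sim = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked first
    [0.3, 0.2, 0.8],  # query 1: correct video ranked third
    [0.1, 0.4, 0.7],  # query 2: correct video ranked first
]
r1 = recall_at_k(sim, k=1)  # 2 of 3 queries hit, about 66.67
```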

🍀Method

Overview of the proposed GLSCL for text-video retrieval. It comprises two main components: (1) the Global Interaction Module (GIM), which captures coarse-grained semantic information between text and video without involving trainable parameters, and (2) the Local Interaction Module (LIM), which achieves fine-grained alignment within a shared latent semantic space via several lightweight learnable queries. Furthermore, we introduce an inter-consistency loss and an intra-diversity loss to ensure the consistency and diversity of the shared semantics across and within modalities, respectively.

Figure 2. Overview of the proposed GLSCL for text-video retrieval.
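The shared local interaction and the two regularizers can be sketched as follows. This is an illustration under stated assumptions, not the GLSCL implementation: it assumes each query attends over one modality's tokens with scaled dot-product attention, that the inter-consistency loss is the mean of 1 - cos between the k-th text and k-th video concept, and that the intra-diversity loss is the mean squared cosine between different concepts of the same modality. The queries are fixed lists standing in for learnable parameters, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, tokens: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a token sequence."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, t) * scale for t in tokens])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(len(query))]


def local_concepts(queries: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Apply the same (shared) queries to one modality: one concept per query."""
    return [attend(q, tokens) for q in queries]


def inter_consistency_loss(text_c: List[Vector], video_c: List[Vector]) -> float:
    """Assumed form: mean (1 - cos) between matching text and video concepts."""
    return sum(1.0 - cosine(t, v) for t, v in zip(text_c, video_c)) / len(text_c)


def intra_diversity_loss(concepts: List[Vector]) -> float:
    """Assumed form: mean squared cosine between distinct concepts (needs >= 2)."""
    n = len(concepts)
    total = sum(cosine(concepts[i], concepts[j]) ** 2
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Toy example: 2 shared queries, 2-D token embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[2.0, 0.0], [0.0, 2.0]]
video_tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

text_concepts = local_concepts(queries, text_tokens)
video_concepts = local_concepts(queries, video_tokens)
# Unweighted sum for illustration; a real objective would weight each term.
loss = (inter_consistency_loss(text_concepts, video_concepts)
        + intra_diversity_loss(text_concepts)
        + intra_diversity_loss(video_concepts))
```

Because the same `queries` produce both `text_concepts` and `video_concepts`, concept k in one modality is directly comparable with concept k in the other, which is what allows the consistency term to be a simple per-index comparison while the diversity term keeps the concepts from collapsing onto one another.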

🧪Experiments

TODO

📚 Citation

@inproceedings{GLSCL,
  author    = {Haonan Zhang and
              Pengpeng Zeng and
              Lianli Gao and
              Jingkuan Song and
              Yihang Duan and
              Xinyu Lyu and
              Heng Tao Shen
            },
  title     = {Text-Video Retrieval with Global-Local Semantic Consistent Learning},
  booktitle = {arXiv},
  year      = {2024}
}
