MMVideoTextRetrieval is an open source video-text retrieval toolbox based on PyTorch.
This repository provides different video text retrieval methods.
- Modular design
  We decompose the video-text retrieval framework into components that can be easily combined in any configuration.
- Support for various datasets and features
  The toolbox supports multiple datasets, such as MSRVTT, ActivityNet, and LSMDC. Various pre-extracted features are also provided.
- Support for multiple video-text retrieval frameworks
  MMVideoTextRetrieval implements popular video-text retrieval frameworks, such as MMT. More frameworks will be added later.
- Visual demo
  We provide a demo to visualize the results of video-text retrieval models.
We provide a way to run text-to-video retrieval in real-world applications. Before retrieval, the multi-modal features of the videos must be extracted and stored. The query text is defined in the "main_train" function in demo.py, and the "--sentence" option must be passed to activate the retrieval process. The output is the names of the video feature files of the 10 most similar videos.
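The retrieval step described above can be illustrated with a minimal sketch. It assumes each video and the query sentence have already been reduced to a single embedding vector and ranks videos by cosine similarity. This is a simplification for illustration and not the toolbox's actual code; the function name `top_k_videos` and the feature file names are hypothetical.

```python
import numpy as np

def top_k_videos(text_emb, video_embs, video_names, k=10):
    """Return the names of the k videos most similar to the query text."""
    # L2-normalise so that the dot product equals cosine similarity.
    text = text_emb / np.linalg.norm(text_emb)
    videos = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = videos @ text
    # argsort is ascending, so negate the scores to rank best-first.
    order = np.argsort(-scores, kind="stable")[:k]
    return [video_names[i] for i in order]

# Toy data: 12 stored video embeddings (hypothetical feature file names).
video_embs = np.eye(12)
video_names = [f"video_{i}.npy" for i in range(12)]

# A query embedding that mostly matches video 3 and slightly matches video 5.
query = video_embs[3] + 0.1 * video_embs[5]

top = top_k_videos(query, video_embs, video_names, k=10)
print(top[:2])
```

In practice the stored embeddings would come from the pre-extracted video features, and the query embedding from the text encoder of the trained model.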
| Model | Dataset | Video Feature | Text Feature | Pretrained | Text-to-Video R@1 | Text-to-Video R@5 | Text-to-Video R@10 | Video-to-Text R@1 | Video-to-Text R@5 | Video-to-Text R@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| MMT | MSRVTT-1kA | S3D | Bert | no | 24.6 | 54 | 67.1 | 24.4 | 56 | 67.8 |
| MMT | ActivityNet | S3D | Bert | no | 22.7 | 54.2 | 93.2 | 22.9 | 54.8 | 93.1 |
| MMT | LSMDC | S3D | Bert | no | 13.2 | 29.2 | 38.8 | 12.1 | 29.3 | 37.9 |
| MMT | MSRVTT-1kA&B | S3D | Bert | HowTo100M | 26.6 | 57.1 | 69.6 | 27 | 57.5 | 69.7 |
| MMT | ActivityNet | S3D | Bert | HowTo100M | 28.7 | 61.4 | 94.5 | 28.9 | 61.1 | 94.3 |
| MMT | LSMDC | S3D | Bert | HowTo100M | 12.9 | 29.9 | 40.1 | 12.3 | 28.6 | 38.9 |
| HGR | MSRVTT-Full | Resnet152 | Word2Vec | no | 9.2 | 26.2 | 36.5 | 15 | 36.7 | 48.8 |
(All results are taken from the original papers and will be replaced by the results of the pre-trained models later.)
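For readers unfamiliar with the R@K columns, the sketch below shows one common way to compute recall at K from a text-video similarity matrix. It assumes that query i has exactly one correct video, video i, which may differ from the evaluation protocol of a given dataset. The function name `recall_at_k` and the toy matrix are illustrative and not taken from the toolbox.

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """Recall@K in percent.

    sim[i, j] is the similarity between text query i and video j;
    the correct video for query i is assumed to be video i.
    """
    correct = np.diag(sim)[:, None]
    # Rank of the correct video = number of videos scored strictly higher
    # (0 means the correct video is ranked first).
    ranks = (sim > correct).sum(axis=1)
    return {f"R@{k}": 100.0 * float(np.mean(ranks < k)) for k in ks}

# Toy 4x4 similarity matrix: queries 0 and 1 rank their video first,
# query 2 ranks it second, query 3 ranks it last.
sim = np.array([
    [0.9, 0.1, 0.2, 0.3],
    [0.2, 0.8, 0.1, 0.3],
    [0.7, 0.1, 0.5, 0.2],
    [0.6, 0.5, 0.4, 0.1],
])

t2v = recall_at_k(sim, ks=(1, 2, 4))
print(t2v)  # {'R@1': 50.0, 'R@2': 75.0, 'R@4': 100.0}
```

Video-to-text recall can be obtained from the same function by passing the transposed matrix, `recall_at_k(sim.T)`.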
Supported methods for video-text retrieval:
- MMT (ECCV'2020)
- MMT-modified (ICMEW'2021)
- HGR (CVPR'2020)
Supported datasets:
- MSR-VTT
- ActivityNet Captions
- LSMDC
- TGIF
- VATEX
- Python 3.7
- PyTorch 1.4.0+
- Transformers 3.1.0
- NumPy 1.18.1
pip install -r requirements.txt
Training + evaluation:
python -m demo --config configs/$model_name/$dataset_$split_trainval.json
Evaluation from checkpoint:
python -m demo --config configs/$model_name/$dataset_$split_trainval.json --only_eval --load_checkpoint $checkpoint_path
Training from pretrained model:
python -m demo --config configs/$model_name/prtrn_$dataset_$split_trainval.json --load_checkpoint $checkpoint_path
Retrieve videos with a specific sentence:
python -m demo --config configs/$model_name/$dataset_$split_trainval.json --only_eval --load_checkpoint $checkpoint_path --sentence
Using the modified version of MMT for training:
python -m demo --config configs/$model_name/prtrn_$dataset_$split_trainval.json --modified_model
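
The commands above differ only in the config file and a few flags, so they can be assembled programmatically. The helper below is a hypothetical sketch and not part of the toolbox. It assumes the config placeholder expands to `<dataset>_<split>_trainval.json`, which is one reading of the commands above, and it only uses the flags shown in this README. The model, dataset, split, and checkpoint values in the usage lines are placeholders.

```python
import shlex

def build_command(model_name, dataset, split, checkpoint=None,
                  only_eval=False, pretrained_config=False,
                  sentence=False, modified_model=False):
    """Build the argument list for one of the demo invocations above."""
    # Assumption: configs for training from a pretrained model use the
    # "prtrn_" prefix, as in the commands listed in this README.
    prefix = "prtrn_" if pretrained_config else ""
    config = f"configs/{model_name}/{prefix}{dataset}_{split}_trainval.json"
    cmd = ["python", "-m", "demo", "--config", config]
    if only_eval:
        cmd.append("--only_eval")
    if checkpoint:
        cmd += ["--load_checkpoint", checkpoint]
    if sentence:
        cmd.append("--sentence")
    if modified_model:
        cmd.append("--modified_model")
    return cmd

# Hypothetical values; substitute your own model, dataset, split, and checkpoint.
eval_cmd = build_command("MODEL", "DATASET", "SPLIT",
                         checkpoint="checkpoint.pth", only_eval=True)
retrieval_cmd = build_command("MODEL", "DATASET", "SPLIT",
                              checkpoint="checkpoint.pth",
                              only_eval=True, sentence=True)

# Print shell-safe command lines for inspection.
print(" ".join(shlex.quote(arg) for arg in eval_cmd))
print(" ".join(shlex.quote(arg) for arg in retrieval_cmd))
```

The returned list can be passed directly to `subprocess.run` if the commands should be launched from a script.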