Assist Non-native Viewers: Multimodal Cross-Lingual Summarization for How2 Videos

The original conference version was accepted at EMNLP 2022, and the extended journal version has been accepted by TPAMI.

Data Preparation

The reorganized How2-MCLS text data [Baidu Netdisk, Passcode: a9df] and the video features [Baidu Netdisk, Passcode: eqqj] (derived from the original How2 dataset) are available for download. The original How2 dataset for multimodal summarization is provided at https://github.com/srvk/how2-dataset.

Preprocessing

Some demo data is placed in the "data/demo_data" folder. To use the full How2-MCLS dataset, replace the demo data with it, following the format of the "data/demo_data" folder. Then run the following command to preprocess the data. The code uses the Pt2En scenario as a demonstration.

python preprocess.py #Please modify the data storage path configuration.
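Before preprocessing the full dataset, it can help to check that the replacement folder mirrors the demo layout. The following is a minimal sketch, assuming that "following the format" means the same relative file names as in "data/demo_data"; the helper names `relative_files` and `compare_layout` are hypothetical and not part of this repository.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_layout(demo_dir, full_dir):
    """Report files that appear in only one of the two folders.

    Assumption: the full dataset is expected to use the same relative
    file names as the demo data. Only names are compared, not contents.
    """
    demo, full = relative_files(demo_dir), relative_files(full_dir)
    return {
        "missing_in_full": sorted(demo - full),
        "extra_in_full": sorted(full - demo),
    }
```

For example, `compare_layout("data/demo_data_backup", "data/demo_data")` would compare a saved copy of the demo folder against the folder after replacement; both paths are placeholders.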

Training and Prediction

After preprocessing the data, run the following scripts to train each of the proposed models and generate predictions.

VDF

bash run_scripts/VDF.sh

VDF-TS-E

bash run_scripts/VDF-TS-E.sh

VDF-TS-V

bash run_scripts/VDF-TS-V.sh

VDF-TS-E2, which uses language-adaptive warping distillation (LAWD) in place of adaptive pooling distillation.

bash run_scripts/VDF-TS-E2.sh

VDF-TS-V2, which uses LAWD in place of adaptive pooling distillation.

bash run_scripts/VDF-TS-V2.sh
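To launch several of the scripts above in one go, a small driver along these lines may be convenient. This is only a sketch: the functions `build_command` and `run_all` are hypothetical, and the run order is an assumption, since this README does not state whether any script depends on checkpoints produced by another.

```python
import subprocess
from pathlib import Path

# Script names as listed in this README.
MODELS = ["VDF", "VDF-TS-E", "VDF-TS-V", "VDF-TS-E2", "VDF-TS-V2"]


def build_command(model, script_dir="run_scripts"):
    """Return the argv list for one model's training/prediction script."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    return ["bash", (Path(script_dir) / f"{model}.sh").as_posix()]


def run_all(models=MODELS, dry_run=True):
    """Print each command; with dry_run=False, also execute it.

    Execution stops at the first script that exits with a non-zero
    status, because subprocess.run is called with check=True.
    """
    for model in models:
        cmd = build_command(model)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; pass `dry_run=False` from the repository root to actually execute them.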

Evaluation

The nmtpytorch library is used to evaluate the models; it provides the BLEU (1, 2, 3, 4), ROUGE-L, METEOR, and CIDEr evaluation metrics.

As an alternative, the nlg-eval evaluation library can obtain the same evaluation scores as nmtpytorch.

In addition, the ROUGE evaluation library is used to calculate the ROUGE (1, 2, L) scores.
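For readers unfamiliar with ROUGE-L, the sketch below illustrates the idea: it scores a candidate summary by the longest common subsequence (LCS) it shares with the reference. This is an illustrative re-implementation, not the code used by nmtpytorch, nlg-eval, or the ROUGE library; those tools differ in tokenization and in the weighting parameter, so the numbers here should not be expected to match reported scores. Whitespace tokenization and `beta=1.0` are assumptions made for simplicity.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score on whitespace tokens.

    precision = LCS / candidate length, recall = LCS / reference length,
    F = (1 + beta^2) * P * R / (R + beta^2 * P).
    """
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

With `beta=1.0` the score is the harmonic mean of LCS precision and recall, so an exact match scores 1.0 and a candidate with no shared tokens scores 0.0.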

Acknowledgement

This code is based on MFN, nmtpytorch, fairseq, machine-translation, pytorch-softdtw-cuda, and Transformers, and we are very grateful for these projects.

Citation

@inproceedings{liu2022assist,
  title={Assist non-native viewers: Multimodal cross-lingual summarization for how2 videos},
  author={Liu, Nayu and Wei, Kaiwen and Sun, Xian and Yu, Hongfeng and Yao, Fanglong and Jin, Li and Zhi, Guo and Xu, Guangluan},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
  pages={6959--6969},
  year={2022}
}
@article{liu2024multimodal,
  title={Multimodal Cross-lingual Summarization for Videos: A Revisit in Knowledge Distillation Induced Triple-stage Training Method},
  author={Liu, Nayu and Wei, Kaiwen and Yang, Yong and Tao, Jianhua and Sun, Xian and Yao, Fanglong and Yu, Hongfeng and Jin, Li and Lv, Zhao and Fan, Cunhang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  note={Early Access},
  publisher={IEEE}
}
