This is the implementation of the approaches described in the paper:
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki and Desmond Elliott. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs. Transactions of the Association for Computational Linguistics, 2021.
We provide the code for reproducing our results, as well as log files. Preprocessed data and pretrained models are also available in VOLTA.
NB: This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
During cluster maintenance, a small portion of the data preparation and log files was lost.
Nevertheless, this repository contains the core software to reproduce our results.
The missing data preparation files were derived from the official repositories of LXMERT, ViLBERT and VL-BERT, available under `code/`.
You can clone this repository by issuing:
```bash
git clone git@github.com:e-bug/mpre-unmasked
```
The Python environments for each code base (LXMERT, ViLBERT, VL-BERT, VOLTA) can be installed from the corresponding directories in `code/`.
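As a minimal sketch, assuming a code base ships a conda environment file (the actual file names and instructions may differ, so follow the README in each `code/` subdirectory), setting up the VOLTA environment could look like:

```bash
# Hypothetical setup sketch: the environment file name and environment name below
# are assumptions; follow the instructions inside code/volta/ for the actual steps.
cd code/volta
conda env create -f environment.yml   # assumed file name
conda activate volta                  # assumed environment name
```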
Check out `data/` for download and preprocessing steps.
A clean, step-by-step version and preprocessed features are available in VOLTA.
Check out `MODELS.md` in VOLTA for links to pretrained models.
We provide our scripts to train (i.e., pretrain or fine-tune) and evaluate models in `experiments/`. These include ViLBERT, LXMERT and VL-BERT using the official repositories, as well as ViLBERT, LXMERT, VL-BERT, VisualBERT and UNITER using VOLTA; a rough usage sketch follows.
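As a hypothetical sketch only (the directory layout, script names, and arguments below are assumptions; check `experiments/` for the actual entry points), launching one of these scripts could look like:

```bash
# Hypothetical usage sketch: the path below is a placeholder, not the actual
# layout of experiments/; consult the provided scripts before running anything.
cd experiments
bash <task>/<model>/train.sh   # substitute a real script from this directory
```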
This work is licensed under the MIT license. See `LICENSE` for details.
Third-party software and data sets are subject to their respective licenses.
If you find our code/data/models or ideas useful in your research, please consider citing the paper:
```bibtex
@article{bugliarello-etal-2021-multimodal,
    title = "Multimodal Pretraining Unmasked: {A} Meta-Analysis and a Unified Framework of Vision-and-Language {BERT}s",
    author = "Bugliarello, Emanuele and
      Cotterell, Ryan and
      Okazaki, Naoaki and
      Elliott, Desmond",
    journal = "Transactions of the Association for Computational Linguistics",
    year = "2021",
    url = "https://arxiv.org/abs/2011.15124",
}
```
Our codebase heavily relies on these excellent repositories: