
Adapt Pre-trained Vision-and-Language Models to a Text-only Input

Set up the environment:

```
conda create -n vl python=3.8
conda activate vl
pip install torch==1.8.1 && pip install -r requirements.txt && pip install protobuf==3.20.*
```
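
If the environment is set up correctly, a quick check along these lines (a hypothetical snippet, not part of the repo; it assumes requirements.txt pulls in transformers, which the project uses) should report the pinned versions:

```python
# Hypothetical sanity check for the pinned dependencies; not part of the repo.
# Assumes requirements.txt installs transformers (used throughout the project).
import torch
import transformers
import google.protobuf

print("torch:", torch.__version__)                 # expected 1.8.1
print("transformers:", transformers.__version__)
print("protobuf:", google.protobuf.__version__)    # expected 3.20.x
print("CUDA available:", torch.cuda.is_available())
```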

Download the model weights:

```
sh download_model_weights.sh
```

Download datasets:

```
sh download_data_lxmert.sh
sh download_data_wikipedia.sh
```

Original README:

This repo contains the code for the COLING 2022 paper How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?.

Where to start?

This repo is organized into five main parts:

  1. data contains
    • Wikipedia and LXMERT data necessary for training BERT baselines and for making the text-only adaptations that depend on LXMERT or Wikipedia data,
    • An analysis of these datasets, and calculations used to make the dataset sizes equal in number of tokens, and
    • The code for downloading and formatting the Wikipedia and LXMERT datasets used in the project.
  2. models contains
    • Code for obtaining the models that haven't already been pre-trained and released. These are:
      • The BERT baselines trained on visual corpora (trained-LXMERT, trained-LXMERT-scratch and trained-Wikipedia), and
      • CLIP-BERT.
    • Model weights used for all evaluations in the project, or ways to acquire them.
  3. adaptations contains
    • Code for implementing the different text-only adaptations (a minimal sketch of the zeroed-visual-features idea follows this list),
    • The visual features used for the avg-visual-features, zero-image-visual-features, zeroed-visual-features, finetuned-LXMERT-visual-features and finetuned-Wikipedia-visual-features adaptations, and
    • The model weights for the models that have been adapted through text-only fine-tuning (no-visual-features-finetuned-LXMERT and no-visual-features-finetuned-Wikipedia).
  4. GLUE contains code necessary for the GLUE evaluation performed in the project.
  5. visual_property_norms contains code necessary for running the Visual Property Norms evaluation.
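
To make the adaptation idea concrete, here is a minimal sketch of a zeroed-visual-features forward pass using the public Huggingface LXMERT checkpoint. The checkpoint name, number of boxes and feature dimensions below are illustrative defaults; the code in adaptations remains the reference implementation.

```python
# Minimal sketch of a "zeroed-visual-features" style adaptation:
# run LXMERT on text only by feeding all-zero visual features.
# Illustrative only; the repo's adaptations/ code is the reference.
import torch
from transformers import LxmertModel, LxmertTokenizer

model_name = "unc-nlp/lxmert-base-uncased"  # public checkpoint, for illustration
tokenizer = LxmertTokenizer.from_pretrained(model_name)
model = LxmertModel.from_pretrained(model_name)

inputs = tokenizer("A man is riding a horse.", return_tensors="pt")

# LXMERT expects region features (2048-d) and box coordinates (4-d);
# here we zero them out so only the text contributes.
num_boxes = 36
visual_feats = torch.zeros(1, num_boxes, 2048)
visual_pos = torch.zeros(1, num_boxes, 4)

with torch.no_grad():
    outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)

print(outputs.language_output.shape)  # (1, seq_len, 768)
```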

The evaluations in parts 4 and 5 produce the results reported in the paper.
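
For a rough sense of what cloze-style probing of a masked language model looks like, here is a hypothetical example; the actual prompts, models and scoring are defined in visual_property_norms.

```python
# Hypothetical cloze-style probe of a masked language model.
# The prompt and model below are illustrative only; see visual_property_norms
# for the evaluation actually used in the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Everyone knows that a banana is [MASK]."):
    print(f"{pred['token_str']:>12s}  {pred['score']:.3f}")
```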

Results

Acknowledgements

This work would not have been possible without Huggingface and the LXMERT repo; we thank you for your work.
