
Adapt Pre-trained Vision-and-Language Models to a Text-only Input

Set up the environment:

```
conda create -n vl python=3.8
conda activate vl
pip install torch==1.8.1 && pip install -r requirements.txt && pip install protobuf==3.20.*
```
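
If the environment is set up correctly, a quick check along these lines (a hypothetical snippet, not part of the repo; it assumes requirements.txt pulls in transformers, which the project uses) should report the pinned versions:

```python
# Hypothetical sanity check for the pinned dependencies; not part of the repo.
# Assumes requirements.txt installs transformers (used throughout the project).
import torch
import transformers
import google.protobuf

print("torch:", torch.__version__)                 # expected 1.8.1
print("transformers:", transformers.__version__)
print("protobuf:", google.protobuf.__version__)    # expected 3.20.x
print("CUDA available:", torch.cuda.is_available())
```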

Download the model weights:

```
sh download_model_weights.sh
```

Download datasets:

```
sh download_data_lxmert.sh
sh download_data_wikipedia.sh
```

Original README:

This repo contains the code for the COLING 2022 paper How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?.

Where to start?

This repo is organized into five main parts:

  1. data contains
    • Wikipedia and LXMERT data necessary for training BERT baselines and for making the text-only adaptations that depend on LXMERT or Wikipedia data,
    • An analysis of these datasets, and calculations used to make the dataset sizes equal in number of tokens, and
    • The code for downloading and formatting the Wikipedia and LXMERT datasets used in the project.
  2. models contains
    • Code for obtaining the models that haven't already been pre-trained and released. These are:
      • The BERT baselines trained on visual corpora (trained-LXMERT, trained-LXMERT-scratch and trained-Wikipedia), and
      • CLIP-BERT.
    • Model weights used for all evaluations in the project, or ways to acquire them.
  3. adaptations contains
    • Code for implementing the different text-only adaptations (a minimal sketch of the zeroed-visual-features idea follows this list),
    • The visual features used for the avg-visual-features, zero-image-visual-features, zeroed-visual-features, finetuned-LXMERT-visual-features and finetuned-Wikipedia-visual-features adaptations, and
    • The model weights for the models that have been adapted through text-only fine-tuning (no-visual-features-finetuned-LXMERT and no-visual-features-finetuned-Wikipedia).
  4. GLUE contains code necessary for the GLUE evaluation performed in the project.
  5. visual_property_norms contains code necessary for running the Visual Property Norms evaluation.
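
To make the adaptation idea concrete, here is a minimal sketch of a zeroed-visual-features forward pass using the public Huggingface LXMERT checkpoint. The checkpoint name, number of boxes and feature dimensions below are illustrative defaults; the code in adaptations remains the reference implementation.

```python
# Minimal sketch of a "zeroed-visual-features" style adaptation:
# run LXMERT on text only by feeding all-zero visual features.
# Illustrative only; the repo's adaptations/ code is the reference.
import torch
from transformers import LxmertModel, LxmertTokenizer

model_name = "unc-nlp/lxmert-base-uncased"  # public checkpoint, for illustration
tokenizer = LxmertTokenizer.from_pretrained(model_name)
model = LxmertModel.from_pretrained(model_name)

inputs = tokenizer("A man is riding a horse.", return_tensors="pt")

# LXMERT expects region features (2048-d) and box coordinates (4-d);
# here we zero them out so only the text contributes.
num_boxes = 36
visual_feats = torch.zeros(1, num_boxes, 2048)
visual_pos = torch.zeros(1, num_boxes, 4)

with torch.no_grad():
    outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)

print(outputs.language_output.shape)  # (1, seq_len, 768)
```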

The evaluations in parts 4 and 5 produce the results reported in the paper.
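
For a rough sense of what cloze-style probing of a masked language model looks like, here is a hypothetical example; the actual prompts, models and scoring are defined in visual_property_norms.

```python
# Hypothetical cloze-style probe of a masked language model.
# The prompt and model below are illustrative only; see visual_property_norms
# for the evaluation actually used in the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Everyone knows that a banana is [MASK]."):
    print(f"{pred['token_str']:>12s}  {pred['score']:.3f}")
```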

Results

Acknowledgements

This work would not have been possible without Huggingface and the LXMERT repo; we thank you for your work.
