Data Setup

We provide access to our preprocessed data (including extracted features) and preprocessing scripts to replicate our setup.

Preprocessed Data

Conceptual Captions
Flickr30k
GQA
MS COCO
NLVR2
RefCOCO (UNC)
RefCOCO+ (UNC)
RefCOCOg (UMD)
SNLI-VE
VQAv2

More recent versions (from IGLUE), with features from additional backbones:

Flickr30k
GQA
MaRVL zero-shot | MaRVL few-shot
NLVR2
xFlickr&CO
WIT

NB: I noticed that uploading the LMDB files inflated their size to the order of terabytes. Instead, I have uploaded the H5 versions, which can be quickly converted to LMDB locally using this script.
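For orientation, here is a minimal sketch of what such an H5-to-LMDB conversion loop can look like. It is not the linked script. It assumes (hypothetically) that the H5 file holds one group per image id whose datasets are that image's arrays, and that the consumer expects each record pickled under its id plus an extra `b"keys"` index entry; check both against the actual files and loaders. `convert_records` and `h5_to_lmdb` are illustrative names, and `h5_to_lmdb` needs the third-party `h5py` and `lmdb` packages.

```python
import pickle
from typing import Any, Callable, Iterable, List, Mapping, Tuple


def convert_records(
    records: Iterable[Tuple[Any, Mapping[str, Any]]],
    put: Callable[[bytes, bytes], Any],
) -> List[bytes]:
    """Pickle each (image_id, fields) record and store it under its id.

    `put` is any callable taking (key, value) bytes, e.g. an LMDB write
    transaction's `txn.put`. Returns the keys written, in order.
    """
    keys: List[bytes] = []
    for image_id, fields in records:
        key = str(image_id).encode("utf-8")
        put(key, pickle.dumps({"image_id": image_id, **dict(fields)}))
        keys.append(key)
    return keys


def h5_to_lmdb(h5_path: str, lmdb_path: str, map_size: int = 1 << 40) -> int:
    """Copy every image group of an H5 file into a new LMDB database.

    ASSUMPTIONS (hypothetical layout): one H5 group per image id, and a
    reader that expects a b"keys" entry listing all ids.
    """
    import h5py  # imported lazily so convert_records works without them
    import lmdb

    env = lmdb.open(lmdb_path, map_size=map_size)
    with h5py.File(h5_path, "r") as h5, env.begin(write=True) as txn:
        # [()] reads each dataset fully into memory as a NumPy array.
        records = ((name, {k: h5[name][k][()] for k in h5[name]}) for name in h5)
        keys = convert_records(records, txn.put)
        txn.put(b"keys", pickle.dumps(keys))
    env.close()
    return len(keys)
```

Keeping the serialization in `convert_records` separate from the file handling makes the key/value layout easy to adapt (for example, to a different key scheme) without touching the I/O code.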

Preprocessing Steps

I originally relied on Hao Tan's airsplay/bottom-up-attention Docker image to extract Faster R-CNN image features. For more details about the Docker image, see the LXMERT repository.
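As an illustration, the sketch below reads feature files in the TSV layout used by LXMERT, where each row describes one image and the box coordinates and region features are base64-encoded float32 arrays. The column list is an assumption based on that layout, so verify it against the files you actually produced; only `boxes` and `features` are decoded here, and `decode_row` and `read_tsv` are illustrative names.

```python
import base64
import csv
import struct
from typing import Dict, Iterator, List

# ASSUMPTION: LXMERT-style column order; check it against your own TSV files.
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]


def decode_floats(b64: str) -> List[float]:
    """Decode a base64 string holding little-endian float32 values."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def decode_row(row: Dict[str, str]) -> Dict[str, object]:
    """Turn one TSV row into per-box coordinate and feature lists."""
    num_boxes = int(row["num_boxes"])
    boxes = decode_floats(row["boxes"])
    features = decode_floats(row["features"])
    dim = len(features) // num_boxes if num_boxes else 0
    return {
        "img_id": row["img_id"],
        "img_h": int(row["img_h"]),
        "img_w": int(row["img_w"]),
        "num_boxes": num_boxes,
        # One [x1, y1, x2, y2] list per detected region.
        "boxes": [boxes[i * 4:(i + 1) * 4] for i in range(num_boxes)],
        # One feature vector of length `dim` per detected region.
        "features": [features[i * dim:(i + 1) * dim] for i in range(num_boxes)],
    }


def read_tsv(path: str) -> Iterator[Dict[str, object]]:
    """Yield decoded rows from a headerless, tab-separated feature file."""
    # Encoded feature columns exceed csv's default field size limit.
    csv.field_size_limit(2**31 - 1)
    with open(path, newline="") as f:
        for row in csv.DictReader(f, fieldnames=FIELDNAMES, delimiter="\t"):
            yield decode_row(row)
```

Plain lists keep the sketch dependency-free; in practice you would likely decode straight into NumPy arrays instead.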

I have since switched to Hao Tan's Detectron2 implementation of the bottom-up feature extractor, which is compatible with the original Caffe implementation. See here for step-by-step instructions.

It is also possible to extract Faster R-CNN features with a ResNeXt-101 backbone using the mmf repository, following these instructions.

For detailed preprocessing procedures, check out the README files for each data set in this folder or under feature_extraction/.
