We provide access to our preprocessed data (including extracted features) and preprocessing scripts to replicate our setup.
- Conceptual Captions
- Flickr30k
- GQA
- MS COCO
- NLVR2
- RefCOCO (UNC)
- RefCOCO+ (UNC)
- RefCOCOg (UMD)
- SNLI-VE
- VQAv2
More recent data (from IGLUE), with features from more backbones:
NB: I noticed that uploading the LMDB files made their size grow to the order of terabytes. Instead, I have uploaded the H5 versions, which can be quickly converted to LMDB locally using this script.
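For orientation, here is a minimal sketch of what an H5-to-LMDB conversion can look like. It is not the script referenced above, and the file layout is an assumption: it supposes one H5 group per image ID, each holding that image's feature arrays, and it writes one pickled dictionary per image plus a `keys` entry listing all IDs. The names `encode_item` and `h5_to_lmdb` are hypothetical, and the sketch needs the third-party `h5py` and `lmdb` packages. Check the actual script for the layout the data loaders expect.

```python
import pickle


def encode_item(image_id, fields):
    """Return the (key, value) byte pair stored for one image (assumed layout)."""
    return str(image_id).encode("utf-8"), pickle.dumps(dict(fields))


def h5_to_lmdb(h5_path, lmdb_path, map_size=1 << 40):
    """Copy every top-level H5 group into an LMDB store; return the item count."""
    # Third-party dependencies, imported lazily: pip install h5py lmdb
    import h5py
    import lmdb

    # map_size is the maximum size the database may grow to (here 1 TiB).
    env = lmdb.open(lmdb_path, map_size=map_size)
    image_ids = []
    with h5py.File(h5_path, "r") as h5, env.begin(write=True) as txn:
        for image_id in h5.keys():
            group = h5[image_id]
            # [()] reads each dataset fully into memory as a NumPy array.
            fields = {name: group[name][()] for name in group.keys()}
            key, value = encode_item(image_id, fields)
            txn.put(key, value)
            image_ids.append(key)
        # Assumed convention: keep the list of IDs under a dedicated key.
        txn.put(b"keys", pickle.dumps(image_ids))
    env.close()
    return len(image_ids)
```

A call such as `h5_to_lmdb("features.h5", "features.lmdb")` (placeholder paths) would then produce the LMDB directory locally. The sketch uses a single write transaction for simplicity; for very large files, committing in batches may be preferable.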
I originally relied on Hao Tan's airsplay/bottom-up-attention Docker image to extract image features from Faster R-CNN.
For more details about the Docker image, see the LXMERT repository.
Recently, I have switched to Hao Tan's Detectron2 implementation of 'Bottom-up feature extractor', which is compatible with the original Caffe implementation. See here for step-by-step instructions.
Moreover, it is possible to extract Faster R-CNN features with a ResNeXt-101 backbone from the mmf repository following these instructions.
For detailed preprocessing procedures, check out the README files for each data set in this folder or under feature_extraction/.