root
├── dataset
│   ├── COCO
│   │   ├── captions_train2017.json
│   │   ├── captions_val2017.json
│   │   ├── COCO_triplet_labels.npy
│   │   └── images
│   │       └── *.png
│   ├── VG
│   │   ├── image_data.json
│   │   ├── VG-SGG-with-attri.h5
│   │   ├── VG-SGG-dicts-with-attri.json
│   │   └── images
│   │       └── *.png
│   ├── GQA
│   │   ├── GQA_200_ID_Info.json
│   │   ├── GQA_200_Train.json
│   │   ├── GQA_200_Test.json
│   │   └── images
│   │       └── *.png
│   ├── CC
│   │   ├── Train_GCC-training.tsv
│   │   ├── cc_triplet_labels.npy
│   │   ├── cc_meta_information.json
│   │   └── images
│   │       └── *.png
│   └── VG_Caption
│       └── region_descriptions.json
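If you are starting from a fresh checkout, the expected skeleton can be created up front with a single command (a convenience sketch; the download scripts below may already create these directories themselves):

# Create the dataset layout shown in the tree above; the per-dataset
# download scripts may create these directories on their own.
mkdir -p dataset/{COCO,VG,GQA,CC}/images dataset/VG_Caption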
We use the COCO, Conceptual Captions (CC), and Visual Genome caption datasets for training. You can conveniently download each training set with the shell script below; for detailed information, please refer to the links for each dataset.
# DATASET: COCO, CC, VG_Caption
bash dataset/{DATASET}/download.sh dataset/{DATASET}
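For example, the following loop (a convenience sketch; it assumes each download.sh takes the target directory as its only argument, as in the command above) fetches all three training sets in one pass:

# Download every training set in sequence, reusing the per-dataset
# download.sh scripts shown above.
for DATASET in COCO CC VG_Caption; do
    bash "dataset/${DATASET}/download.sh" "dataset/${DATASET}"
done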
For evaluation, we use the Visual Genome (VG) and GQA datasets.
# DATASET: VG, GQA
bash dataset/{DATASET}/download.sh dataset/{DATASET}
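After the downloads finish, a quick existence check such as the following (a minimal sketch; the file names are taken from the directory tree above) helps catch incomplete downloads before running evaluation:

# Verify that the key VG and GQA annotation files are in place
# (file names as listed in the directory tree above).
for f in dataset/VG/VG-SGG-with-attri.h5 \
         dataset/VG/VG-SGG-dicts-with-attri.json \
         dataset/GQA/GQA_200_Train.json \
         dataset/GQA/GQA_200_Test.json; do
    [ -f "$f" ] || echo "missing: $f"
done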