The dataset (excluding VG_100K) is provided here and includes:

```
├─ vg_stats.pt               # frequency bias
├─ zeroshot_triplet.pytorch
└─ stanford_filtered/
   ├─ image_data.json
   ├─ VG-SGG-dicts.json      # with split_GLIPunseen added
   └─ VG-SGG.h5
```

Download VG150 following Scene-Graph-Benchmark.pytorch.
The following is adapted from [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs).
Note that our codebase is intended to support an attribute head as well, so our ```VG-SGG.h5``` and ```VG-SGG-dicts.json``` differ from the original versions in [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs). We add attribute information and rename the files to ```VG-SGG-with-attri.h5``` and ```VG-SGG-dicts-with-attri.json```; the code used to generate them is located at ```datasets/vg/generate_attribute_labels.py```. Although we encourage later researchers to explore the value of attribute features, in our paper "Unbiased Scene Graph Generation from Biased Training" we follow the conventional setting and turn off the attribute head in both detector pretraining and relationship prediction for fair comparison, as does the default setting of this codebase.
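If you want to verify the download or peek at the attribute annotations, here is a minimal inspection sketch (not part of the codebase), assuming ```h5py``` is installed and the file sits at ```datasets/vg/VG-SGG-with-attri.h5```:

```python
# Minimal sketch: list every dataset stored in the attribute-augmented
# annotation file so you can confirm the box/label/relationship arrays
# (and the added attribute array) are present.
import h5py

with h5py.File("datasets/vg/VG-SGG-with-attri.h5", "r") as f:
    for name, dset in f.items():
        print(name, getattr(dset, "shape", None), getattr(dset, "dtype", None))
```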
### Download:
1. Download the VG images [part1 (9 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip) and [part2 (5 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip), and extract them to the directory `datasets/vg/VG_100K`. If you want to use another directory, please link it in `DATASETS['VG_stanford_filtered']['img_dir']` of `maskrcnn_benchmark/config/paths_catalog.py` (a Python download/extract sketch follows this list).
2. Download the [scene graphs](https://1drv.ms/u/s!AmRLLNf6bzcir8xf9oC3eNWlVMTRDw?e=63t7Ed) and extract them so that the h5 file is at `datasets/vg/VG-SGG-with-attri.h5`, or edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catalog.py`.
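If you prefer scripting step 1, the following is a minimal sketch assuming a plain Python environment; for ~14 GB of archives, `wget`/`curl` may be more practical, and depending on the archive layout you may need to merge both extracted folders into `VG_100K`:

```python
# Illustration only: download both VG image archives and unpack them
# next to datasets/vg/VG_100K.
import urllib.request
import zipfile
from pathlib import Path

vg_root = Path("datasets/vg")
vg_root.mkdir(parents=True, exist_ok=True)

for url in [
    "https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip",
    "https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip",
]:
    zip_path = vg_root / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, zip_path)   # download the archive
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(vg_root)                  # unpack the images
```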
Since GroundingDINO pre-training has seen part of the VG150 test images, we remove these images and generate a new split, `split_GLIPunseen`, as VS3 did (please refer to `tools/cleaned_split_GLIPunseen.ipynb`).
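Conceptually, the cleaning step amounts to dropping any VG150 test image whose id occurs in the detector's pre-training data; the actual ids and logic live in the notebook above, and the values below are made-up placeholders:

```python
# Conceptual sketch of the split cleaning (see tools/cleaned_split_GLIPunseen.ipynb
# for the real implementation). The id values here are examples, not real data.
test_image_ids = [2345678, 2345679, 2345680]   # original VG150 test ids (placeholders)
pretraining_seen_ids = {2345679}               # ids seen during GroundingDINO pre-training (placeholder)

split_GLIPunseen = [i for i in test_image_ids if i not in pretraining_seen_ids]
print(split_GLIPunseen)   # -> [2345678, 2345680]
```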
The dataset is organized as follows:

```
data/
└─ visual_genome/
   ├─ VG_100K/
   ├─ vg_stats.pt               # frequency bias
   ├─ zeroshot_triplet.pytorch
   └─ stanford_filtered/
      ├─ image_data.json
      ├─ VG-SGG-dicts.json      # with split_GLIPunseen added
      └─ VG-SGG.h5
```
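An optional sanity check, sketched under the layout above (adjust `root` if your data lives elsewhere):

```python
# Verify that the expected Visual Genome files/directories exist.
from pathlib import Path

root = Path("data/visual_genome")
expected = [
    "VG_100K",
    "vg_stats.pt",
    "zeroshot_triplet.pytorch",
    "stanford_filtered/image_data.json",
    "stanford_filtered/VG-SGG-dicts.json",
    "stanford_filtered/VG-SGG.h5",
]
for rel in expected:
    status = "ok     " if (root / rel).exists() else "MISSING"
    print(status, rel)
```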
Helpful links:
- `zeroshot_triplet.pytorch`
- `vg_stats.pt` is generated by https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/master/maskrcnn_benchmark/data/build.py#L21
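To inspect `vg_stats.pt`, here is a minimal sketch, assuming the file stores the statistics dictionary produced by the utility linked above (the exact keys, e.g. the foreground matrix and predicate distribution, may differ between versions, so it only lists them):

```python
# Load the frequency-bias statistics and print each entry's shape/type.
import torch

stats = torch.load("data/visual_genome/vg_stats.pt", map_location="cpu")
for key, value in stats.items():
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(key, shape)
```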
Download the original COCO data into the `data/` folder.
The dataset is organized as follows:

```
data/
└─ coco/
   ├─ train2017/
   ├─ val2017/
   └─ annotations/
      ├─ instances_train2017.json
      ├─ instances_val2017.json
      ├─ captions_train2017.json
      ├─ captions_val2017.json
      ├─ captions_train2017_triple.json   # generated by the scene-graph parser
      └─ captions_val2017_triple.json     # generated by the scene-graph parser
```
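As a quick check that the caption annotations are in place, the following sketch peeks at the standard COCO caption format (the layout of the `*_triple.json` files is produced by the parser and not shown here):

```python
# Load the original COCO captions and print one example.
# Standard COCO caption format: {"images": [...], "annotations": [{"image_id", "caption", ...}]}.
import json

with open("data/coco/annotations/captions_train2017.json") as f:
    caps = json.load(f)

print(len(caps["annotations"]), "captions")
first = caps["annotations"][0]
print(first["image_id"], "->", first["caption"])
```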
You can download `captions_train2017_triple.json`, `captions_val2017_triple.json`, `coco_nouns.txt`, and `coco_relations.txt` from Hugging Face.
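If those files are hosted in a Hugging Face dataset repository, they can be fetched with `huggingface_hub`; the repository id below is a placeholder, not the real one:

```python
# Sketch: fetch the parser outputs from a (placeholder) Hugging Face dataset repo.
from huggingface_hub import hf_hub_download

repo_id = "your-org/your-sgg-assets"   # placeholder; substitute the actual repo id
for name in [
    "captions_train2017_triple.json",
    "captions_val2017_triple.json",
    "coco_nouns.txt",
    "coco_relations.txt",
]:
    path = hf_hub_download(repo_id=repo_id, filename=name, repo_type="dataset")
    print("downloaded to", path)
```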