
# Dataset

## VG150

The dataset (excluding the `VG_100K` images) is provided here and includes:

```
├─ vg_stats.pt              # frequency bias
├─ zeroshot_triplet.pytorch
├─ stanford_filtered/
    ├─ image_data.json
    ├─ VG-SGG-dicts.json    # with the added split_GLIPunseen
    ├─ VG-SGG.h5
```

### Steps

Download VG150 following [Scene-Graph-Benchmark.pytorch](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch).

The following is adapted from [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs).

Note that our codebase is intended to support an attribute head as well, so our ```VG-SGG.h5``` and ```VG-SGG-dicts.json``` differ from the original versions in [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs). We add attribute information and rename the files to ```VG-SGG-with-attri.h5``` and ```VG-SGG-dicts-with-attri.json```. The code we use to generate them is located at ```datasets/vg/generate_attribute_labels.py```. Although we encourage future researchers to explore the value of attribute features, in our paper "Unbiased Scene Graph Generation from Biased Training" we follow the conventional setting and turn off the attribute head in both the detector pretraining and the relationship prediction stages for a fair comparison; the default setting of this codebase does the same.

### Download:
1. Download the VG images [part1 (9 Gb)](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip) [part2 (5 Gb)](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip). Extract these images to the directory `datasets/vg/VG_100K`. If you want to use another directory, please link it in `DATASETS['VG_stanford_filtered']['img_dir']` of `maskrcnn_benchmark/config/paths_catalog.py`.
2. Download the [scene graphs](https://1drv.ms/u/s!AmRLLNf6bzcir8xf9oC3eNWlVMTRDw?e=63t7Ed) and extract them to `datasets/vg/VG-SGG-with-attri.h5`, or you can edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catalog.py`.
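
If your data lives elsewhere, the catalog entries mentioned above are the only places that need editing. The following is a minimal sketch of what those entries might look like; apart from the `img_dir` and `roidb_file` keys named in the steps above, the class layout and paths are assumptions, not the actual file contents:

```python
# maskrcnn_benchmark/config/paths_catalog.py (sketch, not the real file)
class DatasetCatalog:
    DATA_DIR = "datasets"
    DATASETS = {
        "VG_stanford_filtered": {
            # point this at wherever you extracted the VG images
            "img_dir": "vg/VG_100K",
        },
        "VG_stanford_filtered_with_attribute": {
            "img_dir": "vg/VG_100K",
            # scene-graph annotations with attribute information
            "roidb_file": "vg/VG-SGG-with-attri.h5",
        },
    }
```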

Since GroundingDINO pre-training has seen part of the VG150 test images, we remove these images and generate a new split, `split_GLIPunseen`, as VS3 did (please refer to `tools/cleaned_split_GLIPunseen.ipynb`).
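
As noted in the directory listing below, the `split_GLIPunseen` information is added to `VG-SGG-dicts.json`. A rough sketch of how one might inspect it; the exact key name and value format inside the JSON are assumptions, so consult the notebook for the authoritative layout:

```python
import json

# load the dicts file that ships with the dataset
with open("data/visual_genome/stanford_filtered/VG-SGG-dicts.json") as f:
    vg_dicts = json.load(f)

# "split_GLIPunseen" is assumed to mark which test images were not seen
# during GroundingDINO pre-training
print(vg_dicts.keys())
print(type(vg_dicts.get("split_GLIPunseen")))
```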

The dataset is organized as follows:

```
data/
│
└─ visual_genome/
    ├─ VG_100K/
    ├─ vg_stats.pt              # frequency bias
    ├─ zeroshot_triplet.pytorch
    ├─ stanford_filtered/
        ├─ image_data.json
        ├─ VG-SGG-dicts.json    # with the added split_GLIPunseen
        ├─ VG-SGG.h5
```
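
A quick way to sanity-check this layout after downloading is sketched below; it assumes only that the `.pt`/`.pytorch` files are loadable with `torch.load`, and their internal structure is not documented here:

```python
import json
from pathlib import Path

import torch

root = Path("data/visual_genome")

# make sure the expected files are in place
for rel in [
    "VG_100K",
    "vg_stats.pt",
    "zeroshot_triplet.pytorch",
    "stanford_filtered/image_data.json",
    "stanford_filtered/VG-SGG-dicts.json",
    "stanford_filtered/VG-SGG.h5",
]:
    print(rel, "->", (root / rel).exists())

# the frequency-bias statistics are a plain torch pickle; printing the
# top-level type is a reasonable first look
stats = torch.load(root / "vg_stats.pt", map_location="cpu")
print(type(stats))

with open(root / "stanford_filtered" / "image_data.json") as f:
    image_data = json.load(f)
print("images:", len(image_data))
```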


## COCO

Download the original COCO data into the `data/` folder. The dataset is organized as follows:

```
data/
│
└─ coco/
    ├─ train2017
    ├─ val2017
    ├─ annotations
        ├─ instances_train2017.json
        ├─ instances_val2017.json
        ├─ captions_train2017.json
        ├─ captions_val2017.json
        ├─ captions_train2017_triple.json  # generated by the SG parser
        ├─ captions_val2017_triple.json    # generated by the SG parser
```

You can download `captions_train2017_triple.json`, `captions_val2017_triple.json`, `coco_nouns.txt`, and `coco_relations.txt` from Hugging Face.
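
The exact Hugging Face repository is not listed here, so the repo id below is a placeholder. A minimal sketch using `huggingface_hub`, assuming the files are hosted as plain files in a dataset repo:

```python
from huggingface_hub import hf_hub_download

# NOTE: "your-org/your-dataset-repo" is a placeholder; replace it with the
# actual Hugging Face repository that hosts these files.
for filename in [
    "captions_train2017_triple.json",
    "captions_val2017_triple.json",
    "coco_nouns.txt",
    "coco_relations.txt",
]:
    # local_dir is an assumed destination; adjust to where your config expects the files
    path = hf_hub_download(
        repo_id="your-org/your-dataset-repo",
        filename=filename,
        repo_type="dataset",
        local_dir="data/coco/annotations",
    )
    print("downloaded to", path)
```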