Prepare the datasets before running experiments. In the following, `$ROOT` denotes the project directory; annotation generation is run from `$ROOT/data`.

- Download the cleaned referring-expression datasets and extract them into the `$ROOT/data` folder.
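The extraction step can be sketched in Python as follows. This is a minimal sketch: the archive file names you pass in depend on what you actually downloaded, and are not specified by this document.

```python
# Sketch: extract downloaded annotation archives into the data folder.
# Which archives exist and what they are named is an assumption --
# adjust the list to the files you actually downloaded.
import zipfile


def extract_all(archives, data_root):
    """Extract each zip archive in `archives` into `data_root`."""
    for path in archives:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(data_root)
```

For example, `extract_all(["refcoco.zip", "refcoco+.zip"], os.path.expandvars("$ROOT/data"))` would unpack both archives under `$ROOT/data`.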
- Prepare the MSCOCO train2014 images, the original Flickr30K images, the ReferItGame images, and the Visual Genome images, and unzip the annotations. The file structure should then look like:
```
$ROOT/data
|-- refcoco
|   |-- instances.json
|   |-- refs(google).p
|   |-- refs(unc).p
|-- refcoco+
|   |-- instances.json
|   |-- refs(unc).p
|-- refcocog
|   |-- instances.json
|   |-- refs(google).p
|   |-- refs(umd).p
|-- refclef
|   |-- instances.json
|   |-- refs(berkeley).p
|   |-- refs(unc).p
|-- images
|   |-- train2014
|   |-- refclef
|   |-- flickr
|   |-- VG
```
- Run data_process.py to generate the annotations. For example, run the following commands to generate the annotations for RefCOCO:

```shell
cd $ROOT/data
python data_process.py --data_root $ROOT/data --output_dir $ROOT/data --dataset refcoco --split unc --generate_mask
```

Use `--dataset={'refcoco', 'refcoco+', 'refcocog', 'refclef'}` to set the dataset to be processed.
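To process every dataset in one go, the command above can be generated per dataset. This is a sketch, not part of the repo: the per-dataset split names below are an assumption inferred from the `refs(*).p` files listed earlier, and should be checked against `data_process.py`.

```python
# Sketch: build and run the data_process.py command for each dataset.
# The split chosen for each dataset is an assumption (see lead-in).
import subprocess

SPLITS = {"refcoco": "unc", "refcoco+": "unc",
          "refcocog": "umd", "refclef": "berkeley"}


def build_cmd(dataset, data_root="$ROOT/data"):
    """Return the argv list for processing one dataset."""
    return [
        "python", "data_process.py",
        "--data_root", data_root,
        "--output_dir", data_root,
        "--dataset", dataset,
        "--split", SPLITS[dataset],
        "--generate_mask",
    ]


def process_all(data_root):
    """Run data_process.py once per dataset, from the data folder."""
    for dataset in SPLITS:
        subprocess.run(build_cmd(dataset, data_root), check=True, cwd=data_root)
```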
For Flickr and the merged pre-training data, we provide the pre-processed json files: flickr.json and merge.json.
Note: the merged pre-training data contains the training data from RefCOCO train, RefCOCO+ train, RefCOCOg train, ReferIt train, Flickr train, and VG. We also remove the images appearing in the validation and test sets of RefCOCO, RefCOCO+, and RefCOCOg.
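The overlap removal described in the note can be sketched as a simple set filter. The `"image_id"` field name is an assumption about the annotation format, used here only for illustration.

```python
# Sketch of the overlap filtering described above: drop any merged training
# annotation whose image also appears in a held-out (val/test) split.
# The "image_id" key is an assumed field name, not confirmed by the repo.

def filter_overlap(train_anns, heldout_anns):
    """Keep training annotations whose image is absent from the held-out splits."""
    heldout_images = {a["image_id"] for a in heldout_anns}
    return [a for a in train_anns if a["image_id"] not in heldout_images]
```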
- At this point the directory `$ROOT/data` should look like:
```
$ROOT/data
|-- refcoco
|   |-- instances.json
|   |-- refs(google).p
|   |-- refs(unc).p
|-- refcoco+
|   |-- instances.json
|   |-- refs(unc).p
|-- refcocog
|   |-- instances.json
|   |-- refs(google).p
|   |-- refs(umd).p
|-- anns
|   |-- refcoco
|   |   |-- refcoco.json
|   |-- refcoco+
|   |   |-- refcoco+.json
|   |-- refcocog
|   |   |-- refcocog.json
|   |-- refclef
|   |   |-- refclef.json
|   |-- flickr
|   |   |-- flickr.json
|   |-- merge
|   |   |-- merge.json
|-- masks
|   |-- refcoco
|   |-- refcoco+
|   |-- refcocog
|   |-- refclef
|-- images
|   |-- train2014
|   |-- refclef
|   |-- flickr
|   |-- VG
|-- weights
|   |-- pretrained_weights
```
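A quick way to confirm the layout matches the tree above is a small sanity check. This is a sketch; the list of expected paths below only samples a few entries from the tree and can be extended.

```python
# Sketch: verify that key files/folders from the tree above exist under
# $ROOT/data. EXPECTED samples a few paths; extend it as needed.
import os

EXPECTED = [
    "anns/refcoco/refcoco.json",
    "anns/refcoco+/refcoco+.json",
    "anns/refcocog/refcocog.json",
    "anns/refclef/refclef.json",
    "images/train2014",
    "masks/refcoco",
]


def check_layout(data_root):
    """Return the expected paths that are missing under data_root."""
    return [p for p in EXPECTED
            if not os.path.exists(os.path.join(data_root, p))]
```

An empty return value means all sampled paths are in place.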
We provide the weights of the visual backbones pretrained on MS-COCO, with all images appearing in the val+test splits of RefCOCO, RefCOCO+, and RefCOCOg removed. Please download the following weights into `$ROOT/data/weights`.
| Pretrained Weights of Backbone | Link |
|---|---|
| DarkNet53-coco | OneDrive, Baidu Cloud |
| CSPDarkNet-coco | OneDrive, Baidu Cloud |
| Vgg16-coco | OneDrive, Baidu Cloud |
| DResNet101-voc | OneDrive, Baidu Cloud |
We also provide the weights of SimREC pretrained on 0.2M images.
| Pretrained Weights of REC Models | Link |
|---|---|
| SimREC (merge) | OneDrive, Baidu Cloud |