We present Language-binding Object Graph Network (LOGNet), the first neural reasoning method with dynamic relational structures across both visual and textual domains with applications in visual question answering. LOGNet offers an effective way to automatically identify word-object affiliations and form object relations in the context of a given question.
Illustration of a LOG unit:
Check out our paper for details.
- Clone the repository:
git clone https://github.com/thaolmk54/LOGNet-VQA.git
- Download the CLEVR, CLEVR-Human and GQA datasets, then edit the corresponding paths in the repo to point to where you store the data.
- Install dependencies:
conda create -n lognet_vqa python=3.6
conda activate lognet_vqa
pip install -r requirements.txt
We adapt the well-known PyTorch implementation of Faster R-CNN, train it on Visual Genome, and use the resulting pretrained model for visual feature extraction. The adapted code and the detection model pretrained on Visual Genome are available upon request.
Download the pretrained GloVe 300d word vectors to preprocess_glove/, then unzip them and convert them into a pickle file:
python txt2pickle.py
Finally, save the output file glove.840.300d.pkl somewhere for later use.
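As a rough illustration of the conversion step above, here is a minimal sketch of turning a GloVe text file into a pickled dictionary. This is not the repository's txt2pickle.py: the function names, the list-of-floats representation and the dictionary layout are all assumptions made for illustration, and the real script may store the vectors differently.

```python
import pickle


def parse_glove_lines(lines, dim=300):
    """Parse GloVe text lines into a {token: [float, ...]} dictionary.

    Hypothetical helper for illustration; not part of the repository.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # Some GloVe tokens contain spaces, so treat the last `dim` fields as
        # the vector and join whatever precedes them back into the token.
        token = " ".join(parts[:-dim])
        vectors[token] = [float(x) for x in parts[-dim:]]
    return vectors


def glove_txt_to_pickle(txt_path, pkl_path, dim=300):
    """Convert a GloVe .txt file into a pickle file; return the entry count."""
    with open(txt_path, encoding="utf-8") as f:
        vectors = parse_glove_lines(f, dim=dim)
    with open(pkl_path, "wb") as f:
        pickle.dump(vectors, f)
    return len(vectors)
```

Splitting from the right rather than the left keeps multi-word tokens intact, which matters for the larger GloVe releases.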
Experiments on CLEVR are done within exp_clevr/. Please edit the absolute paths in the configuration file at exp_clevr/configs/clevr.yml before running the commands.
You can download our pre-extracted features for the CLEVR dataset here and save them in exp_clevr/data/.
Extract PyTorch .pt files by running the following commands:
python preprocess/preprocess_questions.py --mode train --cfg configs/clevr.yml
python preprocess/preprocess_questions.py --mode val --cfg configs/clevr.yml
python preprocess/preprocess_questions.py --mode test --cfg configs/clevr.yml
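The three preprocessing commands above differ only in the --mode value, so they can be scripted. The following is a minimal sketch under that assumption; the helper names build_preprocess_commands and run_preprocessing are hypothetical and not part of the repository, while the script path and flags are the ones shown above.

```python
import subprocess
import sys


def build_preprocess_commands(cfg, modes=("train", "val", "test")):
    """Return one preprocess_questions.py command per dataset split."""
    return [
        [sys.executable, "preprocess/preprocess_questions.py",
         "--mode", mode, "--cfg", cfg]
        for mode in modes
    ]


def run_preprocessing(cfg, modes=("train", "val", "test")):
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_preprocess_commands(cfg, modes):
        # check=True raises CalledProcessError if a split fails to preprocess.
        subprocess.run(cmd, check=True)
```

Calling run_preprocessing("configs/clevr.yml") from inside the experiment directory would then process the train, val and test splits in sequence. The same helper should apply to the other experiment directories by passing their config file instead.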
Choose a suitable config file in configs/ depending on whether you wish to train the network on the full dataset, 20% of the training data, or 10% of the training data. For example, to train on all training samples, run the following command:
python train.py --cfg configs/clevr.yml
To evaluate the trained model, run the following:
python validate.py --cfg configs/clevr.yml --mode val
Note: A model pretrained on 10% of the training data is available here. Save the file in results/expClevr10%LOGNet/ckpt/ for evaluation.
Experiments on CLEVR-Human are done within exp_clevr_human/. Please edit the absolute paths in the configuration file at exp_clevr_human/configs/clevr_human.yml before running the commands.
Experiments on CLEVR-Human use the same visual features as the CLEVR dataset.
Extract PyTorch .pt files by running the following commands:
python preprocess/preprocess_questions.py --mode train --cfg configs/clevr_human.yml
python preprocess/preprocess_questions.py --mode val --cfg configs/clevr_human.yml
python preprocess/preprocess_questions.py --mode test --cfg configs/clevr_human.yml
python train.py --cfg configs/clevr_human.yml
To evaluate the trained model, run the following:
python validate.py --cfg configs/clevr_human.yml --mode val
Experiments on the GQA dataset are done within exp_gqa/. Please edit the absolute paths in the configuration file at exp_gqa/configs/gqa.yml before running the commands.
Download the object features and spatial features for the GQA dataset here and save them in exp_gqa/data/. We adapt the following script to merge the h5 chunk files:
python preprocess/merge.py --name objects
python preprocess/merge.py --name spatial
This should produce two output files for each feature type: an HDF5 file (gqa_objects.h5 / gqa_spatial.h5) and a JSON info file (gqa_objects_merged_info.json / gqa_spatial_merged_info.json).
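Before moving on to question preprocessing, it can help to confirm that the merge step produced all four files. The following is a minimal sketch of such a check; the helper missing_merged_files is hypothetical, and the directory where merge.py writes its outputs is an assumption, so pass whichever directory the files actually land in.

```python
from pathlib import Path

# File names taken from the description of the merge step.
EXPECTED_MERGED_FILES = (
    "gqa_objects.h5",
    "gqa_objects_merged_info.json",
    "gqa_spatial.h5",
    "gqa_spatial_merged_info.json",
)


def missing_merged_files(data_dir):
    """Return the expected merge outputs that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_MERGED_FILES
            if not (root / name).is_file()]
```

An empty list from missing_merged_files("data") would indicate that both feature types were merged; any names returned point to the merge command that needs to be rerun.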
Extract PyTorch .pt files by running the following commands:
python preprocess/preprocess_questions.py --mode train --cfg configs/gqa.yml
python preprocess/preprocess_questions.py --mode val --cfg configs/gqa.yml
python preprocess/preprocess_questions.py --mode test --cfg configs/gqa.yml
Choose a suitable config file in configs/ depending on whether you wish to train the network on the full dataset or 20% of the training data. For example, to train on all training samples, run the following command:
python train.py --cfg configs/gqa.yml
To evaluate the trained model, run the following:
python validate.py --cfg configs/gqa.yml --mode val
If you make use of this repository for your research, please star this repo and cite the following paper:
@inproceedings{ijcai2020-114,
title = {Dynamic Language Binding in Relational Visual Reasoning},
author = {Minh Le, Thao and Le, Vuong and Venkatesh, Svetha and Tran, Truyen},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Christian Bessiere},
pages = {818--824},
year = {2020},
month = {7},
note = {Main track},
doi = {10.24963/ijcai.2020/114},
url = {https://doi.org/10.24963/ijcai.2020/114},
}