
VQA using Differential Attention Models

PyTorch implementations of the image-text attention models compared below (LQI, LQIA, SAN, and DAN variants).

(Model architecture diagram)
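These attention models share a common building block: the question representation scores each image region, and a weighted sum of the region features is passed on to the answer classifier. The snippet below is a minimal, illustrative PyTorch sketch of one such attention hop; the layer names and dimensions (2048-d region features, 1024-d question embedding) are assumptions for illustration, not the exact configuration used in this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHop(nn.Module):
    """One image-question attention hop: score each image region against the
    question, then pool the region features with the resulting weights."""

    def __init__(self, img_dim=2048, ques_dim=1024, hidden_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden_dim)    # project region features
        self.ques_proj = nn.Linear(ques_dim, hidden_dim)  # project question vector
        self.score = nn.Linear(hidden_dim, 1)             # one attention logit per region

    def forward(self, img_feats, ques_feat):
        # img_feats: (batch, regions, img_dim); ques_feat: (batch, ques_dim)
        joint = torch.tanh(self.img_proj(img_feats) + self.ques_proj(ques_feat).unsqueeze(1))
        alpha = F.softmax(self.score(joint), dim=1)       # attention weights over regions
        attended = (alpha * img_feats).sum(dim=1)         # (batch, img_dim) pooled feature
        return attended, alpha.squeeze(-1)

# Example usage with dummy features
if __name__ == "__main__":
    hop = AttentionHop()
    v = torch.randn(2, 196, 2048)    # e.g. a 14x14 CNN feature map flattened into regions
    q = torch.randn(2, 1024)         # question embedding (e.g. from an LSTM)
    ctx, weights = hop(v, q)
    print(ctx.shape, weights.shape)  # torch.Size([2, 2048]) torch.Size([2, 196])

SAN-2 in the results table stacks two such hops with an updated query; the DAN variants differ mainly in how the attention weights are computed.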

Usage

1. Clone the repository.

git clone https://github.com/chirag26495/DAN_VQA.git

2. Download and unzip the datasets from the official VQA website: https://visualqa.org/download.html.

$ cd basic_vqa/utils
$ chmod +x download_and_unzip_datasets.csh
$ ./download_and_unzip_datasets.csh

3. Preprocess the input data (images, questions, and answers).

$ python resize_images.py --input_dir='../datasets/Images' --output_dir='../datasets/Resized_Images'  
$ python make_vacabs_for_questions_answers.py --input_dir='../datasets'
$ python build_vqa_inputs.py --input_dir='../datasets' --output_dir='../datasets'

4. Train the model for the VQA task. A sketch of a typical training step follows the commands below.

$ cd ..
$ python train.py
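The exact training loop is defined in train.py; for orientation, VQA training is commonly set up as classification over a fixed answer vocabulary with a cross-entropy loss. The following is only an illustrative sketch, with dummy tensors standing in for the real model and data loaders; the vocabulary size, feature dimensions, and optimizer settings are assumptions.

import torch
import torch.nn as nn

num_answers = 1000                                  # assumed answer-vocabulary size
model = nn.Linear(2048 + 1024, num_answers)         # stand-in for the actual VQA model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

img_feat = torch.randn(32, 2048)                    # dummy image features
ques_feat = torch.randn(32, 1024)                   # dummy question features
labels = torch.randint(0, num_answers, (32,))       # dummy ground-truth answer indices

logits = model(torch.cat([img_feat, ques_feat], dim=1))  # forward pass
loss = criterion(logits, labels)                         # cross-entropy over answers
optimizer.zero_grad()
loss.backward()
optimizer.step()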

Pretrained Models and Exemplar Mappings (using the VQA 2.0 dataset)

Results

  • Quantitative comparison on the VQA 2.0 validation set

  Model            Question Type   Dataset   Accuracy (%)
  Basic (LQI)      All             VQA v2    47.61
  Baseline (LQIA)  All             VQA v2    53.23
  SAN-2            All             VQA v2    55.28
  DAN + LQIA       All             VQA v2    55.49
  DAN-alt. + LQIA  All             VQA v2    54.16
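Accuracy here is presumably the standard VQA evaluation metric, in which a predicted answer is compared against the 10 human-annotated answers collected per question and scored as min(#matching annotators / 3, 1). A simplified version of that rule, for reference (the official evaluation additionally averages over subsets of 9 annotators and normalizes answer strings):

def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy: min(#annotators giving this answer / 3, 1)."""
    matches = sum(a == predicted for a in human_answers)
    return min(matches / 3.0, 1.0)

# Three or more of the 10 annotators agreeing yields full credit.
print(vqa_accuracy('blue', ['blue', 'blue', 'blue', 'green'] + ['red'] * 6))  # 1.0
print(vqa_accuracy('blue', ['blue', 'green', 'red', 'red'] + ['red'] * 6))    # 0.333...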

About

Comparing different Image-Text Attention models for VQA Task
