PyTorch implementation of the following papers:
- VQA: Visual Question Answering (https://arxiv.org/abs/1505.00468)
- Stacked Attention Networks for Image Question Answering (https://arxiv.org/abs/1511.02274)
- Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (https://arxiv.org/abs/1612.00837)
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (https://arxiv.org/abs/1707.07998)
After setup, the project expects the following directory layout:

```
.
+-- COCO-2015/
|   +-- images/              (symlink to /dataset/COCO2015 on the server, created with ln -s)
|   |   +-- train2014/
|   |   +-- ...
|   +-- resized_images/
|   |   +-- train2014/
|   |   +-- ...
|   +-- Questions/
|   +-- Annotations/
|   +-- train.npy
|   +-- ...
|   +-- vocab_questions.txt
|   +-- vocab_answers.txt
|   +-- <questions>.json
|   +-- <annotations>.json
+-- vqa/
|   +-- .git/
|   +-- README.md
```
1. Clone the repository.

```bash
$ git clone https://github.com/SatyamGaba/visual_question_answering.git
```
2. Download and unzip the dataset from the official VQA site: https://visualqa.org/download.html. We used VQA v2 for this project.
```bash
$ cd visual_question_answering/utils
$ chmod +x download_and_unzip_datasets.csh
$ ./download_and_unzip_datasets.csh
```
3. Preprocess the data: resize the images, then build the vocabularies and the VQA input files.

```bash
$ python resize_images.py --input_dir='../COCO-2015/Images' --output_dir='../COCO-2015/Resized_Images'
```
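For orientation, here is a minimal sketch of what the resizing step does conceptually; the real logic lives in `resize_images.py`, and the 224x224 target size and per-split layout are assumptions (224x224 is the usual input size for VGG/ResNet-style encoders):

```python
import os
from PIL import Image

def resize_split(input_dir, output_dir, size=(224, 224)):
    """Resize every image in one split (e.g. train2014/) into output_dir.

    Illustrative sketch only: the target size is an assumption, not
    necessarily what the repository's script uses.
    """
    os.makedirs(output_dir, exist_ok=True)
    for name in os.listdir(input_dir):
        try:
            with Image.open(os.path.join(input_dir, name)) as img:
                img.convert('RGB').resize(size, Image.BILINEAR).save(
                    os.path.join(output_dir, name))
        except OSError:
            pass  # skip unreadable or non-image files
```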
```bash
$ python make_vacabs_for_questions_answers.py --input_dir='../COCO-2015'
$ python build_vqa_inputs.py --input_dir='../COCO-2015' --output_dir='../COCO-2015'
```
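The vocabulary step typically amounts to collecting question tokens and keeping the most frequent answers. A hedged sketch of that idea follows; the tokenizer, the special tokens, and the top-1000 answer cutoff are assumptions (classifying over the most frequent answers is a common VQA design choice), though the JSON keys match the official VQA annotation format:

```python
import json
import re
from collections import Counter

def build_question_vocab(question_json, out_path):
    """Collect all tokens appearing in the questions into vocab_questions.txt.

    Illustrative only: tokenization and special tokens are assumptions.
    """
    with open(question_json) as f:
        questions = json.load(f)['questions']
    tokens = set()
    for q in questions:
        tokens.update(re.findall(r"[\w']+", q['question'].lower()))
    with open(out_path, 'w') as f:
        f.write('\n'.join(['<pad>', '<unk>'] + sorted(tokens)))

def build_answer_vocab(annotation_json, out_path, top_k=1000):
    """Keep the top_k most frequent answers (top_k=1000 is an assumption)."""
    with open(annotation_json) as f:
        annotations = json.load(f)['annotations']
    counts = Counter(a['multiple_choice_answer'] for a in annotations)
    with open(out_path, 'w') as f:
        f.write('\n'.join(ans for ans, _ in counts.most_common(top_k)))
```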
4. Train the model.

```bash
$ cd ..
$ python train.py --model_name="<name to save logs>" --resume_epoch="<epoch number to resume from>" --saved_model="<path to saved model if resuming training>"
```
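The `--resume_epoch` and `--saved_model` flags imply the usual PyTorch checkpoint pattern. A minimal sketch of that pattern, with the caveat that the checkpoint keys and function names here are assumptions rather than the repository's actual code:

```python
import torch

def load_checkpoint_if_resuming(model, optimizer, saved_model, resume_epoch, device):
    """Restore model/optimizer state before continuing training.

    Assumes the checkpoint was written via torch.save({'state_dict': ...});
    the exact key names are an assumption about the repo's format.
    """
    start_epoch = 0
    if saved_model:
        checkpoint = torch.load(saved_model, map_location=device)
        model.load_state_dict(checkpoint['state_dict'])
        if 'optimizer' in checkpoint:
            optimizer.load_state_dict(checkpoint['optimizer'])
        start_epoch = resume_epoch
    return start_epoch
```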
5. Plot the training curves: set the model_name variable in plotter.py to the name used for --model_name, then run:

```bash
$ python plotter.py
```
6. Run inference on a single image with a trained model.

```bash
$ python test.py --saved_model="<path to model>" --image_path="<path to image>" --question="<ask question here>"
```
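Under the hood, single-image inference follows a standard pattern; here is a hedged sketch (the normalization constants are the standard ImageNet values, while the vocab objects and model interface are assumptions, not the repository's actual test.py):

```python
import torch
from PIL import Image
from torchvision import transforms

def answer_question(model, image_path, question, question_vocab, answer_vocab, device):
    """Run one image/question pair through a trained VQA model.

    Illustrative sketch: question_vocab is assumed to be a dict mapping
    token -> index, and answer_vocab a list mapping index -> answer string.
    """
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    image = preprocess(Image.open(image_path).convert('RGB')).unsqueeze(0).to(device)

    # Map question tokens to indices, falling back to <unk> for unseen words.
    tokens = question.lower().rstrip('?').split()
    unk = question_vocab.get('<unk>', 0)
    q_idx = torch.tensor([[question_vocab.get(t, unk) for t in tokens]], device=device)

    model.eval()
    with torch.no_grad():
        logits = model(image, q_idx)            # [1, num_answers]
    return answer_vocab[logits.argmax(dim=1).item()]
```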
## Paper Implementation

- Keywords: Visual Question Answering; Simple Attention; Stacked Attention; Top-Down Attention

### Baseline Model
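As a point of reference, the baseline in the VQA paper (arXiv:1505.00468) combines image features from a pretrained CNN with an LSTM question encoding via element-wise multiplication. The sketch below follows that recipe, but the layer sizes, the VGG19 backbone, and the single-layer LSTM are illustrative assumptions, not necessarily this repository's exact configuration:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class BaselineVQA(nn.Module):
    """CNN + LSTM baseline in the spirit of the VQA paper.

    The fusion step is the paper's element-wise product of image and
    question embeddings; dimensions here are illustrative.
    """
    def __init__(self, vocab_size, num_answers,
                 embed_dim=300, hidden_dim=512, fused_dim=1024):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT)  # assumes torchvision >= 0.13
        self.cnn = vgg.features                      # frozen convolutional features
        for p in self.cnn.parameters():
            p.requires_grad = False
        self.img_fc = nn.Linear(512 * 7 * 7, fused_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.q_fc = nn.Linear(hidden_dim, fused_dim)
        self.classifier = nn.Linear(fused_dim, num_answers)

    def forward(self, image, question):
        img = self.cnn(image).flatten(1)             # [B, 512*7*7] for 224x224 input
        img = torch.tanh(self.img_fc(img))
        _, (h, _) = self.lstm(self.embed(question))  # h: [1, B, hidden_dim]
        q = torch.tanh(self.q_fc(h[-1]))
        return self.classifier(img * q)              # element-wise fusion
```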