In this homework, we will implement an LSTM model for Visual Question Answering. This homework is based on the papers Exploring Models and Data for Image Question Answering and VQA: Visual Question Answering. You are free to use any Torch library. We recommend using Element-Research/rnn.
The dataset provided consists of three question types: 'what color', 'what is on the', and 'what sport is'. The data has been processed into a Torch-readable format and is provided in data_HW4.t7.
Download the starter code here.
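The exact layout of the table stored in data_HW4.t7 is not described here, so the snippet below is only a minimal sketch for loading and inspecting it; no field names are assumed.

    require 'torch'

    -- Load the preprocessed dataset and print its top-level structure.
    -- Inspect the printed output to see how questions, answers, and
    -- image ids are stored before wiring up the training code.
    local data = torch.load('data_HW4.t7')
    print(data)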
In this part (Q1), you need to implement a blind model, i.e., a model that answers questions without any input from the image. Fill in the model architecture details in train_qa.lua.
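As a rough starting point, a blind model can embed the question words, run them through an LSTM, and classify the answer from the final hidden state. The sketch below uses Element-Research/rnn; the vocabulary size, embedding and hidden dimensions, and number of answer classes are placeholder assumptions and should be taken from the loaded dataset.

    require 'nn'
    require 'rnn'

    -- Placeholder sizes (assumptions); read the real values from data_HW4.t7.
    local vocabSize  = 1000   -- question vocabulary size
    local embedDim   = 200    -- word embedding dimension
    local hiddenDim  = 512    -- LSTM hidden size
    local numAnswers = 100    -- number of answer classes

    -- Input: a seqLen x batchSize tensor of word indices.
    local model = nn.Sequential()
    model:add(nn.LookupTable(vocabSize, embedDim))        -- indices -> embeddings
    model:add(nn.SplitTable(1))                           -- table of per-timestep batches
    model:add(nn.Sequencer(nn.LSTM(embedDim, hiddenDim))) -- run the LSTM over the sequence
    model:add(nn.SelectTable(-1))                         -- keep only the last hidden state
    model:add(nn.Linear(hiddenDim, numAnswers))
    model:add(nn.LogSoftMax())

    local criterion = nn.ClassNLLCriterion()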
In this part (Q2), you need to augment the blind model to take inputs from the image. The fc7 features of the images can be downloaded here. The fc7 features are stored in a Lua table; each feature vector is keyed by its image id. Fill in the model architecture details in train_vqa.lua.
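One common way to fuse the two modalities is to encode the question with the same LSTM as in Q1, project the fc7 vector into the same space, and concatenate the two representations before classifying. The sketch below assumes 4096-dimensional fc7 features and the same placeholder sizes as above; the feature file name used in the lookup example is hypothetical.

    require 'nn'
    require 'rnn'

    -- Placeholder sizes (assumptions); fc7 features from VGG-style networks are 4096-d.
    local vocabSize  = 1000
    local embedDim   = 200
    local hiddenDim  = 512
    local fc7Dim     = 4096
    local numAnswers = 100

    -- Question branch: the same LSTM encoder as the blind model.
    local questionBranch = nn.Sequential()
    questionBranch:add(nn.LookupTable(vocabSize, embedDim))
    questionBranch:add(nn.SplitTable(1))
    questionBranch:add(nn.Sequencer(nn.LSTM(embedDim, hiddenDim)))
    questionBranch:add(nn.SelectTable(-1))

    -- Image branch: project the fc7 feature into the LSTM's hidden space.
    local imageBranch = nn.Sequential()
    imageBranch:add(nn.Linear(fc7Dim, hiddenDim))
    imageBranch:add(nn.Tanh())

    -- Input: { seqLen x batchSize word indices, batchSize x fc7Dim features }.
    -- Fuse by concatenation, then classify.
    local model = nn.Sequential()
    model:add(nn.ParallelTable():add(questionBranch):add(imageBranch))
    model:add(nn.JoinTable(2))                     -- batchSize x (2 * hiddenDim)
    model:add(nn.Linear(2 * hiddenDim, numAnswers))
    model:add(nn.LogSoftMax())

    local criterion = nn.ClassNLLCriterion()

    -- Looking up an fc7 feature by image id (file name is hypothetical):
    --   local fc7Table = torch.load('fc7_features.t7')
    --   local feat = fc7Table[imageId]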
This part (Q3) is for you to implement something extra. Some pointers:
- VQA for other question types. Code to prepare the data is in fetchQA.py and fetchData.lua.
- Different architectures for VQA
Deliverables
- A zip containing the following:
  - Completed files: train_qa.lua and train_vqa.lua
  - Results for Q1 and Q2 on the test set, data_HW4_test
  - Code for Q3
  - README with results of all the parts and a brief explanation of Q3
References:
- Exploring Models and Data for Image Question Answering, Ren et al., NIPS15
- VQA: Visual Question Answering, Antol et al., ICCV15