This repository contains the implementation of "MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering".
Data preparation follows prior work for each benchmark:
- VQA2.0, GQA: follow LXMERT
- Visual7W, TDIUC: follow CTI
- VQA1.0: download from the official VQA website
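A minimal sketch of fetching those resources, assuming the repositories below are the LXMERT and CTI codebases this README points to (the URLs are my assumption, not taken from this repository):

```bash
# Assumed sources for data preparation; verify before use.
git clone https://github.com/airsplay/lxmert         # VQA2.0 / GQA features & annotations
git clone https://github.com/aioz-ai/ICCV19_VQA-CTI  # Visual7W / TDIUC preparation
# VQA1.0 questions and annotations: https://visualqa.org (official VQA site)
```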
Under ./pretrain:

```bash
bash run.bash exp_name gpuid
```
Training parameters can be adjusted by editing run.bash.
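For example, a pre-training run might be launched as follows; the experiment name mirtt_pretrain is an arbitrary tag, and 0 is the GPU id:

```bash
cd pretrain
# exp_name tags the run; gpuid selects the GPU to train on
bash run.bash mirtt_pretrain 0
```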
Under ./main:

```bash
bash run.bash exp_name gpuid
```
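For example (again with an arbitrary experiment name):

```bash
cd main
# same argument convention as pre-training: run tag, then GPU id
bash run.bash mirtt_main 0
```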
Two-stage workflow

Stage one: train a bilinear model (BAN, SAN, or MLP).
Under ./bilinear_method:

```bash
bash run.bash exp_name gpuid mod dataset model
```
After training, an answer list can be generated for each dataset; in this way, free-form open-ended (FFOE) VQA is simplified into multiple-choice (MC) VQA. A hypothetical invocation is sketched below.
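For instance, a stage-one run of BAN on TDIUC might look like the following; the values for mod, dataset, and model are guesses for illustration, so check run.bash for the strings it actually accepts:

```bash
cd bilinear_method
# exp_name: run tag; 0: GPU id; the mode/dataset/model values below are assumed
bash run.bash ban_tdiuc 0 train tdiuc ban
```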
Stage two: MIRTT. Under ./main, run the same command as in the previous section.
This repository is still being updated.