We propose a Situated Conversation Agent Pre-trained with Multimodal Questions from Incremental Layout Graph (SPRING), which can reason over multi-hop spatial relationships and connect them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs used during pretraining are generated from our novel Incremental Layout Graphs (ILG). QA-pair difficulty labels automatically annotated by the ILG are used to promote MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets.
- 2023.3.14: We release the pretrained checkpoint of SPRING.
- 2023.1.5: Our paper is selected for oral presentation at AAAI 2023.
- 2022.11.18: Our paper is accepted to the AAAI 2023 Main Track.
- python 3.6.12
- pytorch 1.8.2
- torchvision 0.9.2
- nltk 3.6.7
pip install -r requirements.txt
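To quickly confirm that the installed environment matches the versions listed above, you can run a small check such as the following (a minimal sketch; the file name check_env.py is just illustrative, and exact patch versions need not match as long as they are close):

```python
# check_env.py -- quick sanity check of the environment (illustrative only)
import sys
import torch
import torchvision
import nltk

print("python     :", sys.version.split()[0])   # expected ~3.6.12
print("pytorch    :", torch.__version__)        # expected ~1.8.2
print("torchvision:", torchvision.__version__)  # expected ~0.9.2
print("nltk       :", nltk.__version__)         # expected ~3.6.7
print("CUDA available:", torch.cuda.is_available())
```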
Download the SIMMC 2.0 dataset via git-lfs and rearrange it under the simmcdata directory in the following format.
|-- simmc2_scene_images_dstc10_public_part1 # Images (unzip simmc2_scene_images_dstc10_public_part1.zip)
| |-- cloth_store_1416238_woman_1_1.png
| |-- cloth_store_1416238_woman_1_2.png
| `-- ...
|-- simmc2_scene_images_dstc10_public_part2 # Images (unzip simmc2_scene_images_dstc10_public_part2.zip)
| |-- cloth_store_1_1_1.png
| |-- cloth_store_1_1_2.png
| `-- ...
|-- public # Scene (unzip simmc2_scene_jsons_dstc10_public.zip)
| |-- cloth_store_1_1_1_bbox.json
| |-- cloth_store_1_1_1_scene.json
| `-- ...
|-- item2id.json # converted asset ID (provided by us)
|-- fashion_prefab_metadata_all.json # fashion metadata
|-- furniture_prefab_metadata_all.json # furniture metadata
|-- simmc2_dials_dstc10_train.json # dialogue train split
|-- simmc2_dials_dstc10_dev.json # dialogue dev split
|-- simmc2_dials_dstc10_devtest.json # dialogue devtest split
NOTE: Some of the scene images are corrupted and are therefore ignored. The following corrupted images are not used by our model.
./data/images/cloth_store_1416238_woman_4_8.png
./data/images/cloth_store_1416238_woman_19_0.png
./data/images/cloth_store_1416238_woman_20_6.png
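Before moving on, the dataset layout can be sanity-checked with a short script like the one below (a minimal sketch; the simmcdata path and file names follow the listing above, and the corrupted-image list mirrors the note):

```python
# verify_simmcdata.py -- illustrative check that the expected SIMMC 2.0 files are in place
import os

SIMMC_ROOT = "simmcdata"  # adjust if your dataset lives elsewhere

required = [
    "simmc2_scene_images_dstc10_public_part1",
    "simmc2_scene_images_dstc10_public_part2",
    "public",
    "item2id.json",
    "fashion_prefab_metadata_all.json",
    "furniture_prefab_metadata_all.json",
    "simmc2_dials_dstc10_train.json",
    "simmc2_dials_dstc10_dev.json",
    "simmc2_dials_dstc10_devtest.json",
]

corrupted = {  # corrupted scene images listed in the note above
    "cloth_store_1416238_woman_4_8.png",
    "cloth_store_1416238_woman_19_0.png",
    "cloth_store_1416238_woman_20_6.png",
}

for name in required:
    path = os.path.join(SIMMC_ROOT, name)
    print(("OK   " if os.path.exists(path) else "MISS "), path)

# count usable images in part1, skipping the known-corrupted ones
part1 = os.path.join(SIMMC_ROOT, "simmc2_scene_images_dstc10_public_part1")
if os.path.isdir(part1):
    usable = [f for f in os.listdir(part1) if f.endswith(".png") and f not in corrupted]
    print("usable part1 images:", len(usable))
```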
The final directory structure of the whole project is
|Project
|-- SPRING
| |-- models
| |-- tasks
| |-- run_scripts
| |-- dataset
| `-- ...
|-- simmcdata
| |-- simmc2_scene_images_dstc10_public_part1
| |-- simmc2_scene_images_dstc10_public_part2
| |-- item2id.json
| |-- fashion_prefab_metadata_all.json
| `-- ...
Move into the SPRING/dataset directory and execute generate_ILG.py:
python generate_ILG.py
Then ILGs.pkl will be generated in the SPRING/dataset directory. It is a dictionary containing all ILGs for SIMMC 2.0: each key is the name of a scene annotation file, and the corresponding value is its ILG, stored as an instance of the ILG class.
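For a quick look at the generated graphs, the pickle can be inspected along the following lines (a minimal sketch; the file name inspect_ILGs.py is illustrative, unpickling requires the ILG class definition used by generate_ILG.py to be importable, and the attributes of the ILG objects are not shown here):

```python
# inspect_ILGs.py -- illustrative peek at ILGs.pkl (run from SPRING/dataset)
import pickle

# The ILG class must be importable so that pickle can reconstruct the objects;
# it lives alongside generate_ILG.py in this directory.
with open("ILGs.pkl", "rb") as f:
    ilgs = pickle.load(f)

print("number of scenes:", len(ilgs))
scene_name, ilg = next(iter(ilgs.items()))
print("example key (scene annotation file):", scene_name)
print("value type:", type(ilg).__name__)  # expected: ILG
```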
Next, also in the SPRING/dataset directory, execute generate_MQA_fromILG.py:
python generate_MQA_fromILG.py
The above script reads the dialogue data (responses to be evaluated are skipped) and generates visual QA pairs and spatial QA pairs by traversing the ILGs. The generated QA pairs are stored in the SPRING/dataset/simmc2_QA/simmc2_VisSpaQA.tsv file. Each row of the tsv file contains (QA type, scene image, answer, question, bbox, difficulty label, task type, dataset name).
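A row of the QA tsv can be read back as follows (a minimal sketch; the column names simply mirror the tuple order described above and are not used by the training code itself):

```python
# read_qa_tsv.py -- illustrative reader for the generated pretraining QA pairs (run from SPRING/dataset)
import csv

QA_TSV = "simmc2_QA/simmc2_VisSpaQA.tsv"
columns = ["qa_type", "scene_image", "answer", "question",
           "bbox", "difficulty", "task_type", "dataset_name"]

with open(QA_TSV, newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t")
    first_row = next(reader)

for name, value in zip(columns, first_row):
    print(f"{name}: {value}")
```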
In this step, we prepare simmc2 tsv files for training by executing generate_simmc2.py in the SPRING/dataset directory.
python generate_simmc2.py
After the script completes, we obtain simmc2_train.tsv, simmc2_dev.tsv, and simmc2_devtest.tsv, which are the input files of SPRING. Each row of these tsv files contains (ID, scene image, system response, dialog history, None, None, task type, dataset name).
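If you want to double-check the generated splits, the files can be counted along these lines (a minimal sketch; the file names match those produced above):

```python
# count_splits.py -- illustrative row counts for the generated finetuning files (run from SPRING/dataset)
import csv

for split in ("train", "dev", "devtest"):
    path = f"simmc2_{split}.tsv"
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f, delimiter="\t"))
    print(f"{path}: {n} examples")
```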
We will release the pretraining bash scripts in the future. For now, you can download the pretrained SPRING model here and put checkpoint.pt under the SPRING/run_scripts/simmc2 directory.
Move into the SPRING/run_scripts/simmc2 directory and execute train_simmc2.sh. You need to specify the location of the pretrained model parameters in the bash script.
bash train_simmc2.sh test 0 1
The finetuned model parameters and logs will be automatically saved under the SPRING/run_scripts/simmc2/finetune/finetune_checkpoints directory.
In the SPRING/run_scripts/simmc2 directory, you can evaluate the finetuned SPRING model's performance on the SIMMC Response Generation task by executing evaluate_one.sh.
bash evaluate_one.sh test 0 1
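For reference, response generation is scored with BLEU; a single generated response can be compared against the ground truth with nltk along these lines (a minimal sketch; evaluate_one.sh handles the full evaluation, the sentences below are only illustrative, and the smoothing choice is an assumption rather than the official configuration):

```python
# bleu_example.py -- illustrative BLEU-4 scoring of one generated response with nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "Yes, the grey jacket on the left costs $149.99.".split()
hypothesis = "Yes, the grey jacket on the left is $149.99.".split()

# BLEU-4 with smoothing, as is common for short responses
score = sentence_bleu([reference], hypothesis,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method7)
print(f"BLEU-4: {score:.4f}")
```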
Our repository is released under the MIT License; see LICENSE for details.
Please cite our paper if you find it helpful :)
@article{long2023spring,
author = {Yuxing Long and
Binyuan Hui and
Fulong Ye and
Yanyang Li and
Zhuoxin Han and
Caixia Yuan and
Yongbin Li and
Xiaojie Wang},
title = {{SPRING:} Situated Conversation Agent Pretrained with Multimodal Questions
from Incremental Layout Graph},
journal = {CoRR},
volume = {abs/2301.01949},
year = {2023},
}