Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding

European Conference on Computer Vision (ECCV), 2022

Introduction

Embodied Reference Understanding studies reference understanding in an embodied setting, where a receiver must locate a target object that the sender refers to through both language and gesture in a shared physical environment. The main challenge is enabling the receiver, who has only an egocentric view, to access spatial and visual information relative to the sender, so that it can judge how objects are arranged around the sender and how they appear from the sender's viewpoint, i.e., spatial and visual perspective-taking. In this paper, we propose REasoning from your Perspective (REP), a method that tackles this challenge by modeling the relations between the receiver and the sender, and between the sender and the objects, through the proposed view rotation and relation reasoning. Specifically, view rotation first rotates the receiver to the position of the sender by constructing an embodied 3D coordinate system with the sender's position as the origin. It then changes the orientation of the receiver to that of the sender by encoding the sender's body orientation and gesture. Relation reasoning models both the nonverbal and verbal relations between the sender and the objects through multi-modal cooperative reasoning over gesture, language, visual content, and spatial position.
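
The geometric intuition behind view rotation can be illustrated with a small, purely hypothetical sketch. It is not the REP implementation, which learns this transformation from encoded body orientation and gesture. The sketch assumes the sender's 3D position and a single yaw angle about the vertical (y) axis are already known, and the function name `to_sender_frame` is invented for this example.

```python
import math


def to_sender_frame(point, sender_pos, sender_yaw):
    """Express a 3D point in a coordinate frame centred on the sender.

    Assumptions (illustrative only, not taken from the REP code):
    - point and sender_pos are (x, y, z) tuples in the receiver's frame;
    - sender_yaw is the sender's heading in radians about the y axis,
      so the sender faces direction (sin(yaw), 0, cos(yaw)).
    """
    # Step 1: translate so that the sender's position becomes the origin.
    dx = point[0] - sender_pos[0]
    dy = point[1] - sender_pos[1]
    dz = point[2] - sender_pos[2]

    # Step 2: rotate by -yaw about the y axis so that the sender's
    # facing direction becomes the +z axis of the new frame.
    c, s = math.cos(sender_yaw), math.sin(sender_yaw)
    x = c * dx - s * dz
    z = s * dx + c * dz
    return (x, dy, z)


# A sender standing at (1, 0, 2) and facing the +x direction.
sender_pos = (1.0, 0.0, 2.0)
sender_yaw = math.pi / 2

# An object two units along +x from the sender lies straight ahead of it.
obj_in_sender_frame = to_sender_frame((3.0, 0.0, 2.0), sender_pos, sender_yaw)
```

In this toy setting the object ends up on the +z axis of the sender-centred frame, i.e. directly in front of the sender, which is the kind of sender-relative information the receiver lacks in its own egocentric view.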

Framework

Dataset

Download the YouRefIt dataset from the Dataset Request Page and place it under ./ln_data.

Model weights

YOLOv3: download the pretrained model and place the file in ./saved_models by running:
```
sh saved_models/yolov3_weights.sh
```

Make sure to put the files in the following structure:

|-- ROOT
|	|-- ln_data
|		|-- yourefit
|			|-- images
|			|-- paf
|			|-- saliency
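
Before training, it may help to confirm that the layout above is in place. The following is a minimal sketch rather than part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it assumes only the three subfolders listed above are required.

```python
from pathlib import Path

# Subfolders expected under ln_data/yourefit, as listed in the tree above.
EXPECTED_SUBDIRS = ("images", "paf", "saliency")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that do not exist under root."""
    base = Path(root) / "ln_data" / "yourefit"
    return [
        str(base / name)
        for name in EXPECTED_SUBDIRS
        if not (base / name).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_dataset_dirs(".")
    if missing:
        print("Missing directories:")
        for path in missing:
            print("  " + path)
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root. An empty result means all three folders were found; it does not check the contents of those folders.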

Training and Evaluation

The training and evaluation scripts are the same as in YouRefIt.

Checklist

code
pre-process data

Citation

@inproceedings{shi2022spatial,
  title={Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding},
  author={Shi, Cheng and Yang, Sibei},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXVI},
  pages={201--218},
  year={2022},
  organization={Springer}
}

Acknowledgement

Our code is built on ReSC and YouRefIt. We thank the authors for their hard work.

SooLab/REP-ERU
