This repository provides the implementation of the paper *VRDFormer: End-to-End Video Relation Detection with Transformers*. The codebase builds upon DETR, Deformable DETR, and TrackFormer.
Visual relation understanding plays an essential role in holistic video understanding. Most previous works adopt a multi-stage framework for video visual relation detection (VidVRD), which cannot capture long-term spatio-temporal contexts across stages and also suffers from inefficiency. In this paper, we propose a transformer-based framework called VRDFormer to unify these decoupled stages. Our model exploits a query-based approach to autoregressively generate relation instances. We specifically design static queries and recurrent queries to enable efficient object pair tracking with spatio-temporal contexts. The model is jointly trained with object pair detection and relation classification. Extensive experiments on two benchmark datasets, ImageNet-VidVRD and VidOR, demonstrate the effectiveness of the proposed VRDFormer, which achieves state-of-the-art performance on both relation detection and relation tagging tasks.
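The query-based decoding described above can be pictured with a minimal sketch: static queries probe each frame for new subject-object pairs, while recurrent queries carry previously detected pairs forward so their spatio-temporal context accumulates across frames. This is only an illustrative assumption of how such a loop might look; all class, parameter, and module names below are hypothetical and are not the repository's actual code.

```python
import torch
import torch.nn as nn


class RelationDecoderSketch(nn.Module):
    """Illustrative frame-by-frame decoder with static and recurrent queries.

    Static queries look for new subject-object pairs in each frame; recurrent
    queries carry previously detected pairs to the next frame. Shapes, heads,
    and hyperparameters are assumptions for illustration only.
    """

    def __init__(self, hidden_dim=256, num_static_queries=100, num_relations=132):
        super().__init__()
        self.static_queries = nn.Embedding(num_static_queries, hidden_dim)
        layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Heads for the subject/object box pair and per-pair relation logits.
        self.box_head = nn.Linear(hidden_dim, 8)           # two (cx, cy, w, h) boxes
        self.relation_head = nn.Linear(hidden_dim, num_relations)

    def forward(self, frame_features):
        """frame_features: list of per-frame encoder memories, each (1, HW, hidden_dim)."""
        recurrent_queries = None  # filled in after the first frame
        outputs = []
        for memory in frame_features:
            static = self.static_queries.weight.unsqueeze(0)        # (1, Ns, D)
            queries = static if recurrent_queries is None else torch.cat(
                [recurrent_queries, static], dim=1)
            hidden = self.decoder(queries, memory)                   # cross-attend to frame
            outputs.append({
                "pair_boxes": self.box_head(hidden).sigmoid(),
                "relation_logits": self.relation_head(hidden),
            })
            # Detected pairs become recurrent queries for the next frame;
            # a real model would filter them by confidence / matching.
            recurrent_queries = hidden
        return outputs


if __name__ == "__main__":
    # Three frames of dummy encoder memory with 256-d features.
    frames = [torch.randn(1, 300, 256) for _ in range(3)]
    preds = RelationDecoderSketch()(frames)
    print(len(preds), preds[0]["relation_logits"].shape)
```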
We refer to docs/DATA.md for detailed data preparation instructions.
We refer to docs/INSTALL.md for detailed installation instructions.
- Train VidVRD based on DETR with 8 GPUs and batch size 32:
sh script/stage1/train_mgpu.sh
sh script/stage2/train_mgpu.sh
- Train VidVRD based on Deformable DETR:
sh script/stage1/train_deform_mgpu.sh
sh script/stage2/train_deform_mgpu.sh
- Train VidOR based on DETR with 8 GPUs and batch size 64:
sh script/stage1/train_mgpu_vidor.sh
sh script/stage2/train_mgpu_vidor.sh
- Train VidOR based on Deformable DETR:
sh script/stage1/train_deform_mgpu_vidor.sh
sh script/stage2/train_deform_mgpu_vidor.sh