zhengsipeng/VRDFormer_VRD

VRDFormer: End-to-end Video Relation Detection with Transformers

This repository provides the implementation of the paper "VRDFormer: End-to-end Video Relation Detection with Transformers". The codebase builds upon DETR, Deformable DETR, and TrackFormer.

Abstract

Visual relation understanding plays an essential role for holistic video understanding. Most previous works adopt a multi-stage framework for video visual relation detection (VidVRD), which cannot capture long-term spatiotemporal contexts in different stages and also suffers from inefficiency. In this paper, we propose a transformer-based framework called VRDFormer to unify these decoupling stages. Our model exploits a query-based approach to autoregressively generate relation instances. We specifically design static queries and recurrent queries to enable efficient object pair tracking with spatio-temporal contexts. The model is jointly trained with object pair detection and relation classification. Extensive experiments on two benchmark datasets, ImageNet-VidVRD and VidOR, demonstrate the effectiveness of the proposed VRDFormer, which achieves the state-of-the-art performance on both relation detection and relation tagging tasks.

Data Preparation

See docs/DATA.md for detailed data preparation instructions.

Installation

See docs/INSTALL.md for detailed installation instructions.

Train VRDFormer on VidVRD

  • Train on VidVRD with the DETR-based model, using 8 GPUs and a batch size of 32
sh script/stage1/train_mgpu.sh
sh script/stage2/train_mgpu.sh
  • Train on VidVRD with the Deformable DETR-based model
sh script/stage1/train_deform_mgpu.sh
sh script/stage2/train_deform_mgpu.sh

Train VRDFormer on VidOR

Train on VidOR with the DETR-based model, using 8 GPUs and a batch size of 64

sh script/stage1/train_mgpu_vidor.sh
sh script/stage2/train_mgpu_vidor.sh

Train on VidOR with the Deformable DETR-based model

sh script/stage1/train_deform_mgpu_vidor.sh
sh script/stage2/train_deform_mgpu_vidor.sh
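
Each configuration above lists a stage 1 script followed by a stage 2 script. Assuming stage 2 is meant to start only after stage 1 has finished successfully (the ordering suggests this, but the README does not say so explicitly), a small wrapper can run the two in sequence and stop at the first failure. The `run_stages` helper below is a hypothetical convenience function and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the repository): run the given
# training scripts in order and stop at the first one that is missing
# or exits with a non-zero status.
run_stages() {
    for stage in "$@"; do
        if [ ! -f "$stage" ]; then
            echo "missing script: $stage" >&2
            return 1
        fi
        echo "running $stage"
        if ! sh "$stage"; then
            echo "failed: $stage" >&2
            return 1
        fi
    done
    return 0
}

# Example invocation from the repository root (DETR-based model on VidVRD):
# run_stages script/stage1/train_mgpu.sh script/stage2/train_mgpu.sh
```

The same call works for the other configurations by swapping in the corresponding pair of script paths.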
