By Shuaicheng Li, Qianggang Cao, Lingbo Liu, Kunlin Yang, Shinan Liu, Jun Hou, Shuai Yi. This repository is an official implementation of the paper GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer.
GroupFormer utilizes a tailor-modified Transformer to model individual and group representations for group activity recognition. First, a Group Representation Generator produces an initial group representation by merging individual context with scene-wide context. Multiple stacked Spatial-Temporal Transformers (STTR) then augment and refine both the individual and group representations, leveraging the query-key mechanism to jointly model spatial-temporal context for group activity inference.
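To make the pipeline above concrete, the following is a minimal PyTorch sketch of the overall flow (assuming a recent PyTorch with `batch_first` attention). It is only an illustration: the module names, feature shapes, and the plain multi-head attention used here are simplifying assumptions, not this repository's actual implementation, and the clustered attention from the paper is omitted.

```python
# Minimal, illustrative sketch of the GroupFormer pipeline described above.
# All module and parameter names are assumptions for illustration only.
import torch
import torch.nn as nn


class SpatialTemporalBlock(nn.Module):
    """One STTR-style block: attention over people (spatial) and frames (temporal)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.group_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, indiv, group):
        # indiv: (B, T, N, C) per-person features; group: (B, 1, C) group token.
        B, T, N, C = indiv.shape
        # Spatial attention: people attend to each other within every frame.
        x = indiv.reshape(B * T, N, C)
        x = x + self.spatial_attn(x, x, x)[0]
        # Temporal attention: each person attends across frames.
        x = x.reshape(B, T, N, C).permute(0, 2, 1, 3).reshape(B * N, T, C)
        x = x + self.temporal_attn(x, x, x)[0]
        indiv = x.reshape(B, N, T, C).permute(0, 2, 1, 3)
        # Group token queries the refined individual features (query-key mechanism).
        ctx = self.norm(indiv.reshape(B, T * N, C))
        group = group + self.group_attn(group, ctx, ctx)[0]
        return indiv, group


class GroupFormerSketch(nn.Module):
    def __init__(self, dim=256, depth=3, num_activities=8):
        super().__init__()
        self.blocks = nn.ModuleList(SpatialTemporalBlock(dim) for _ in range(depth))
        self.classifier = nn.Linear(dim, num_activities)

    def forward(self, indiv_feats, scene_feats):
        # Initial group representation: merge individual context with the
        # scene-wide context (here simplified to a mean + add).
        group = (indiv_feats.mean(dim=(1, 2)) + scene_feats).unsqueeze(1)  # (B, 1, C)
        for blk in self.blocks:
            indiv_feats, group = blk(indiv_feats, group)
        return self.classifier(group.squeeze(1))


# Example: 2 clips, 10 frames, 8 players, 256-d features (shapes are illustrative).
# model = GroupFormerSketch()
# logits = model(torch.randn(2, 10, 8, 256), torch.randn(2, 256))
```

In the released model, the initial group representation comes from the Group Representation Generator rather than a simple pooling, and the spatial and temporal branches use the clustered attention described in the paper.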
This project is released under the Apache 2.0 license.
| Backbone | Style | Action Acc | Activity Acc | Config | Download |
| --- | --- | --- | --- | --- | --- |
| Inv3+flow+pose | pytorch | 0.847 | 0.957 | config | model \| test_log |
## Preparation

- Linux, CUDA>=9.2, GCC>=5.4
- Python>=3.7
First download the Volleyball dataset.
The following files need to be adapted in order to run the code on your own machine:

- Change the file paths for keypoint, dataset, tracks and flow in `config/*.yaml` (a path-checking sketch follows the notes below).
We also provide the keypoint data extracted with AlphaPose.
The flow data is too large to upload; it can be generated easily with FlowNet, as mentioned in our paper.
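Before training, it can save time to confirm that the edited paths actually exist. The short script below is only a sketch under the assumption that the YAML file exposes top-level keys named `keypoint`, `dataset`, `tracks` and `flow`; the real configs in this repo may nest these differently, so treat the key lookups as placeholders.

```python
# Sketch: verify that the data paths in a config file exist before training.
# The key names below follow the list above; the actual nesting in
# config/*.yaml may differ, so adjust the lookups to match your file.
import sys
from pathlib import Path

import yaml  # pip install pyyaml


def check_config_paths(cfg_file):
    with open(cfg_file) as f:
        cfg = yaml.safe_load(f)
    missing = []
    for key in ("keypoint", "dataset", "tracks", "flow"):
        path = cfg.get(key)
        if path is None:
            print(f"[warn] key '{key}' not found in {cfg_file}")
        elif not Path(path).exists():
            missing.append((key, path))
    for key, path in missing:
        print(f"[error] '{key}' points to a non-existent path: {path}")
    return not missing


if __name__ == "__main__":
    sys.exit(0 if check_config_paths(sys.argv[1]) else 1)
```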
To train the model, run:

```bash
./dist_train.sh $GPU_NUM $CONFIG
```

To evaluate a trained checkpoint, run:

```bash
./dist_test.sh $GPU_NUM $CONFIG $CHECKPOINT
```
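For example, `./dist_train.sh 8 config/volleyball.yaml` would launch distributed training on 8 GPUs, and `./dist_test.sh 8 config/volleyball.yaml <checkpoint.pth>` would evaluate a saved checkpoint; the config filename here is only a placeholder for whichever file under `config/` you adapted during preparation.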
If you find this work useful in your research, please consider citing:
```
@inproceedings{li2021groupformer,
  title={GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer},
  author={Li, Shuaicheng and Cao, Qianggang and Liu, Lingbo and Yang, Kunlin and Liu, Shinan and Hou, Jun and Yi, Shuai},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13668--13677},
  year={2021}
}
```
A humble version has been released, containing the core modules described in the paper.
Any suggestions are welcome; we are glad to optimize our code and provide more details.