Project Page | Videos | Paper
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Dian Shao1,
Yue Zhao1,
Bo Dai1,
Dahua Lin1
1 MMLab, The Chinese University of Hong Kong
- 07/2020: We have released pre-extracted features of the FineGym dataset from 20 models for you to choose from freely! See below for more details.
- 05/2020: Talks and Demo about FineGym can all be found here: FineGym Youtube Playlist.
- 04/2020: Annotations of FineGym Dataset are released! Please refer to FineGym Homepage for more details!
- 03/2020: The paper FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding was accepted by CVPR 2020 as an Oral Paper, with three Strong Accepts!
For Gym99 (for details of the Gym99 and Gym288 subsets, please refer to the FineGym Homepage):
Model | Backbone | Pre-trained | Fine-tuned | Mean Class ACC | Top-1 ACC | Train-features | Val-Features | Feature-size per inst. |
---|---|---|---|---|---|---|---|---|
BN-Inception | BN-Inception | ImageNet | - | - | - | Gym99-train-bninception | Gym99-val-bninception | 12 x 1024 x 1 x 1 |
ResNet50 | ResNet50 | ImageNet | - | - | - | Gym99-train-r50 | Gym99-val-r50 | 12 x 2048 x 1 x 1 |
TSN | BN-Inception | ImageNet | Gym99 | 61.4 | 74.8 | Gym99-train-tsn | Gym99-val-tsn | 12 x 1024 x 1 x 1 |
I3D | ResNet50 | ImageNet | Gym99 | 63.2 | 74.8 | Gym99-train-i3d-imnet | Gym99-val-i3d-imnet | 12 x 2048 x 1 x 1 x 1 |
I3D | ResNet50 | Kinetics | Gym99 | 64.4 | 75.6 | Gym99-train-i3d-kin | Gym99-val-i3d-kin | 12 x 2048 x 1 x 1 x 1 |
For Gym288:
Model | Backbone | Pre-trained | Fine-tuned | Mean Class ACC | Top-1 ACC | Train-features | Val-Features | Feature-size per inst. |
---|---|---|---|---|---|---|---|---|
BN-Inception | BN-Inception | ImageNet | - | - | - | Gym288-train-bninception | Gym288-val-bninception | 12 x 1024 x 1 x 1 |
ResNet50 | ResNet50 | ImageNet | - | - | - | Gym288-train-r50 | Gym288-val-r50 | 12 x 2048 x 1 x 1 |
TSN | BN-Inception | ImageNet | Gym288 | 26.5 | 68.3 | Gym288-train-tsn | Gym288-val-tsn | 12 x 1024 x 1 x 1 |
I3D | ResNet50 | ImageNet | Gym288 | 27.9 | 66.7 | Gym288-train-i3d-imnet | Gym288-val-i3d-imnet | 12 x 2048 x 1 x 1 x 1 |
I3D | ResNet50 | Kinetics | Gym288 | 28.2 | 66.1 | Gym288-train-i3d-kin | Gym288-val-i3d-kin | 12 x 2048 x 1 x 1 x 1 |
- All feature files are in 'pickle' format; each file stores a Python dictionary.
- The keys of each Feature Dictionary are element id in:
  - E.g. key: A0xAXXysHUo_E_002184_002237_A_0035_0036; the corresponding value is the extracted feature for that action instance.
- The numbers of element-level action instances are:
  - Gym99 (v1.0): 20484/8521 for the Train/Val sets respectively;
  - Gym288 (v1.0): 22671/9646 for the Train/Val sets respectively.
- To keep each feature file relatively small (mostly 1-2 GB per file), the spatial information is pooled, resulting in a spatial shape of 1 x 1.
- For details of model training and results, please refer to our paper and supplementary material.
- Note that for I3D models, the extracted features are dense (i.e. seg=12, 8 frames per seg), since many element-level instances last only 1-2 seconds.
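
The notes above can be turned into a small loading sketch. It is a minimal illustration, not code shipped with the dataset: the helper names (`load_features`, `parse_element_id`, `shape_summary`) are hypothetical, and the key layout `<video_id>_E_<start>_<end>_A_<start>_<end>` is an assumption inferred from the example key above. Values are assumed to expose a `.shape` attribute (e.g. NumPy arrays); if they do not, the summary reports an empty shape.

```python
import pickle
import re
from collections import Counter

# Assumed key layout (inferred from the example key in this README):
#   <video_id>_E_<event_start>_<event_end>_A_<element_start>_<element_end>
# The leading group is greedy so that video ids containing "_" still parse.
_KEY_RE = re.compile(
    r"^(?P<video>.+)_E_(?P<e_start>\d+)_(?P<e_end>\d+)"
    r"_A_(?P<a_start>\d+)_(?P<a_end>\d+)$"
)


def load_features(path):
    """Load one feature file and return its {element_id: feature} dictionary.

    Only unpickle files obtained from a source you trust.
    """
    with open(path, "rb") as f:
        features = pickle.load(f)
    if not isinstance(features, dict):
        raise TypeError(f"expected a dict, got {type(features).__name__}")
    return features


def parse_element_id(key):
    """Split an element id into video id, event fields, and element fields."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"unexpected element id format: {key!r}")
    parts = match.groupdict()
    return {
        "video": parts["video"],
        "event": (int(parts["e_start"]), int(parts["e_end"])),
        "element": (int(parts["a_start"]), int(parts["a_end"])),
    }


def shape_summary(features):
    """Count how many instances have each feature shape.

    Values without a `.shape` attribute are counted under the empty tuple.
    """
    return Counter(tuple(getattr(v, "shape", ())) for v in features.values())
```

For example, after downloading one of the files listed in the tables, `features = load_features(path)` followed by `shape_summary(features)` should show whether every instance has the per-instance size given in the table, and `len(features)` can be compared against the instance counts listed above.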
@inproceedings{shao2020finegym,
title={FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding},
author={Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}