This is the official PyTorch implementation of the publication:
A. D’Eusanio, A. Simoni, S. Pini, G. Borghi, R. Vezzani, R. Cucchiara
A Transformer-Based Network for Dynamic Hand Gesture Recognition
In International Conference on 3D Vision (3DV) 2020
[Paper] [Project Page]
Transformer-based neural networks represent a successful self-attention mechanism that achieves outstanding results in
different topics, such as language understanding and sequence modeling.
The application of such models to different types of data, like the visual one, is necessary to push the boundaries of
common convolutional and recurrent neural networks.
Therefore, in this work we propose a transformer-based architecture for the dynamic hand gesture recognition task,
focusing on the automotive environment.
Moreover, we propose the combined use of depth maps and surface normals as unique sources to successfully solve the
task, even in low-light conditions.
The two datasets we used are NVGestures and Briareo. Both of them contain data from multiple sensors: RGB, IR, and depth, allowing the study of multimodal fusion techniques.
In this work, we focused on the sole use of the depth sensor, which provides light-invariant depth maps that can be
further processed to obtain an estimation of the surface normals.
Experimental results show that the use of such a simple processing step leads to a significant gain in accuracy.
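As an illustration of the processing step mentioned above, here is a minimal sketch of one common way to estimate surface normals from a depth map, using image-space gradients. It is not necessarily the procedure used in this repository: the function name `estimate_surface_normals` is hypothetical, and camera intrinsics are ignored for simplicity.

```python
import numpy as np


def estimate_surface_normals(depth: np.ndarray) -> np.ndarray:
    """Return an (H, W, 3) array of unit normals for an (H, W) depth map.

    Assumption: normals are approximated as (-dz/dx, -dz/dy, 1), normalized
    per pixel. This is a generic gradient-based estimate, not the authors' code.
    """
    depth = depth.astype(np.float32)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    # The z component is always 1, so the norm is never zero.
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / norm


# Usage: a flat surface facing the camera yields normals pointing along +z.
flat_depth = np.full((4, 5), 2.0)
flat_normals = estimate_surface_normals(flat_depth)
```

The resulting three-channel map can be treated like an image, which is what makes it convenient as an additional network input alongside the depth map.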
These instructions will get a copy of the project up and running on your local machine for development and testing purposes. There isn't much to do: just install the prerequisites and download all the files.
Things you need to install to run the code:
Python >= 3.6.7
PyTorch >= 1.6
Install CUDA and PyTorch following the instructions on their official websites.
Run the command:
pip install -r requirements.txt
The employed datasets, NVGestures and Briareo, are publicly available.
Once downloaded, unzip them anywhere on your drive.
PyTorch pretrained models are available at this link.
For this project we used a JSON file, located in the hyperparameters folder, such as:
hyperparameters/Briareo/train.json
In there, you can set several parameters, like:
- Dataset, Briareo or NVGestures.
- phase, select if training or testing.
- Data-type, select which source is used: depth, rgb, ir, surface normals or optical-flow.
- Data-Nframe, length of the input sequence, default: 40 frames.
- Data-path, path where you downloaded and unzipped the dataset.
For all other settings, check the file.
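The options listed above could be laid out and sanity-checked roughly as in the sketch below. The key names, the nesting, and the allowed string values are assumptions made for illustration; the JSON files shipped in the hyperparameters folder are the authoritative reference.

```python
import json

# Hypothetical layout mirroring the options above; real key names may differ.
example_config = {
    "dataset": "Briareo",
    "phase": "train",
    "data": {
        "type": "depth",
        "n_frames": 40,
        "path": "/path/to/Briareo",
    },
}

# Assumed value sets, derived from the option descriptions above.
ALLOWED_DATASETS = {"Briareo", "NVGestures"}
ALLOWED_PHASES = {"train", "test"}
ALLOWED_TYPES = {"depth", "rgb", "ir", "normals", "optical-flow"}


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if cfg.get("dataset") not in ALLOWED_DATASETS:
        errors.append("dataset must be one of %s" % sorted(ALLOWED_DATASETS))
    if cfg.get("phase") not in ALLOWED_PHASES:
        errors.append("phase must be one of %s" % sorted(ALLOWED_PHASES))
    data = cfg.get("data", {})
    if data.get("type") not in ALLOWED_TYPES:
        errors.append("data.type must be one of %s" % sorted(ALLOWED_TYPES))
    n_frames = data.get("n_frames")
    if not isinstance(n_frames, int) or n_frames <= 0:
        errors.append("data.n_frames must be a positive integer")
    if not data.get("path"):
        errors.append("data.path must point to the unzipped dataset")
    return errors


# Round-trip through JSON text, as would happen when reading the file from disk.
loaded = json.loads(json.dumps(example_config))
problems = validate_config(loaded)
```

Checking the configuration before starting a run catches a misspelled dataset name or a missing path early, rather than partway through data loading.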
python --hypes hyperparameters/NVGestures/train.json
- `--hypes`, path to configuration file.
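For readers curious how a `--hypes` option like the one in the command above can be wired up, here is a minimal sketch using the standard `argparse` and `json` modules. The helper names `parse_args` and `load_hypes` are hypothetical and do not necessarily match the entry point of this repository.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse command-line arguments; `argv=None` falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Dynamic hand gesture recognition (illustrative sketch)"
    )
    parser.add_argument(
        "--hypes",
        required=True,
        help="path to configuration file",
    )
    return parser.parse_args(argv)


def load_hypes(path):
    """Read the JSON hyperparameter file and return it as a dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Usage: pass an explicit argument list instead of reading sys.argv.
args = parse_args(["--hypes", "hyperparameters/NVGestures/train.json"])
```

Calling `load_hypes(args.hypes)` would then return the settings as a plain dictionary, provided the file exists at that path.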
- Andrea D'Eusanio - Deusy94
- Alessandro Simoni - alexj94
- Stefano Pini - stefanopini
- Guido Borghi - gdubrg
- Roberto Vezzani - robervez
- Rita Cucchiara - Rita Cucchiara
If you use this code, please cite our paper:
title={A Transformer-Based Network for Dynamic Hand Gesture Recognition},
author={D'Eusanio, Andrea and Simoni, Alessandro and Pini, Stefano and Borghi, Guido and Vezzani, Roberto and Cucchiara, Rita},
booktitle={International Conference on 3D Vision},
This project is licensed under the MIT License; see the LICENSE file for details.