Gesture recognition system

Author: Ladislav Ondris

This project performs gesture recognition from depth images. It consists of hand detection, hand pose estimation, and gesture classification. Hands are detected using a Tiny YOLOv3 model. The gesture recognition system then uses a JGR-P2O hand pose estimator to determine the hands' skeleton, which is used for gesture classification.
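
A rough sketch of how these stages fit together is shown below. The module, object, and method names are illustrative assumptions for explanation only, not the project's actual API.

# Illustrative pipeline sketch -- the detector, estimator, and classifier
# objects and their methods are assumptions, not the project's actual API.
import numpy as np

def recognize_gestures(depth_image: np.ndarray, detector, estimator, classifier):
    # 1. Detect hands with Tiny YOLOv3 (bounding boxes in image coordinates).
    boxes = detector.detect(depth_image)
    gestures = []
    for top, left, bottom, right in boxes:
        # 2. Crop the detected hand and estimate its skeleton with JGR-P2O.
        hand_crop = depth_image[top:bottom, left:right]
        joints = estimator.estimate(hand_crop)  # e.g. one (x, y, z) row per joint
        # 3. Classify the gesture from the estimated skeleton.
        gestures.append(classifier.classify(joints))
    return gestures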

See the demonstration videos located in the docs/ directory.

Prerequisites

Python 3.7.10
Intel RealSense SR305 depth camera - for live capture

Installation

Install the required packages with:

pip install -r requirements.txt

If TensorFlow pulls in an incompatible version of gast, which may result in warning or error messages, downgrade gast from version 0.4.0 to 0.3.3:

pip install gast==0.3.3

Usage examples

The following examples mostly use the 'live' option as the source of images; you can use the 'dataset' option instead. Although the custom dataset is not part of the repository because of its size, a few images are included for demonstration purposes.

Hand detection

To detect both hands in images captured with the SR305 camera (the default option):

python3 detect.py live --num-detections 2 --plot

(example figures: live_detection, live_detection2)

usage: detect.py [-h] [--camera CAMERA] [--num-detections NUM_DETECTIONS]
                 [--plot]
                 source

positional arguments:
  source                the source of images (allowed options: live, dataset)

optional arguments:
  -h, --help            show this help message and exit
  --camera CAMERA       the camera model in use for live capture (default:
                        SR305)
  --num-detections NUM_DETECTIONS
                        the maximum number of bounding boxes for hand
                        detection (default: 1)
  --plot                plot the result of detection
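
The --num-detections option caps how many hand bounding boxes are kept. Conceptually this amounts to keeping the top-N boxes by confidence, as in the sketch below; the (box, confidence) tuples are a hypothetical format, not the detector's actual output.

# Conceptual illustration of --num-detections: keep at most N boxes,
# ordered by confidence. The (box, confidence) tuples are hypothetical.
def keep_top_detections(detections, num_detections=1):
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)  # sort by confidence
    return ranked[:num_detections]

# With num_detections=2, the two most confident boxes (both hands) are kept.
boxes = [((10, 20, 80, 90), 0.97), ((120, 30, 190, 100), 0.91), ((5, 5, 15, 15), 0.20)]
print(keep_top_detections(boxes, num_detections=2))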

Hand pose estimation

To estimate hand poses from images captured with the SR305 camera:

python3 estimate.py live --plot

(example figures: live_estimation, live_estimation2)

usage: estimate.py [-h] [--camera CAMERA] [--plot] source

positional arguments:
  source           the source of images (allowed options: live, dataset)

optional arguments:
  -h, --help       show this help message and exit
  --camera CAMERA  the camera model in use for live capture (default: SR305)
  --plot           plot the result of estimation
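
The estimator outputs the hand skeleton as a set of 3D joint coordinates. Below is a minimal sketch of visualizing such a skeleton with matplotlib; the random array stands in for an actual estimate, and the 21-joint shape is an assumption.

# Minimal visualization sketch: the joints array is random placeholder data
# standing in for an estimated skeleton (assumed shape: 21 joints x (x, y, z)).
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection

joints = np.random.rand(21, 3)  # placeholder for an estimated hand skeleton

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2])
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()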

System usage

The system requires the user to define the gesture to be recognized, as described in the section Preparation of gesture database. For demonstration purposes, the gesture database is already prepared for a gesture with an open palm and fingers outstretched and apart.

Real-time recognition from live images or from the custom dataset is shown in the section Real-time gesture recognition.

Real-time gesture recognition

For demonstration, a directory named "gestures" is already present, containing the definition of a gesture with an open palm and fingers outstretched and apart.

To start the gesture recognition system using the gesture database stored in the gestures directory:

python3 recognize.py live gestures --plot

To start the gesture recognition from the evaluation dataset:

python3 recognize.py dataset gestures --plot

The system plots figures similar to the following:

(example figures: live_gesture1, live_nongesture)

usage: recognize.py [-h] [--error-threshold ERROR_THRESHOLD]
                    [--orientation-threshold ORIENTATION_THRESHOLD]
                    [--camera CAMERA] [--plot] [--hide-feedback]
                    [--hide-orientation]
                    source directory

positional arguments:
  source                the source of images (allowed options: live, dataset)
  directory             the name of the directory containing the user-captured
                        gesture database

optional arguments:
  -h, --help            show this help message and exit
  --error-threshold ERROR_THRESHOLD
                        the pose (JRE) threshold (default: 120)
  --orientation-threshold ORIENTATION_THRESHOLD
                        the orientation threshold in degrees (maximum: 90,
                        default: 90)
  --camera CAMERA       the camera model in use for live capture (default:
                        SR305)
  --plot                plot the result of gesture recognition
  --hide-feedback       hide the colorbar with JRE errors
  --hide-orientation    hide the vector depicting the hand's orientation
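
The two thresholds work together: presumably a captured pose is recognized as the gesture only when its joint relation error (JRE) against the database pose stays below --error-threshold and the hand's orientation deviates from the database orientation by at most --orientation-threshold degrees. A simplified sketch of such a check follows; the summed joint-distance error is a stand-in for the exact JRE formula, and unit orientation vectors are assumed.

# Simplified acceptance check combining the two thresholds. The summed
# joint-distance error is a stand-in for the JRE metric, and the orientation
# vectors are assumed to be unit vectors; both are illustrative assumptions.
import numpy as np

def is_gesture(joints, db_joints, orientation, db_orientation,
               error_threshold=120.0, orientation_threshold=90.0):
    # Pose error: summed distance between corresponding joints.
    error = np.sum(np.linalg.norm(joints - db_joints, axis=1))
    # Orientation deviation in degrees between the two hand orientation vectors.
    cos_angle = np.clip(np.dot(orientation, db_orientation), -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    return error < error_threshold and angle <= orientation_threshold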

Preparation of gesture database

Beware: the preparation of a gesture database requires a depth camera. You can skip this section because a database called gestures is already available.

To capture 10 samples of a gesture with label 1 into the gestures directory, with a scan period of one second and the SR305 camera:

python3 database.py gestures 1 10

usage: database.py [-h] [--scan-period SCAN_PERIOD] [--camera CAMERA]
                   [--hide-plot]
                   directory label count

positional arguments:
  directory             the name of the directory that should contain the
                        user-captured gesture database
  label                 the label of the gesture that is to be captured
  count                 the number of samples to scan

optional arguments:
  -h, --help            show this help message and exit
  --scan-period SCAN_PERIOD
                        intervals between each capture in seconds (default: 1)
  --camera CAMERA       the camera model in use for live capture (default:
                        SR305)
  --hide-plot           hide plots of the captured poses - not recommended
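
Conceptually, database.py stores count samples and waits --scan-period seconds between captures. A rough sketch of such a loop is shown below; the capture and save functions are placeholders, not the project's actual implementation.

# Rough sketch of a timed capture loop driven by count and --scan-period.
# capture_pose() and save_pose() are placeholders, not the project's functions.
import time

def capture_pose():
    # Placeholder: the real tool grabs a depth frame and estimates the hand pose.
    return None

def save_pose(pose, label):
    # Placeholder: the real tool appends the pose to the gesture database.
    print(f"saved a sample with label {label}")

def capture_gesture_samples(count, label, scan_period=1.0):
    for _ in range(count):
        save_pose(capture_pose(), label)
        time.sleep(scan_period)  # wait before the next capture

capture_gesture_samples(count=10, label=1)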

Project structure

Top-level structure

.
├── datasets                # Datasets (including gesture database)
├── docs                    # Demonstration videos, readme files, and images 
├── logs                    # Saved models' weights
├── text_src                # LaTeX source files of the thesis text
├── src                     # Source files
├── LICENSE                 # MIT license
├── README.md               # Contents of this file
├── requirements.txt        # Package requirements 
└── bachelors_thesis.pdf    # Text of the thesis

Datasets

datasets
├── bighand                     # Hand pose estimation dataset (not preprocessed)
├── cvpr15_MSRAHandGestureDB    # Hand pose estimation dataset (preprocessed)
├── handseg150k                 # Hand segmentation dataset (both hands)
├── simple_boxes                # Generated toy object detection dataset
├── custom                      # Created dataset for the evaluation of gesture recognition
└── usecase                     # Contains gesture databases captured by the user 

Source files

src
├── acceptance               # Gesture acceptance module (gesture recognition algorithm)
├── datasets                 # Dataset related code (pipelines, plots, generation)
├── detection                # Detection methods - Tiny YOLOv3, RDF
├── estimation               # JGR-P2O estimation model and preprocessing
├── metacentrum              # Scripts for training models in Metacentrum
├── system                   # Access point to gesture recognition system 
│                              (database_scanner, gesture_recognizer, hand_position_estimator)
└── utils                    # Camera, logs, plots, live capture, config

License

This project is licensed under the terms of the MIT license.