The core objective of this project is to provide accurate and fast classification of hand signs based on real-time input. It is implemented using several well-known libraries:
- **MediaPipe Hands Model**: Detects and tracks hand landmarks in real time, offering high accuracy in hand position recognition. It provides 21 key points (landmarks) for each hand and can track multiple hands in a video frame; these landmarks are used for the subsequent sign classification.
- **PyTorch Classification Model**: A custom CNN was developed first and analyzed for accuracy; ResNet18 was then tested but yielded lower accuracy, so a pre-trained MobileNetV2 architecture was adopted, which classifies hand signs most accurately. This model was fine-tuned on a dataset of hand sign images, with its weights saved in `asl_crop_v4_1_mobilenet_weights.pth`. The model outputs probabilities for the different hand sign classes, and a confidence threshold is applied to keep predictions reliable.
- **Hand Tracking and Detection**: The MediaPipe Hands model detects the user's hand in the video feed and extracts key hand landmarks. These landmarks are essential for determining hand orientation and positioning, which feeds into the classification process.
- **Sign Classification**: Once the hand is detected and its landmarks identified, the processed landmarks are passed to the PyTorch classification model, which predicts the hand sign. The model has been trained on a variety of hand gestures to recognize different signs accurately.
- **Real-time Video Processing**: The application continuously processes webcam frames, applying the hand tracking model, running sign classification, and displaying the results in a GUI. A confidence threshold of 0.7 is applied, so only high-certainty predictions are shown to the user, and predictions are averaged over the last 10 frames to smooth out jitter in the real-time output (see the sketch after this list).
- **Custom Hand Landmarks Display**: The hand landmarks are drawn onto the output video stream, with each part of the hand (fingers, palm, etc.) color-coded for clarity. This visualization lets the user see the points being tracked and how they correspond to the predicted hand sign.
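A minimal sketch of how the thresholding and frame averaging can be implemented (the helper class below is illustrative, not the project's actual code):

```python
from collections import deque

import numpy as np

class PredictionSmoother:
    """Average per-class probabilities over a sliding window of frames and
    report a label only when the averaged confidence clears a threshold."""

    def __init__(self, window_size=10, threshold=0.7):
        self.window = deque(maxlen=window_size)  # keeps the last N probability vectors
        self.threshold = threshold

    def update(self, probabilities):
        """probabilities: 1-D array of per-class probabilities for one frame."""
        self.window.append(np.asarray(probabilities, dtype=float))
        mean_probs = np.mean(self.window, axis=0)  # average over the recent frames
        best = int(np.argmax(mean_probs))
        # Suppress the prediction unless the averaged confidence is high enough.
        if mean_probs[best] >= self.threshold:
            return best, float(mean_probs[best])
        return None, float(mean_probs[best])
```

Feeding each frame's softmax output through `update` yields a class index only when the 10-frame average clears 0.7, which is what keeps the on-screen label stable.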
At runtime, the application works in four stages:

- **Webcam Input**: The app captures live video input from the webcam.
- **Hand Detection**: The MediaPipe model processes each frame, detects the hands, and extracts landmarks.
- **Hand Sign Classification**: The detected landmarks are passed to the PyTorch model, which predicts the hand sign.
- **Display**: The video feed is displayed through a graphical interface, with hand landmarks and classification results overlaid.
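A simplified sketch of that loop using OpenCV and MediaPipe follows; `classify` stands in for the project's PyTorch inference step and is hypothetical here:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # webcam input
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Overlay the 21 tracked landmarks on the frame.
                mp_drawing.draw_landmarks(frame, hand_landmarks,
                                          mp_hands.HAND_CONNECTIONS)
                # label = classify(frame, hand_landmarks)  # PyTorch step (hypothetical)
        cv2.imshow("Sign Recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```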
**Hand Detection Model**

- MediaPipe Hands provides robust hand detection and tracking capabilities by identifying 21 landmarks on each hand. It works well under different lighting conditions and can detect multiple hands in a single frame.
- The model ensures that only the precise hand region is analyzed, which is essential for the classification step (see the cropping sketch below).
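One common way to isolate that region (a sketch, not necessarily the project's exact cropping logic) is to take a padded bounding box around the detected landmarks:

```python
import numpy as np

def crop_hand(frame, hand_landmarks, pad=0.15):
    """Crop a padded bounding box around MediaPipe's normalized landmarks."""
    h, w = frame.shape[:2]
    xs = np.array([lm.x for lm in hand_landmarks.landmark])  # normalized to [0, 1]
    ys = np.array([lm.y for lm in hand_landmarks.landmark])
    x0, x1 = xs.min() - pad, xs.max() + pad
    y0, y1 = ys.min() - pad, ys.max() + pad
    # Convert to pixel coordinates and clamp to the frame bounds.
    x0, x1 = max(int(x0 * w), 0), min(int(x1 * w), w)
    y0, y1 = max(int(y0 * h), 0), min(int(y1 * h), h)
    return frame[y0:y1, x0:x1]
```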
**Classification Model**

- The classification model is based on the MobileNetV2 architecture, fine-tuned for the task of hand sign recognition. MobileNet is a lightweight family of models designed for mobile and embedded vision tasks, making it an efficient choice for real-time applications.
- The classifier takes the detected hand (located via the MediaPipe landmarks) as input and predicts the hand sign from a predefined set of classes. The model used in this project was trained for 10 epochs, and its weights are stored in `asl_crop_v4_1_mobilenet_weights.pth`.
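Loading such a fine-tuned model follows the standard torchvision pattern; this sketch assumes the stock MobileNetV2 with a replaced classifier head, and the class count shown is illustrative:

```python
import torch
from torchvision import models

NUM_CLASSES = 26  # illustrative; the project's actual number of sign classes may differ

# Start from ImageNet-pretrained MobileNetV2 and swap in a new classifier head.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.classifier[1] = torch.nn.Linear(model.last_channel, NUM_CLASSES)

# Restore the fine-tuned weights and switch to inference mode.
state = torch.load("asl_crop_v4_1_mobilenet_weights.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)         # one RGB image at MobileNet's input size
    probs = torch.softmax(model(dummy), dim=1)  # per-class probabilities
```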
**Installation**

- Clone the repository:

```bash
git clone https://github.com/talfig/sign-language-recognition.git
cd sign-language-recognition
```

- Install dependencies. Ensure Python 3.x is installed, then install the required libraries by running:

```bash
pip install -r requirements.txt
```

- Run the application. To launch the app and start the webcam-based hand sign detection:

```bash
python app/frame.py
```
**Installing CUDA on Windows**

- Visit the official NVIDIA CUDA Toolkit website: CUDA Toolkit.
- Select Windows as your operating system and download the appropriate version (CUDA 11.2 is suggested for compatibility with TensorFlow and PyTorch).
- Run the downloaded installer and follow the installation instructions, choosing Express Install or Custom Install depending on your preference.
- After installation, verify the CUDA installation by running the following command in Command Prompt:

```bash
nvcc --version
```
**Installing CUDA on Linux**

- Visit the official NVIDIA CUDA Toolkit website: CUDA Toolkit.
- Select Linux as your operating system and download the appropriate version.
- Install CUDA with the following commands:

```bash
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update
sudo apt-get -y install cuda
```

- Verify the installation with:

```bash
nvcc --version
```
**Installing cuDNN on Windows**

- Visit the NVIDIA cuDNN library page: cuDNN Download.
- Download the cuDNN version compatible with your CUDA installation.
- Unzip the cuDNN package and copy the files into the corresponding CUDA directories (usually under `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2`):
  - Copy the contents of the `bin` folder to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin`.
  - Copy the contents of the `include` folder to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include`.
  - Copy the contents of the `lib` folder to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64`.
**Installing cuDNN on Linux**

- Visit the NVIDIA cuDNN library page: cuDNN Download.
- Download the cuDNN version compatible with your CUDA installation.
- Install cuDNN by running:

```bash
tar -xvf cudnn-linux-x86_64-8.x.x.x_cuda11.2-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
```
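If you want to confirm the copied headers are in place, cuDNN 8.x records its version in `cudnn_version.h`, which you can inspect with:

```bash
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn_version.h
```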
**Setting Environment Variables on Windows**

- Open Control Panel > System and Security > System.
- Click Advanced system settings on the left, then click Environment Variables.
- Under System variables, find `Path`, select it, and click Edit.
- Add the following to the list of paths (adjust the CUDA version if needed):

```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp
```

- Click OK to save the changes.
**Setting Environment Variables on Linux**

- Open your `.bashrc` or `.zshrc` file (depending on your shell):

```bash
nano ~/.bashrc
```

- Add the following lines at the end of the file:

```bash
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

- Save the file and run:

```bash
source ~/.bashrc
```
**Troubleshooting: PyTorch Without CUDA Support**

If you encounter an error indicating that your PyTorch installation does not support CUDA, follow these steps to resolve the issue:

- Ensure that you have installed a version of PyTorch that supports CUDA. Run the following commands in your Python environment:

```python
import torch
print(torch.__version__)
print(torch.cuda.is_available())
```
- If CUDA is not available, you may need to reinstall PyTorch with a build that matches your installed CUDA version. For example, if you have CUDA 11.2 installed:

```bash
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu112
```

You can find the correct command for your setup on the official PyTorch installation page; if no wheel is published for your exact CUDA version, use the nearest supported one listed there.
- Make sure your CUDA installation is correctly set up and that your environment variables point to the correct directories. Check that `CUDA_PATH` and `CUDA_PATH_V11_2` are set correctly.
- Ensure that your NVIDIA driver is compatible with the version of CUDA you are using. You can check the installed driver version using:

```bash
nvidia-smi
```
- After reinstalling PyTorch with CUDA support, run the following command again to check whether CUDA is now available:

```python
import torch
print(torch.cuda.is_available())
```
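Once `torch.cuda.is_available()` returns `True`, the usual pattern is to place the model and its inputs on the GPU explicitly; a minimal example:

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 2).to(device)   # any nn.Module moves the same way
inputs = torch.randn(1, 10, device=device)  # inputs must live on the same device
print(model(inputs).device)                 # prints cuda:0 when a GPU is in use
```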
This project is licensed under the GNU Affero General Public License v3.0 (AGPL) - see the LICENSE file for details.