This repository contains the WLASL Recognition and Translation project, which employs the WLASL
dataset described in "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison" by Dongxu Li et al.
The project uses CUDA and PyTorch, so a system with an NVIDIA GPU is required. Running the system also needs a minimum of 4-5 GB of dedicated GPU memory.
The dataset used in this project is the WLASL dataset, which can be found here on Kaggle.
Download the dataset and place it in data/ (in the same path as the WLASL directory).
To run the project, follow these steps:
- Clone the repo
git clone https://github.com/alanjeremiah/WLASL-Recognition-and-Translation.git
- Install the packages mentioned in the requirements.txt file
Note: you need to install the cudatoolkit version that is compatible with your PyTorch build. The matching install command can be found here. Below is the command used in this project; a verification snippet is shown after these steps.
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
- Open the WLASL/I3D folder and unzip the NLP folder in that path
- Open the run.py file to run the application
python run.py
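As referenced in the note above, you can sanity-check the CUDA/PyTorch install and the available GPU memory with a short script like this (a minimal sketch using standard PyTorch calls):

```python
import torch

# Confirm PyTorch sees a CUDA-capable GPU and was built against a matching toolkit.
print("PyTorch:", torch.__version__)
print("CUDA toolkit (compiled against):", torch.version.cuda)  # e.g. "11.3"
assert torch.cuda.is_available(), "CUDA is not available; recheck the cudatoolkit/PyTorch install"

# The project needs roughly 4-5 GB of dedicated GPU memory.
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB dedicated memory")
```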
This repo uses the I3D model. To train the model, see the original WLASL repo here.
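For reference, inference with the I3D recognizer looks roughly like this. This is a hedged sketch: it assumes the InceptionI3d class from the original WLASL repo's pytorch_i3d.py, and the checkpoint path and class count are placeholders, not values from this repo.

```python
import torch
from pytorch_i3d import InceptionI3d  # from the original WLASL repo's pytorch_i3d.py

NUM_CLASSES = 100                       # placeholder: size of the gloss vocabulary trained on
i3d = InceptionI3d(400, in_channels=3)  # Kinetics-pretrained backbone has 400 classes
i3d.replace_logits(NUM_CLASSES)         # swap the classification head for WLASL glosses
i3d.load_state_dict(torch.load("path/to/checkpoint.pt", map_location="cpu"))  # hypothetical path
i3d.eval()

# One clip of RGB frames: (batch, channels, frames, height, width)
clip = torch.randn(1, 3, 64, 224, 224)
with torch.no_grad():
    per_frame_logits = i3d(clip)                      # shape: (batch, classes, time)
    gloss_id = per_frame_logits.mean(dim=2).argmax(dim=1)  # average over time, take top gloss
print(gloss_id)
```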
The NLP models used in this project are the KeyToText and NGram models.
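As an illustration of the n-gram idea, here is a toy bigram model that estimates the probability of the next word from counts; this is illustrative only, not the project's exact implementation.

```python
from collections import Counter, defaultdict

# Count bigrams from a tiny corpus and estimate P(next | prev) by relative frequency.
corpus = ["the book is on the table", "she reads the book"]
bigrams = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def bigram_prob(prev, nxt):
    total = sum(bigrams[prev].values())
    return bigrams[prev][nxt] / total if total else 0.0

print(bigram_prob("the", "book"))  # 2/3 in this toy corpus ("the" is followed by "book" twice out of three)
```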
KeyToText was built on top of the T5 model by Gagan; the repo can be found here.
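A minimal sketch of how KeyToText turns predicted glosses into a sentence, assuming the keytotext package's pipeline API; the model name "k2t" is one of its published checkpoints, and the gloss list is an illustrative input, not real model output.

```python
from keytotext import pipeline  # pip install keytotext

nlp = pipeline("k2t")  # load a pretrained T5-based key-to-text model

# Glosses predicted by the sign recognizer (illustrative example).
glosses = ["book", "read", "library"]
print(nlp(glosses))  # a fluent sentence built from the keywords
```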
The end result of the project looks like this: the conversion of sign language to spoken language.