Imagine revolutionizing TV control with gestures! As the data science team at a leading home electronics company, we embarked on a journey to develop a feature for smart TVs that recognizes five different gestures, letting users control the TV seamlessly without a remote.
Each gesture corresponds to a specific command:
- Thumbs up: Increase the volume
- Thumbs down: Decrease the volume
- Left swipe: 'Jump' backward 10 seconds
- Right swipe: 'Jump' forward 10 seconds
- Stop: Pause the movie
Our challenge? Continuously monitoring the TV's webcam feed and recognizing these gestures reliably.
To tackle this problem, we adopted a Conv3D + RNN architecture, combining 3D convolutional layers with an LSTM (Long Short-Term Memory) network. The Conv3D layers capture spatial features and short-range motion across frames, while the LSTM models the longer-range temporal order of the sequence.
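As a rough illustration, the sketch below builds a small Conv3D + LSTM network in Keras. The layer sizes, number of blocks, and input resolution are assumptions for illustration, not the exact architecture used in the notebooks.

```python
# A minimal Conv3D + LSTM sketch in Keras (layer sizes, input resolution,
# and frame count are assumptions, not the project's exact architecture).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv3D, MaxPooling3D, BatchNormalization,
                                     TimeDistributed, Flatten, LSTM, Dense, Dropout)

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 30, 120, 120, 3  # assumed input shape
NUM_CLASSES = 5  # thumbs up/down, left/right swipe, stop

model = Sequential([
    # 3D convolutions capture spatial features plus short-range motion
    Conv3D(16, (3, 3, 3), activation='relu', padding='same',
           input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    BatchNormalization(),
    MaxPooling3D(pool_size=(1, 2, 2)),   # pool only spatially, keep all 30 frames

    Conv3D(32, (3, 3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(1, 2, 2)),

    # Flatten each frame's feature map, then model temporal order with an LSTM
    TimeDistributed(Flatten()),
    LSTM(64),
    Dropout(0.25),
    Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```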
The dataset consists of video sequences, each containing 30 frames. The videos were recorded with regular webcams to simulate real interactions with a smart TV, and each video is labeled with one of five classes (0-4), corresponding to the five gestures.
To improve generalization, we used an ImageDataGenerator for data augmentation, increasing the effective diversity of the training data.
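One subtlety with video data is that the same random transform should be applied to every frame of a clip so the sequence stays coherent. The sketch below shows one way to do that with ImageDataGenerator; the augmentation ranges and the `augment_sequence` helper are illustrative assumptions, not the project's exact settings.

```python
# Frame-consistent augmentation sketch with ImageDataGenerator
# (rotation/shift/zoom ranges are assumptions).
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

def augment_sequence(frames):
    """Apply one random transform to all frames of a video.

    frames: array of shape (num_frames, height, width, channels)
    """
    # Sample the transform once so every frame moves together
    params = datagen.get_random_transform(frames.shape[1:])
    return np.stack([datagen.apply_transform(f, params) for f in frames])

# Example: augment a dummy 30-frame clip
clip = np.random.rand(30, 120, 120, 3).astype('float32')
augmented = augment_sequence(clip)
```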
- `data/`: Contains the dataset (not included in this repository due to size).
- `notebooks/`: Jupyter notebooks for data exploration, model development, and evaluation.
- `models/`: Saved model checkpoints.
- `README.md`: You're reading it!
- Download the dataset from here: https://drive.google.com/uc?id=1ehyrYBQ5rbQQe6yL4XbLWe3FMvuVUGiL
- Set up your Python environment with the necessary libraries.
- Explore the provided Jupyter notebooks to understand the project.
- Train the model and save checkpoints as needed (a minimal sketch follows below).
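The snippet below is one hedged way to wire up checkpoint saving during training; the file path, batch generators (`train_generator`, `val_generator`), epoch count, and learning-rate schedule are placeholders for whatever you use in the notebooks.

```python
# Training sketch with checkpoint saving (paths, epoch count, and the
# train/val generators are assumptions, not the project's exact setup).
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

checkpoint = ModelCheckpoint(
    'models/gesture_model-{epoch:02d}-{val_accuracy:.3f}.h5',
    monitor='val_accuracy',
    save_best_only=True,
    verbose=1,
)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=25,
    callbacks=[checkpoint, reduce_lr],
)
```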
Contributions are welcome! Feel free to open issues, provide feedback, or submit pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.