This repo uses an encoder-decoder network to segment hands in the given images.
The model was trained on images of a hand in the 'Fist Closed' gesture, and the background was nearly the same across all training examples.
100 such images were augmented with rotation and translation to expand the training set to 12,500 images. I resorted to augmentation because I had to segment every training image by hand, which is time-consuming.
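The augmentation step can be sketched roughly as below. This is a minimal pure-Python illustration, not the code used in this repo: the names `transform` and `augment_pair`, the angle and shift ranges, and the nearest-neighbour sampling are all assumptions. The point it shows is that each image and its hand-drawn mask must receive the same rotation and translation so they stay aligned. If every source image were expanded equally, 12,500 / 100 works out to 125 variants per image.

```python
import math
import random


def transform(grid, angle_deg, dx, dy, fill=0):
    """Rotate a 2D grid about its centre, then shift it by (dx, dy).

    Uses nearest-neighbour sampling; pixels that map outside the
    source grid are set to `fill`.
    """
    h, w = len(grid), len(grid[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: undo the shift, then undo the rotation.
            xs, ys = x - dx - cx, y - dy - cy
            src_x = round(cos_a * xs + sin_a * ys + cx)
            src_y = round(-sin_a * xs + cos_a * ys + cy)
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = grid[src_y][src_x]
    return out


def augment_pair(image, mask, n_variants, max_angle=15.0, max_shift=1, seed=0):
    """Return n_variants (image, mask) pairs sharing the same random transform.

    max_angle and max_shift are illustrative values, not the repo's settings.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_variants):
        angle = rng.uniform(-max_angle, max_angle)
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        pairs.append((transform(image, angle, dx, dy),
                      transform(mask, angle, dx, dy)))
    return pairs


# Toy 4x4 "image" whose non-zero pixels are the hand, plus its mask.
image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
mask = [[1 if v else 0 for v in row] for row in image]
pairs = augment_pair(image, mask, n_variants=125)
```

For real photos an image library would normally handle the interpolation; the sketch only illustrates the pairing of image and mask.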
- Train the model on other gesture classes and backgrounds to get better results
Although the model was trained only on the 'Fist Closed' gesture, it performed very well on images of gestures close to 'Fist Closed'.
There are some errors, since part of the background is also coloured in a few of the decoder outputs, but performance on unseen gestures was better than I expected.
The model performed very poorly on images of an open hand with the fingers spread out, though this could likely be improved by training on such images.
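The observations above are qualitative. One common way to put a number on them, assuming a hand-drawn mask is available for each test image, is intersection over union (IoU) between the predicted mask and the hand-drawn one. The function below is an illustrative sketch and was not used to produce any result reported here.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks (2D lists of 0/1)."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so treat that case as 1.0.
    return inter / union if union else 1.0


# Toy masks: the prediction leaks into one background pixel
# and misses one hand pixel.
truth = [[0, 1, 1],
         [0, 1, 1]]
pred = [[1, 1, 1],
        [0, 1, 0]]
score = iou(pred, truth)  # 3 shared pixels / 5 pixels in the union
```

A background region that gets coloured by mistake increases the union without increasing the intersection, so it lowers the score.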
The output of the decoder can be used as the input to a CNN classifier. Since the decoder output has a uniform colour for the hand, it could be easier for the CNN to achieve higher accuracy.
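As a rough sketch of that hand-off, the decoder output could be reduced to a clean single-channel mask before it is passed to the classifier. This assumes the decoder produces a per-pixel score between 0 and 1, which is not stated in this repo; `to_binary_mask` and the 0.5 threshold are hypothetical.

```python
def to_binary_mask(score_map, threshold=0.5):
    """Map each per-pixel score to 1 (hand) or 0 (background)."""
    return [[1 if s >= threshold else 0 for s in row] for row in score_map]


# Toy 2x2 decoder output; the classifier would receive `cnn_input`.
decoder_output = [[0.1, 0.9],
                  [0.5, 0.49]]
cnn_input = to_binary_mask(decoder_output)
```

Thresholding also discards faintly coloured background pixels, at the cost of losing any soft boundary information.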
Instead of segmenting the hand area out, we could train a convolutional autoencoder to perform background removal. The output of such an autoencoder would likely be more useful to a CNN for classification, as it removes much of the noise from the image and lets the model learn with fewer distractions.
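Training such an autoencoder would need background-free target images. One possible way to build them, assuming the hand-drawn masks from the segmentation training set are reused, is to keep each pixel where the mask is set and blank it elsewhere. `remove_background` is a hypothetical helper, not part of this repo.

```python
def remove_background(image, mask, fill=0):
    """Keep pixels where mask is 1; replace all others with `fill`."""
    return [[px if m else fill for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Toy 2x2 grayscale image and its hand mask.
image = [[5, 7],
         [9, 3]]
mask = [[0, 1],
        [1, 0]]
target = remove_background(image, mask)
```

Each (image, target) pair would then serve as one input/output example for the autoencoder.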