-
Notifications
You must be signed in to change notification settings - Fork 7
/
index.html
203 lines (165 loc) · 10.7 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
<!DOCTYPE HTML>
<html>
<head>
<title>American Sign Language Translator</title>
<meta name="description" content="website description" />
<meta name="keywords" content="website keywords, website keywords" />
<meta http-equiv="content-type" content="text/html; charset=windows-1252" />
<link rel="stylesheet" type="text/css" href="style/style.css" />
</head>
<body>
<div id="main">
<div id="header">
<div id="logo">
<div id="logo_text">
<!-- class="logo_colour", allows you to change the colour of the text -->
<h1><a href="index.html"><span class="logo_colour">American Sign Language Translator</span></a></h1>
<h2>ASL Fingerspeller</h2>
<p></br>Contributors: Abhinuv Pitale | Omkar Dhande | Pranavi Rambhakta | Varun Kumthekar</p>
</div>
</div>
</div>
<div id="content_header"></div>
<div id="site_content">
<p><div id="banner"></div></p>
<a name = "intro">
<div class = 'intro_container'>
<div id="content"></a>
<!-- insert the page content here -->
<h1>Introduction</h1>
<p>The only mode of offline communication for hearing-impaired persons is through sign language. Learning the sign language is convoluted and has many conventions. The aim of this project is to develop an American Sign Language translator in order to mitigate the aforementioned difficulties. Another aim of this project is to develop a hand gesture recognition system to establish human computer interaction.</p>
</div>
<div id = 'content_video'>
<!-- insert your sidebar items here -->
<iframe width="480" height="315"
src="https://www.youtube.com/embed/GhI2VwPll0k">
</iframe>
</div>
</div>
<a name = "dataset">
<div id="content1"></a>
<!-- insert the page content here -->
<h1>Datasets Used</h1>
<p>The code was first tested with the Triesch Dataset, after which we created our own dataset to suit our testing conditions (unideal) as well as expand our letter-base</p>
<h3>Triesch Dataset</h3> <p>Triesch dataset contains letters a,b,c and v against 3 different backgrounds as shown below.</p>
<img src="http://i65.tinypic.com/25heao6.jpg" border="0"></a>
<h3>User Dataset</h3><p>The created dataset contains images of letters a,b,c,d,g,i,l,v and y, recorded at different angles and at different distances with respect to the camera. <br/>This variation is introduced in order to introduce tolerances for orientation and scale.
<img src="http://i66.tinypic.com/2nu6kvk.jpg" border="0">
<h4>Data Augmentation</h4>
<p>Left-Right Hands -> We generate a set of rotated images from the existing images to help train our model to make it more robust.<br/>Tilt/Zoom -> Generated set of images with different levels of tilt and zoom to increase the variety of the training dataset.</p>
</div>
<div id="content1">
<!-- insert the page content here -->
<h1>Approach</h1>
<p><img src="http://i64.tinypic.com/bf2cdv.jpg" border="0" ></p>
<h3>Hand Segmentation</h3> <p>The Otsu algorithm segments the image into background and foreground. The input image is the grayscaled gaussain blurred image. This method classifies the pixels based on their grayscale pixel values and cluster them into 2 regions. We used this method for hand segmentation due to its invariance to dynamic light conditions </p>
<p><img src="http://i65.tinypic.com/2qvzcpf.jpg" border="0"></p>
<h3>Feature Extraction</h3><p>Various features extracted for an image include the hull points, contour area, convexity defect coordinates, HoG features and bounding box. We particularly selected the HoG features due to clear distinction obtained for individual characters.
<p>The <i><b>Contour</b></i> in the image contains a set of all connected points which form the outline of the hand region.</br>The <i><b>Hull</b></i> contains the area included by connecting the extremities (bulging points) of the contour in the detected hand region.</br>The <i><b>Convexity Defects</b></i> are convex regions formed by the area between the fingers in the contour of the hand region. These are called defects because the area between the fingers forms a convex curve in an otherwise concave hull.</br>The <i><b>Bounding Box</b></i> is the minimum rectangle large enough to crop the entire hand region. Computing the ratio of contour area with respect to bounding box area makes the detection invariant to scale.</p>
<img src="http://i68.tinypic.com/jb76gx.jpg" border="0">
<p><i><b>HoG Features</b></i> have been computed on the contours of each letter in the database. HOG features compute the occurrences of gradient orientations in portions of an image, so we can describe characteristic edges regarding the shape of the hands. The figure below shows the HoG features for each letter in the dataset.</p>
<!-- Trigger the Modal -->
<img id="myImg" src="http://i64.tinypic.com/vrskya.jpg" alt="HoG Features" width=45% height="300" >
<!-- The Modal -->
<div id="myModal" class="modal">
<!-- The Close Button -->
<span class="close">×</span>
<!-- Modal Content (The Image) -->
<img class="modal-content" id="img01">
<!-- Modal Caption (Image Text) -->
<div id="caption"></div>
</div>
<!-- Trigger the Modal -->
<img id="myImg1" src="http://i65.tinypic.com/2wnwe87.jpg" alt="HoG Features" width=45% height="300" >
<!-- The Modal -->
<div id="myModal" class="modal">
<!-- The Close Button -->
<span class="close">×</span>
<!-- Modal Content (The Image) -->
<img class="modal-content" id="img01">
<!-- Modal Caption (Image Text) -->
<div id="caption"></div>
</div>
</br><p>(Click to expand)</p>
</div>
<a name = "results">
<div id="content1"></a>
<!-- insert the page content here -->
<h1>Results</h1>
<p>The below video demonstrates how correct words are predicted from sequence of 4 letters. We provided a series of characters 'b','a','y','d' and it displayed "BAD" as the corrected output. </p>
<p>
1. HoG + SVM works really well on the Treisch dataset as it is nicely centered. But, when the model trained on this dataset is used in a live video, the predictions fail. </br>
2. HoG + SVM has slightly worse accuracy for in-set when trained using our dataset. But the model trained using this dataset works a lot better with live video.</br>
</p>
<table style="width:100%" >
<tr>
<th>Dataset</th>
<th>In-Set Accuracy</th>
<th>Live-Video Accuracy</th>
</tr>
<tr>
<td>Triesch</td>
<td>87%</td>
<td>40%</td>
</tr>
<tr>
<td>4-Letter Dataset</td>
<td>80%</td>
<td>65%</td>
</tr>
<td>9-Letter Dataset</td>
<td>78%</td>
<td>61%</td>
</tr>
</table>
<p>Initially we started training out models with only 4 different letters. However this was not enough as when the hands were changed(right -> left), the accuracy of the classifier fell.</br> We also included height, width and the ratio of the bounding box in training the classifier. These features were given more significance as they were better spread out as seen from the PCA plot.</p>
<img src="http://i67.tinypic.com/ir1i6v.jpg" border="0" width=60%>
<p>Another problem we faced was to find letter boundaries, this was done using edge filtering. Since edge filtering was used we were not able to form words where a certain letter is consecutively repeated.</br> To remove this drawback and also to include a word auto-correct feature, we implemented a word speller(using autocorrect library in python). The combination of letter edge filtering along with word prediction enabled us to create words using fingerspelling gestures. Additionally, it also increased our overall datasets spread as more letters could be incorporated into words due to the auto-correct feature.
</p>
<p>The below video demonstrates how correct words are predicted from sequence of 4 letters. We provided a series of characters 'b','a','y','d' and it displayed "BAD" as the corrected output.</p>
<iframe width="1280" height="720"
src="https://www.youtube.com/embed/nuG3TJhEyko">
</iframe>
</br>
<!-- insert your sidebar items here -->
<div id="content1"></a>
<!-- insert the page content here -->
<h1>References</h1>
<p><a href="https://www.hindawi.com/journals/tswj/2014/267872/">Hand Segmentation Paper</a></br><a href="http://www.idiap.ch/resource/gestures/">Marcel Database</a></br><a href="https://docs.opencv.org/3.3.1/d4/d73/tutorial_py_contours_begin.html">Feature Extraction Python</a></br><a href="http://ieeexplore.ieee.org/document/7162670/">Hand Gesture Detection using HoG</a></br><a href="https://filebox.ece.vt.edu/~F15ECE5554ECE4984/resources/final_webpage.zip">Website Template</a></p>
</div>
<div id="content1"></a>
<!-- insert the page content here -->
<h1>Conclusion</h1>
<p>In this project we have demonstrated an application of Hand-Gesture recognition in ASL fingerspelling by detecting 9 letters of the English alphabet. Future work will primarily comprise of expanding the dataset to include more letters.</p>
</div>
</div>
</div>
</div>
</br>
</br>
<div id="content_footer"></div>
<div id="footer">
<p>Contributors: Abhinuv Pitale | Omkar Dhande | Pranavi Rambhakta | Varun Kumthekar</p>
</div>
</div>
<script>
// Get the modal
var modal = document.getElementById('myModal');
// Get the image and insert it inside the modal - use its "alt" text as a caption
var img = document.getElementById('myImg');
var modalImg = document.getElementById("img01");
var captionText = document.getElementById("caption");
img.onclick = function(){
modal.style.display = "block";
modalImg.src = this.src;
captionText.innerHTML = this.alt;
}
// Get the <span> element that closes the modal
var span = document.getElementsByClassName("close")[0];
// When the user clicks on <span> (x), close the modal
span.onclick = function() {
modal.style.display = "none";
}
</script>
</body>
</html>