Text classification workflow #1025
Conversation
Force-pushed from dd7153f to edd781d (compare)
@gheinrich, in encode_entry(), if one char is encoded as k and another char is encoded as k+1, does that imply these two chars are close to each other? I mean, encoding characters into scalars, rather than one-hot encoding, seems uncommon. For text classification, word2vec seems a more popular way to encode documents into a vector space.
Hi @IsaacYangSLA, in the example network that I provide, the first layer is doing one-hot encoding of the characters. I chose to do the one-hot encoding in the network rather than in the dataset because that results in a much more compact dataset, especially if you have a large alphabet. It's just my opinion, but I think "word2vec" kind of defeats the purpose of deep learning: you need logic outside of the network, like stemming algorithms, to identify words. I suppose the popularity of word2vec comes down to limited memory/compute capabilities. I think a character-level representation of the data should ultimately be more powerful, similar to how Deep Neural Nets outperform HOG+SVM in image classification.
Hi @gheinrich, thanks for the information about the first layer. That's a better design, I agree. As for the word2vec-vs-character question, it seems more people use word2vec in NLP applications, and the idea behind it is also reasonable, i.e. the concept of DC - USA + France ~= Paris. However, I see increasing research on character-based text processing. Maybe in a few years it will outperform word2vec in NLP applications.
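The trade-off discussed above can be sketched as follows. This is a minimal illustration, not the actual DIGITS plugin code: the alphabet, the padding convention, and the `one_hot` helper are assumptions; only the name `encode_entry` comes from the conversation. The dataset stores one small integer per character, and the network's first layer expands those indices into one-hot vectors.

```python
import numpy as np

# Hypothetical alphabet; the real plugin's alphabet may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def encode_entry(text, length=16):
    """Compact dataset representation: one integer index per character.
    Index 0 is reserved for unknown characters and padding."""
    indices = np.zeros(length, dtype=np.int64)
    for i, ch in enumerate(text[:length]):
        indices[i] = ALPHABET.find(ch) + 1  # find() returns -1 -> 0 (unknown)
    return indices

def one_hot(indices, vocab_size=len(ALPHABET) + 1):
    """What the network's first layer would do: expand indices to one-hot."""
    out = np.zeros((len(indices), vocab_size), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0
    return out

idx = encode_entry("hello")   # 16 integers
vec = one_hot(idx)            # 16 x 28 floats
# Storing idx needs `length` integers; storing vec would need
# length * vocab_size floats -- hence the more compact dataset.
```

With a large alphabet the saving grows linearly: the dataset size is independent of the vocabulary size, while a pre-expanded one-hot dataset is not.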
Force-pushed from edd781d to 74039af (compare)
Rebased on tip of master branch.
author="Greg Heinrich",
description=("A data ingestion plugin for text classification"),
long_description=read('README'),
license="Apache",
Why Apache and not BSD-3?
I didn't really think about it. Good point, I'll use the same license as the top-level setup.py.
Done on latest commit.
scores = output_data[output_data.keys()[0]].astype('float32')

if self.terminal_layer_type == "logsoftmax":
    scores = np.exp(scores)
Could you just check to see if the values sum to 1 instead of having this form field?
Good idea (I guess I can also check whether the values are positive or negative)!
Done on latest commit.
I ran through the text classification example using these plugins and it worked great. I'd like to see the logsoftmax thing removed before merge, since that will make this a bit easier to use.
Force-pushed from 74039af to f8447b0 (compare)
Force-pushed from f8447b0 to e7745fa (compare)
I have updated the text classification example to show how to use the plug-ins.
if np.max(scores) < 0:
    # terminal layer is a logsoftmax
    scores = np.exp(scores)
👍
```sh
$ pip install $DIGITS_ROOT/plugins/data/textClassification
$ pip install $DIGITS_ROOT/plugins/view/textClassification
```
👍
Merged branch …-workflow (Text classification workflow)
Depends on #927 (support for plug-ins) and #1024 (modularization of inference).