Issue 150: Long Short-Term Memory (LSTM) Binary Classifier for texts #153

Merged

Conversation

Hector-hedb12 (Contributor)

Resolves #150

Review thread on the pipeline JSON, at the "batch_size" entry:

0.75
]
},
"batch_size": {

csala (Contributor)

batch_size is currently ignored by the keras adapter, so this line can be removed.

Review thread on the pipeline JSON, at the "classification" hyperparameters block:

},
"hyperparameters": {
"fixed": {
"classification": {

csala (Contributor)

A fixed hyperparameter called "verbose" with default false should be added. See #143.

Hector-hedb12 (Contributor, author)

@csala according to the doc, verbose should be an integer:

verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.

So, a bool might not work, right? 🤔
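
For reference, a minimal sketch of how this flag reaches Keras' fit; the toy model and data are illustrative, not the PR's primitive:

# Minimal sketch: verbose is forwarded to Keras' fit() as an int.
# The tiny model and random data below are illustrative only.
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Input(shape=(4,)), Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(32, 4)
y = np.random.randint(0, 2, size=32)

model.fit(X, y, epochs=2, verbose=0)  # 0 = silent
model.fit(X, y, epochs=2, verbose=2)  # 2 = one line per epoch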

csala (Contributor)

Well, it actually does work. I guess True/False is internally interpreted as 1/0. Feel free to change it to int type.
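
That matches plain Python semantics, where bool is a subclass of int; a quick check:

# bool is a subclass of int in Python, so True/False behave as 1/0
# when forwarded to Keras' verbose argument.
assert isinstance(True, int)
assert True == 1 and False == 0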

Diff on the primitive JSON annotation:

@@ -67,6 +67,14 @@
"type": "int",
"default": 20
},
"verbose": {
"type": "int",
"default": 1,

csala (Contributor)

Please set the default to 0

Hector-hedb12 (Contributor, author)

@csala

csala (Contributor) commented Apr 22, 2019

@Hector-hedb12 The primitive looks good and can be merged, but I'm afraid the pipeline does not.

According to the Keras examples, this architecture is suitable for use with tokenized sequences as input.
However, the pipeline that you included has two problems:

  1. It uses a StringVectorizer, which produces a token count matrix instead of token sequences.
  2. It uses a validation dataset (personae) that contains more information apart from the texts, while this primitive is intended to receive nothing but the token sequences as input.

I think that here we need to:

  1. Add a new dataset for binary text classification
  2. Use the keras.Tokenizer instead of the StringVectorizer in the preprocessing steps (see the sketch below).

In order to avoid having this PR stuck I will merge it as it is right now, but I will create a new issue to improve the pipeline.
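
A minimal sketch of the contrast described above; scikit-learn's CountVectorizer stands in for the StringVectorizer primitive (an assumption), and the texts, vocabulary size, and layer widths are invented for illustration:

# Illustrative sketch only: CountVectorizer stands in for the StringVectorizer
# primitive (assumption); texts and sizes are made up for the example.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras import Input
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

texts = ["good movie", "bad movie", "great plot", "terrible plot"]
labels = np.array([1, 0, 1, 0])

# 1. What the current pipeline produces: a token COUNT matrix (order lost).
counts = CountVectorizer().fit_transform(texts).toarray()
print(counts.shape)  # (n_samples, vocabulary_size)

# 2. What the LSTM primitive expects: padded token index SEQUENCES.
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

# A small Embedding + LSTM binary classifier over those sequences.
model = Sequential([
    Input(shape=(10,)),
    Embedding(input_dim=1000, output_dim=16),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(sequences, labels, epochs=2, verbose=0)

The point of the contrast: the Tokenizer keeps token order, which is what the LSTM needs, while the count matrix discards it.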

@csala csala merged commit 2f32f2b into MLBazaar:master Apr 22, 2019

Successfully merging this pull request may close these issues:

Add primitive for Sequence classification with LSTM (#150)