Issue 150: Long Short-Term Memory (LSTM) Binary Classifier for texts #153

Merged

Conversation

Hector-hedb12 (Contributor)

Resolves #150

Review thread on the pipeline JSON, at the "batch_size" entry:

0.75
]
},
"batch_size": {

csala (Contributor)

batch_size is currently ignored by the keras adapter, so this line can be removed.

Review thread on the pipeline JSON, at the "classification" hyperparameters block:

},
"hyperparameters": {
"fixed": {
"classification": {

csala (Contributor)

A fixed hyperparameter called "verbose" with default false should be added. See #143.

Hector-hedb12 (Contributor, author)

@csala according to the doc, verbose should be an integer:

verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.

So, a bool might not work, right? 🤔
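
For reference, a minimal sketch of how this flag reaches Keras' fit; the toy model and data are illustrative, not the PR's primitive:

# Minimal sketch: verbose is forwarded to Keras' fit() as an int.
# The tiny model and random data below are illustrative only.
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Input(shape=(4,)), Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(32, 4)
y = np.random.randint(0, 2, size=32)

model.fit(X, y, epochs=2, verbose=0)  # 0 = silent
model.fit(X, y, epochs=2, verbose=2)  # 2 = one line per epoch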

csala (Contributor)

Well, it actually does work. I guess True/False is internally interpreted as 1/0. Feel free to change it to int type.
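
That matches plain Python semantics, where bool is a subclass of int; a quick check:

# bool is a subclass of int in Python, so True/False behave as 1/0
# when forwarded to Keras' verbose argument.
assert isinstance(True, int)
assert True == 1 and False == 0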

Diff on the primitive JSON annotation:

@@ -67,6 +67,14 @@
"type": "int",
"default": 20
},
"verbose": {
"type": "int",
"default": 1,

csala (Contributor)

Please set the default to 0

Hector-hedb12 (Contributor, author)

@csala

csala (Contributor) commented Apr 22, 2019

@Hector-hedb12 The primitive looks good and can be merged, but I'm afraid the pipeline does not.

According to the Keras examples, this architecture is suitable for use with tokenized sequences as input.
However, the pipeline that you included has two problems:

  1. It uses a StringVectorizer, which produces a token count matrix instead of token sequences.
  2. It uses a validation dataset (personae) that contains more information apart from the texts, while this primitive is intended to receive nothing but the token sequences as input.

I think that here we need to:

  1. Add a new dataset for binary text classification
  2. Use the keras.Tokenizer instead of the StringVectorizer in the preprocessing steps (see the sketch below).

In order to avoid having this PR stuck I will merge it as it is right now, but I will create a new issue to improve the pipeline.
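
A minimal sketch of the contrast described above; scikit-learn's CountVectorizer stands in for the StringVectorizer primitive (an assumption), and the texts, vocabulary size, and layer widths are invented for illustration:

# Illustrative sketch only: CountVectorizer stands in for the StringVectorizer
# primitive (assumption); texts and sizes are made up for the example.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras import Input
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

texts = ["good movie", "bad movie", "great plot", "terrible plot"]
labels = np.array([1, 0, 1, 0])

# 1. What the current pipeline produces: a token COUNT matrix (order lost).
counts = CountVectorizer().fit_transform(texts).toarray()
print(counts.shape)  # (n_samples, vocabulary_size)

# 2. What the LSTM primitive expects: padded token index SEQUENCES.
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

# A small Embedding + LSTM binary classifier over those sequences.
model = Sequential([
    Input(shape=(10,)),
    Embedding(input_dim=1000, output_dim=16),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(sequences, labels, epochs=2, verbose=0)

The point of the contrast: the Tokenizer keeps token order, which is what the LSTM needs, while the count matrix discards it.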

@csala csala merged commit 2f32f2b into MLBazaar:master Apr 22, 2019

Successfully merging this pull request may close these issues:

Add primitive for Sequence classification with LSTM (#150)