Refresh mitie sklearn and add tests #152
Conversation
Cool. Could you please also add a test for running the trained model?
I have some concerns about the code duplication. How about splitting the mitie and the spacy trainer into two classes (e.g. SpacySklearnIntentTrainer and SpacySklearnEntityTrainer) and letting the current SpacySklearnTrainer inherit from both? The mitie_sklearn trainer could then pick the functionality it needs from spacy and mitie.
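A minimal sketch of the suggested split, assuming a mixin-style design (the class and attribute names here are illustrative placeholders, not the real implementations):

```python
# Hypothetical mixins: intent and entity training live in separate classes,
# and the combined trainer inherits from both.

class SpacySklearnIntentTrainer(object):
    def train_intent_classifier(self, intent_examples):
        # placeholder; the real version would fit an sklearn classifier
        self.intent_classifier = "intent-model"


class SpacySklearnEntityTrainer(object):
    def train_entity_extractor(self, entity_examples):
        # placeholder; the real version would train a spaCy NER model
        self.entity_extractor = "entity-model"


class SpacySklearnTrainer(SpacySklearnIntentTrainer, SpacySklearnEntityTrainer):
    def train(self, data):
        self.train_intent_classifier(data.get("intent_examples", []))
        self.train_entity_extractor(data.get("entity_examples", []))
```

A mitie_sklearn trainer could then cherry-pick, e.g. inherit the sklearn intent mixin while supplying its own MITIE-based entity training.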
_pytest/test_train.py (Outdated)

    def test_train_mitie_sklearn():
        # basic conf
        _config = {
            'write': os.path.join(os.getcwd(), "rasa_nlu_logs.json"),
should use tempfile
config_mitie.json (Outdated)

    @@ -1,5 +1,5 @@
     {
    -    "backend": "mitie",
    +    "backend": "mitie_sklearn",
is this intended?
I think so. Technically this backend is meant to be faster, so we should consider making it the default.
@amn41 what do you think?
ah I see. Let's make another config_mitie_sklearn.json. If people want to start out by installing a single dependency, that should work. We should probably throw some warnings if training is started with the MITIE backend and > 5 intents and > 50 intent examples.
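A hypothetical sketch of that guard — warn when the plain MITIE backend is used on a training set large enough that mitie_sklearn would likely be much faster. The function name and thresholds follow the comment above; nothing here is from the actual codebase:

```python
import warnings

# Thresholds suggested in the review discussion
MAX_MITIE_INTENTS = 5
MAX_MITIE_INTENT_EXAMPLES = 50


def check_mitie_training_size(backend, intents, intent_examples):
    """Warn if the plain MITIE backend is used on a large training set."""
    if backend == "mitie" and (len(intents) > MAX_MITIE_INTENTS or
                               len(intent_examples) > MAX_MITIE_INTENT_EXAMPLES):
        warnings.warn("Training the plain MITIE backend on this many intents/"
                      "examples can be very slow; consider 'mitie_sklearn'.")
```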
    -    def __init__(self, metadata):
    -        self.extractor = named_entity_extractor(metadata["entity_extractor"])  # , metadata["feature_extractor"])
    -        self.classifier = text_categorizer(metadata["intent_classifier"])  # , metadata["feature_extractor"])
    +    def __init__(self, intent_classifier=None, entity_extractor=None, feature_extractor=None, **kwargs):
naming (I know it's the same in the MITIEInterpreter, but let's try to improve the code base 😅): all three of these names should indicate that they are strings representing a path, not the actual classifier / extractor.
we can append _file at the end.
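A sketch of the rename, assuming a `_file` suffix as agreed above (class name and body are illustrative; the real constructor would hand the paths to mitie's loaders, e.g. `named_entity_extractor(entity_extractor_file)`):

```python
class MITIESklearnInterpreter(object):
    # Names make explicit that the arguments are file paths, not model objects.
    def __init__(self, intent_classifier_file=None, entity_extractor_file=None,
                 feature_extractor_file=None, **kwargs):
        self.intent_classifier_file = intent_classifier_file
        self.entity_extractor_file = entity_extractor_file
        self.feature_extractor_file = feature_extractor_file
```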
        return label

        """Returns the most likely intent and its probability for the input text.

        :param text: text to classify
there is no param text
yeah tokens was used in lieu of text. Now it's fixed. Thanks! :)
        :param text: text to classify
        :return: tuple of most likely intent name and its probability"""
        if self.classifier:
            X = self.featurizer.create_bow_vecs(tokens)
please add a comment why tokens are passed to this function although the function expects sentences.
tokens aren't passed. The function expects sentences and we actually pass those! You will see it fixed in the next commit
        self.entity_extractor = self.train_entity_extractor(data.entity_examples)
        self.train_intent_classifier(data.intent_examples, test_split_size)

        num_entity_examples = len([e for e in data.entity_examples if len(e["entities"]) > 0])
please add this logic as a method of TrainingData
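Moving the counting logic onto TrainingData could look roughly like this (the class shape here is a minimal stand-in, not the real TrainingData):

```python
class TrainingData(object):
    def __init__(self, entity_examples):
        self.entity_examples = entity_examples

    def num_entity_examples(self):
        # Count only examples that contain at least one annotated entity
        return len([e for e in self.entity_examples if len(e["entities"]) > 0])
```

The trainer would then call `data.num_entity_examples()` instead of repeating the list comprehension.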
    -    def train(self, data):
    +    def train(self, data, test_split_size=0.1):
I think it's time to move this function to the Trainer and remove the duplication in every one of the three trainers, what do you think?
yep that's cleaner. We should add abstract methods for the train_intent_classifier & train_entity_extractor to the base class as well.
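The agreed refactor could be sketched like this, assuming a shared `train()` on the base class with abstract hooks per backend (signatures follow the discussion; the dummy subclass is only for illustration):

```python
import abc


class Trainer(abc.ABC):
    # Shared train() lives once on the base class; backends implement the hooks.
    def train(self, data, test_split_size=0.1):
        self.entity_extractor = self.train_entity_extractor(data["entity_examples"])
        self.intent_classifier = self.train_intent_classifier(
            data["intent_examples"], test_split_size)

    @abc.abstractmethod
    def train_entity_extractor(self, entity_examples):
        raise NotImplementedError

    @abc.abstractmethod
    def train_intent_classifier(self, intent_examples, test_split_size):
        raise NotImplementedError


class DummyTrainer(Trainer):
    def train_entity_extractor(self, entity_examples):
        return "ner"

    def train_intent_classifier(self, intent_examples, test_split_size):
        return "clf"
```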
no problem! I also think it will be cleaner eventually
    @@ -27,46 +40,72 @@ def start_and_end(self, text_tokens, entity_tokens):
            start, end = locs[0], locs[0] + len(entity_tokens)
            return start, end

        @classmethod
        def find_entity(cls, ent, text):
This is a duplication of the code in MITIETrainer
        val_tokens = tokenize(_slice)
        end = start + len(val_tokens)
        return start, end

    def train_entity_extractor(self, entity_examples):
This is a duplication of the code in MITIETrainer
        for example in intent_examples:
            tokens = tokenize(example["text"])
            trainer.add_labeled_text(tokens, example["intent"])

    def train_intent_classifier(self, intent_examples, test_split_size=0.1):
This is a duplication of the code in SpacySklearnTrainer
Tom, re: your comment. I agree it would be cleaner to split this up & recombine. But since there are only two cases (plain mitie and mitie_sklearn), it feels like overengineering. Maybe there's a simpler (perhaps less idiomatic) way to reduce the code duplication?
@tmbo how about two utils files then? Like one sklearn_trainer_utils and one mitie_trainer_utils, each of them with an entity and an intent classifier. Then you would just import what you need.
Yes that sounds good.
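The utils-module idea could look roughly like this sketch — module names come from the discussion, but the function bodies are placeholders; in the repo the two sections would live in separate files:

```python
# --- mitie_trainer_utils (would be its own module) ---
def train_entity_extractor(entity_examples):
    # placeholder; would wrap mitie's ner_trainer here
    return "mitie-ner"


# --- sklearn_trainer_utils (would be its own module) ---
def train_intent_classifier(intent_examples, test_split_size=0.1):
    # placeholder; would wrap SklearnIntentClassifier here
    return "sklearn-clf"


# A mitie_sklearn trainer then just composes the two imports:
class MITIESklearnTrainer(object):
    def train(self, data):
        self.entity_extractor = train_entity_extractor(data["entity_examples"])
        self.intent_classifier = train_intent_classifier(data["intent_examples"])
```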
Looks really good! Added one comment; after that's fixed it's a 👍
        ner = trainer.train()
        return ner

    def train_intent_classifier(self, intent_examples, test_split_size=0.1):
        self.intent_classifier = SklearnIntentClassifier(max_num_threads=self.max_num_threads)
Is there a reason not to use sklearn_trainer_utils.train_intent_classifier here?
…-environment-variable Enable an env variable to define whether llm prompt is logged at INFO level
Fixes #137
Might need to perform a further update once #151 is accepted in order to add support for the synonyms within the interpreter