Spacy_sklearn and Mitie_sklearn #440

Closed
andikas opened this issue Jun 22, 2017 · 7 comments

Comments


andikas commented Jun 22, 2017

Hi all,

Referring to this issue:
#164

I want to be clear about which one is better: spacy_sklearn or mitie_sklearn.
I have some example data that I exported from api.ai.
My data consists of 2 entities and 4 intents.
I trained on the data with both pipelines, which gave me 2 different models.

entities:
1. greetings: hello -> synonyms: hello, hi
2. nutrients:

  • protein -> synonyms: protein, proteins
  • carbohidrate -> synonyms: carbohidrate, carbohidrates, carbs

  1. mitie_sklearn
    image

image

Different responses for the same text.

  2. spacy_sklearn

image

image

We can see that the entities are sometimes not detected, even though protein and carbohidrate belong to the same entity.
Here's the impact: the text is classified under another intent because no entities are detected.
image
It goes to another intent (the bot info intent), which corresponds to 'what is your name'.

So which one is better?
I'll be using about 100 intents and 30 entities.
Please help me decide.
Also, does spacy_sklearn not return entities?

Thanks,
Cheers

Contributor

PHLF commented Jun 23, 2017

What is the size of your dataset? How many examples do you use for training?
Also, you referenced issue #164, but what in @tmbo's answer doesn't satisfy you?

Author

andikas commented Jun 23, 2017

For now, I'm just using dummy training data that I created from API.AI.
There isn't much of it: only 5 to 10 examples per intent and only 5 examples per entity.
Do spaCy and MITIE work well with large datasets?

My project will use at most 10 training examples per intent (there will be 100 intents) and 10 examples per entity (there will be around 20 entities).

Contributor

PHLF commented Jun 23, 2017

To quote @tmbo:

For the spacy backend a suggested amount of training data per entity is around 5000 samples (see explosion/spaCy#773). Hence it might very well be that spacy is faster but bad and mitie is slow but might perform better on your data.

Author

andikas commented Jun 23, 2017

Thanks @PHLF,

Yeah, I've read that quote from tmbo.
I'm just confused:
should I continue my Rasa AI project with my existing data? My data seems too small compared to what tmbo suggests. That's all.
I need some advice from you guys as the experts here. I'd like your opinion.

Thanks,
Cheers

Contributor

PHLF commented Jun 23, 2017

If your pipeline doesn't work well enough for your use case, try either increasing your dataset (which is currently too small) or testing another pipeline. Also try to define syntactically distinct intents and entities.
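
As a rough way to see where a dataset is thin before deciding between adding data and switching pipelines, the sketch below counts examples per intent and per entity. It assumes the training data follows the Rasa NLU JSON layout (`rasa_nlu_data` -> `common_examples`); the sample examples, the helper names and the `MIN_EXAMPLES` threshold are made up for illustration and are not taken from this thread.

```python
from collections import Counter

# Hypothetical threshold, chosen only so the tiny sample below flags something.
MIN_EXAMPLES = 2

# Made-up sample in the assumed Rasa NLU JSON training format.
training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "hello", "intent": "greet",
             "entities": [{"start": 0, "end": 5,
                           "value": "hello", "entity": "greetings"}]},
            {"text": "hi there", "intent": "greet",
             "entities": [{"start": 0, "end": 2,
                           "value": "hi", "entity": "greetings"}]},
            {"text": "how much protein is in an egg", "intent": "nutrient_info",
             "entities": [{"start": 9, "end": 16,
                           "value": "protein", "entity": "nutrients"}]},
            {"text": "what is your name", "intent": "bot_info", "entities": []},
        ]
    }
}


def count_examples(data):
    """Return (examples per intent, annotated mentions per entity)."""
    examples = data["rasa_nlu_data"]["common_examples"]
    intents = Counter(ex["intent"] for ex in examples)
    entities = Counter(
        ent["entity"] for ex in examples for ent in ex.get("entities", [])
    )
    return intents, entities


def underrepresented(counts, minimum):
    """Return the labels that have fewer than `minimum` examples, sorted."""
    return sorted(label for label, n in counts.items() if n < minimum)


intent_counts, entity_counts = count_examples(training_data)
print("intents below threshold:", underrepresented(intent_counts, MIN_EXAMPLES))
print("entities below threshold:", underrepresented(entity_counts, MIN_EXAMPLES))
```

Running the same count on real training data gives a quick list of the intents and entities to collect more examples for first.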

Author

andikas commented Jun 23, 2017

Thanks for your advice @PHLF,
I'll review my project again and try defining syntactically distinct intents and entities.

Thanks,
Cheers,

tmbo closed this as completed Jun 26, 2017
@parvathysarat

@andikas Did you end up choosing spaCy or MITIE for your training? I'm facing the same problem here: I'm not getting good confidence levels with spaCy. I'm wondering whether I should switch to MITIE or add more examples.
