Spacy NER bad results #168

Closed
jithurjacob opened this issue Feb 22, 2017 · 14 comments
Labels
type:question 💬 Question around usage, examples

Comments

@jithurjacob

Hi guys,

I got RASA running on Python 3 by using the files from src and doing a lot of reordering. But when I test it with the toy data set, this is the result. How can the output from spaCy be this bad? For example: {"entity": "location", "start": 13, "end": 16, "value": "for"}

query: I am looking for Chinese food

{"confidence": 0.6662957848930847, "intent": "restaurant_search", "text": "I am looking for Chinese food", "entities": [{"entity": "location", "start": 5, "end": 12, "value": "looking"}, {"entity": "location", "start": 13, "end": 16, "value": "for"}, {"entity": "cuisine", "start": 17, "end": 24, "value": "Chinese"}, {"entity": "cuisine", "start": 25, "end": 29, "value": "food"}]}
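A quick way to narrow down where output like this goes wrong is to check the reported offsets against the text. The snippet below is only a minimal sketch using the standard library; it does not touch Rasa or spaCy, and the variable names are illustrative. It slices the query with each entity's `start` and `end` and compares the slice with `value`.

```python
import json

# Parser output copied verbatim from the report above.
raw = (
    '{"confidence": 0.6662957848930847, "intent": "restaurant_search", '
    '"text": "I am looking for Chinese food", "entities": ['
    '{"entity": "location", "start": 5, "end": 12, "value": "looking"}, '
    '{"entity": "location", "start": 13, "end": 16, "value": "for"}, '
    '{"entity": "cuisine", "start": 17, "end": 24, "value": "Chinese"}, '
    '{"entity": "cuisine", "start": 25, "end": 29, "value": "food"}]}'
)

result = json.loads(raw)
text = result["text"]

# For every entity, compare text[start:end] with the reported value.
checks = [
    (ent["entity"], ent["value"], text[ent["start"]:ent["end"]] == ent["value"])
    for ent in result["entities"]
]

for label, value, offsets_ok in checks:
    status = "offsets match" if offsets_ok else "offsets do NOT match"
    print(f"{label:10s} {value!r:12s} {status}")
```

Every slice matches its value, so the spans fall on whole words. The problem is which words get tagged ("looking" and "for" as locations), not where the span boundaries are.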

@jithurjacob
Author

Wow, spaCy's NER uses an averaged perceptron, so it expects something like 5,000 samples for training. This would be a good point to add to the Rasa documentation as a warning about using the spacy+scikit backend.

source: explosion/spaCy#773

@jithurjacob
Author

Has anyone used MITIE on Python 3?

@tmbo tmbo added the type:question 💬 Question around usage, examples label Feb 23, 2017
@tmbo
Member

tmbo commented Feb 23, 2017

You are completely right: spaCy needs a lot of training data to perform well when annotating entities. I just added an option that lets you reuse pretrained spaCy NER models (e.g. for locations or dates).
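To illustrate the idea of reusing pretrained entities, here is a minimal sketch. It is not the code that was added to Rasa. The function name `spans_to_entities` and the label whitelist are hypothetical. It converts span-like objects exposing `text`, `start_char`, `end_char` and `label_` (the attributes spaCy's `Span` provides) into the entity dicts shown earlier in this thread. A namedtuple stands in for real spaCy spans so the example runs without a model installed.

```python
from collections import namedtuple

# Hypothetical whitelist: pretrained labels we are willing to pass through.
PRETRAINED_LABELS = {"GPE", "LOC", "DATE"}


def spans_to_entities(spans, keep=PRETRAINED_LABELS):
    """Convert span-like objects into entity dicts, keeping whitelisted labels."""
    return [
        {
            "entity": span.label_,
            "start": span.start_char,
            "end": span.end_char,
            "value": span.text,
        }
        for span in spans
        if span.label_ in keep
    ]


# Stand-in for spaCy Span objects (assumption: same attribute names).
FakeSpan = namedtuple("FakeSpan", "text start_char end_char label_")


def fake_span(text, word, label):
    """Build a stand-in span by locating `word` inside `text`."""
    start = text.index(word)
    return FakeSpan(word, start, start + len(word), label)


text = "show me Chinese food in Berlin tomorrow"
spans = [
    fake_span(text, "Chinese", "NORP"),   # not whitelisted, so it is dropped
    fake_span(text, "Berlin", "GPE"),
    fake_span(text, "tomorrow", "DATE"),
]

entities = spans_to_entities(spans)
for ent in entities:
    print(ent)
```

With spaCy installed and an English model loaded through `spacy.load`, the same function should work on `doc.ents`, assuming the model emits these label names.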

Python 3 support is tracked in a separate issue, #68. Status: new code we write is compatible with both 2.7 and 3.6, but we have not ported all parts of the existing code base yet.

@tmbo tmbo closed this as completed Feb 23, 2017
@jithurjacob
Author

@tmbo Can you share the code for reusing the pretrained NER?

Also, can you please tell me whether you are getting good results with MITIE?

I'll try to contribute over the weekend towards making it compatible with Py2/3.

@alfredfrancis
Contributor

alfredfrancis commented Feb 23, 2017

@jithurjacob How about this open-source project on GitHub? It has all the functionality of Rasa and needs only a few training examples. It uses pycrfsuite instead of spaCy NER.

@amn41
Contributor

amn41 commented Feb 23, 2017

Yes, using a conditional random field as an alternative for parsing entities is on our roadmap. It will make more or less sense than the MITIE/spaCy approaches depending on people's use cases, so I would also want to provide good docs & guidelines on when to use which.
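For readers unfamiliar with the CRF approach, here is a rough sketch of the data preparation it usually involves: turning an annotated sentence into per-token feature dicts and BIO tags. This is not Rasa's implementation. The helper names and the feature set are illustrative, and treating "Chinese" as the only correct entity in the toy query is an assumption. Feature dicts of this shape are what CRF wrappers such as python-crfsuite typically take as input.

```python
def tokenize(text):
    """Whitespace tokenizer returning (token, start, end) triples."""
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def bio_tags(tokens, entities):
    """Assign B-/I-/O tags to tokens based on character-offset entity spans."""
    tags = []
    for _, start, end in tokens:
        tag = "O"
        for ent in entities:
            if start >= ent["start"] and end <= ent["end"]:
                prefix = "B" if start == ent["start"] else "I"
                tag = prefix + "-" + ent["entity"]
                break
        tags.append(tag)
    return tags


def token_features(tokens, i):
    """Illustrative features for token i: its own shape plus its neighbours."""
    word = tokens[i][0]
    feats = {
        "bias": 1.0,
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:].lower(),
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1][0].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1][0].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


text = "I am looking for Chinese food"
# Assumed gold annotation for the toy query (not taken from the thread).
entities = [{"entity": "cuisine", "start": 17, "end": 24, "value": "Chinese"}]

tokens = tokenize(text)
X = [token_features(tokens, i) for i in range(len(tokens))]
y = bio_tags(tokens, entities)

for (tok, _, _), tag in zip(tokens, y):
    print(tok, tag)
```

Because the features lean on neighbouring words and word shape rather than on learned embeddings, a CRF can pick up patterns like "for <Cuisine> food" from a handful of examples, which is the trade-off discussed in this thread.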

@jithurjacob
Author

@alfredfrancis Thank you for bringing it to my notice. I'll definitely try it out.

I'm doing a comparison of the various open-source bot development frameworks available for Python. Could you please provide links to other frameworks that you are aware of?

@jithurjacob
Author

@amn41 Absolutely, that makes sense. It would be great if you could list the possible alternatives to sklearn_spacy and MITIE so that others could build those backends and contribute them to RASA.

Could you please share your wish list for RASA backends, so that people can pick one depending on their use case?

For me, the training data needed should be minimal, and I'm happy with average performance.

@alfredfrancis
Contributor

@jithurjacob A CRF can give pretty good results with minimal training data. I'm talking about more than 80% accuracy with just 4-5 examples.
Check out Chatterbot.

@jithurjacob
Author

@alfredfrancis I couldn't find any mention of CRF being used in ChatterBot. Can you please point me to the correct source?

@alfredfrancis
Contributor

@jithurjacob How about this one?

@jithurjacob
Author

@alfredfrancis Sorry, I got confused with ChatterBot. I tried your project with the book cab example. The results are good for a small POC, and I will test it further. One issue I'm facing is that I'm not able to compile the library for Windows 64/Python 3.5. I tested it on Py 2.7 and it works fine.

Could you please share the wheel file for Py3.5/Win64 if you have one?

@alfredfrancis
Contributor

@jithurjacob I haven't tested it on Py3/Win.

@jithurjacob
Author

@tmbo Thank you for adding CRF support to Rasa.
