Unable to recognize multiple entities of same type in a sentence without any separation symbol (or a single space) #6340
Comments
Thanks for the issue, @tttthomasssss will get back to you about it soon! You may find help in the docs and the forum, too 🤗
@praneethgb This is expected behaviour. For more explanation see this PR and the related forum post.
Hi @tabergma, "Amsterdam Berlin" is not a city name, so it should not be returned as one entity. For example, consider this use case: "my ingredients are eggs (ingredients) lemon juice (ingredients) and milk (ingredients)". When the DIET model is trained for NER, it is trained to recognize them as eggs: U-ingredients, lemon: B-ingredients, juice: L-ingredients, milk: U-ingredients. After postprocessing, the results are likewise expected to be three ingredients: eggs, lemon juice, and milk. Instead, eggs and lemon juice are merged into one ingredient. For reference, the BILOU tags are removed in DIET at https://github.com/RasaHQ/rasa/blob/master/rasa/nlu/classifiers/diet_classifier.py#L938
@praneethgb Not sure I understand what you are suggesting. I know that "Amsterdam Berlin" is not a city, and it is not ideal that we capture it as one entity. The problem is that if we don't merge entities with the same tag when they appear right next to each other, we would not be able to detect "San Francisco", for example. It would always be detected as "San" and "Francisco", two independent entities, which is also not ideal.
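To make the trade-off concrete, here is a minimal sketch of the merging behaviour described in this comment. It assumes the tags have already had their BILOU prefixes stripped; `merge_adjacent` is a hypothetical helper written for illustration, not the function Rasa uses.

```python
from typing import List, Tuple


def merge_adjacent(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Join neighbouring tokens that carry the same non-"O" tag.

    Hypothetical illustration only: with prefix-free tags there is no way
    to tell a two-word entity from two one-word entities.
    """
    entities: List[Tuple[str, str]] = []
    previous_tag = "O"
    for token, tag in zip(tokens, tags):
        if tag != "O" and tag == previous_tag:
            # Same tag as the previous token: extend the last entity.
            label, value = entities[-1]
            entities[-1] = (label, value + " " + token)
        elif tag != "O":
            # A new entity starts here.
            entities.append((tag, token))
        previous_tag = tag
    return entities


san_francisco = merge_adjacent(["San", "Francisco"], ["city", "city"])
two_cities = merge_adjacent(
    ["Amsterdam", "Berlin", "and", "London"], ["city", "city", "O", "city"]
)
```

Both inputs look identical to this function (two adjacent `city` tags), so it has to either merge both or split both. That is why "San Francisco" comes out right while "Amsterdam Berlin" does not.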
I am starting to bump into similar issues with some queries so I am sharing my thoughts (I am not exposed enough to how BILOU tagging is used in DIET/ nlu data).
This might be a bit optimistic but this is how I think the model should behave:
Yes, I myself still get confused by BILOU tagging but doing a mapping like the one shown above would be convenient to the users.
We (as developers/ engineers/ researchers) can have our set of guidelines but it's sometimes frustrating to the users who interact with the chatbot to follow a certain format (or at least it would be better if we can support more ways of writing queries i.e: |
Yeah, I think we can update this for BILOU tagging, but I guess it will not be possible if the model is trained without BILOU tagging. @praneethgb or @AMR-KELEG, would either of you be willing to create a PR for this?
I will need to have a look first, but yes, I am willing to work on it.
Also, if we use voice input, commas won't be present in the input at all, because it comes from Automatic Speech Recognition models.
Yes.
Test file for reference: https://github.com/RasaHQ/rasa/blob/2b12852ae04aa2d9de6bacdc5b44d1894295fb27/tests/nlu/extractors/test_extractor.py
    (
        "Amsterdam Berlin and London",
        {
            "entity": ["city", "city", "O", "city"],
            "role": ["O", "O", "O", "O"],
            "group": ["O", "O", "O", "O"],
        },
        None,
        [
            {"entity": "city", "start": 0, "end": 16, "value": "Amsterdam Berlin"},
            {"entity": "city", "start": 21, "end": 27, "value": "London"},
        ],
    ),
The expected output should be:
    {"entity": "city", "start": 0, "end": 9, "value": "Amsterdam"},
    {"entity": "city", "start": 10, "end": 16, "value": "Berlin"},
    {"entity": "city", "start": 21, "end": 27, "value": "London"}
because Amsterdam (U-city) and Berlin (U-city) are different city entities.
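A rough sketch of what BILOU-aware postprocessing could look like is below. It assumes whitespace tokenization and well-formed tags, and it ignores roles and groups. `entities_from_bilou` is a hypothetical name, not an existing Rasa function. The idea is that `U-` and `B-` always open a new entity and `U-` and `L-` always close one, so adjacent `U-` tokens are never merged.

```python
from typing import Dict, List, Optional, Union

Entity = Dict[str, Union[str, int]]


def entities_from_bilou(text: str, tags: List[str]) -> List[Entity]:
    """Build entities from per-token BILOU tags (hypothetical sketch).

    Assumes one tag per whitespace-separated token; "end" is exclusive.
    The label is taken from the closing tag, with no B/L consistency check.
    """
    entities: List[Entity] = []
    cursor = 0
    start: Optional[int] = None
    for token, tag in zip(text.split(), tags):
        token_start = text.index(token, cursor)
        token_end = token_start + len(token)
        cursor = token_end
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix in ("U", "B"):
            # Unit or Begin: a new entity starts at this token.
            start = token_start
        if prefix in ("U", "L") and start is not None:
            # Unit or Last: the entity ends at this token.
            entities.append(
                {
                    "entity": label,
                    "start": start,
                    "end": token_end,
                    "value": text[start:token_end],
                }
            )
            start = None
    return entities


cities = entities_from_bilou(
    "Amsterdam Berlin and London", ["U-city", "U-city", "O", "U-city"]
)
ingredients = entities_from_bilou(
    "eggs lemon juice and milk",
    ["U-ingredients", "B-ingredients", "L-ingredients", "O", "U-ingredients"],
)
```

With an exclusive end offset, this gives Amsterdam at 0 to 9, Berlin at 10 to 16, and London at 21 to 27, and it keeps "lemon juice" together as one ingredient while "eggs" stays separate. As noted earlier in the thread, this only works when the model is trained with BILOU tagging.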