Unable to recognize multiple entities of same type in a sentence without any separation symbol (or a single space) #6340

praneethgb · 2020-08-04T18:24:13Z

test file for reference: https://github.com/RasaHQ/rasa/blob/2b12852ae04aa2d9de6bacdc5b44d1894295fb27/tests/nlu/extractors/test_extractor.py

(
    "Amsterdam Berlin and London",
    {
        "entity": ["city", "city", "O", "city"],
        "role": ["O", "O", "O", "O"],
        "group": ["O", "O", "O", "O"],
    },
    None,
    [
        {"entity": "city", "start": 0, "end": 16, "value": "Amsterdam Berlin"},
        {"entity": "city", "start": 21, "end": 27, "value": "London"},
    ],
),

The expected result should be:
{"entity": "city", "start": 0, "end": 9, "value": "Amsterdam"},
{"entity": "city", "start": 10, "end": 16, "value": "Berlin"},
{"entity": "city", "start": 21, "end": 27, "value": "London"}

This is because Amsterdam (U-city) and Berlin (U-city) are separate city entities.
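For checking the character offsets, here is a minimal sketch that assumes plain whitespace tokenization and end-exclusive offsets (consistent with `"end": 16` for "Amsterdam Berlin" in the test case). `token_offsets` is a hypothetical helper for illustration, not part of Rasa.

```python
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and return (token, start, end), with end exclusive."""
    offsets = []
    cursor = 0
    for token in text.split():
        # Search from the cursor so repeated tokens get distinct positions.
        start = text.index(token, cursor)
        end = start + len(token)
        offsets.append((token, start, end))
        cursor = end
    return offsets


offsets = token_offsets("Amsterdam Berlin and London")
for token, start, end in offsets:
    print(token, start, end)
```

With these offsets, a per-token "city" span covers 0-9 for "Amsterdam" and 10-16 for "Berlin", while the merged span covers 0-16.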

praneethgb · 2020-08-04T18:26:38Z

Hi @tabergma, @tmbo

Would you be able to provide your input on this issue?

sara-tagger · 2020-08-05T06:00:13Z

Thanks for the issue, @tttthomasssss will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

tabergma · 2020-08-05T07:12:03Z

@praneethgb This is expected behaviour. For more explanation see this PR and the related forum post.

praneethgb · 2020-08-05T21:18:00Z

Hi @tabergma,

But "Amsterdam Berlin" is not a city name.

For example, consider this use case: my ingredients are eggs(ingredients) lemon juice(ingredients) and milk(ingredients).
The ASR output was 'my ingredients are eggs lemon juice and milk.'

When the DIET model is trained for NER, it is trained to recognize them as eggs: U-ingredients, lemon: B-ingredients, juice: L-ingredients, milk: U-ingredients.

After postprocessing, the result is likewise expected to be three ingredients: eggs, lemon juice, and milk. Instead, eggs and lemon juice are merged into one ingredient.

DIET removes the BILOU tags here: https://github.com/RasaHQ/rasa/blob/master/rasa/nlu/classifiers/diet_classifier.py#L938
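To make the loss of boundaries concrete, here is a minimal sketch of what stripping the prefixes does to the tag sequence. It is not the Rasa code at the link above; `strip_bilou_prefix` is a hypothetical helper written only for this illustration.

```python
from typing import List


def strip_bilou_prefix(tag: str) -> str:
    """Return the bare label of a BILOU tag, e.g. "B-ingredients" -> "ingredients"."""
    if tag[:2] in ("B-", "I-", "L-", "U-"):
        return tag[2:]
    return tag  # "O" and already-plain tags pass through unchanged


tokens: List[str] = ["eggs", "lemon", "juice", "and", "milk"]
bilou_tags: List[str] = [
    "U-ingredients",
    "B-ingredients",
    "L-ingredients",
    "O",
    "U-ingredients",
]
plain_tags = [strip_bilou_prefix(tag) for tag in bilou_tags]
print(list(zip(tokens, plain_tags)))
```

Once the prefixes are gone, "eggs", "lemon", and "juice" all carry the same tag, so a later step that merges adjacent tokens can no longer tell where "eggs" ends and "lemon juice" begins.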

tabergma · 2020-08-10T07:01:07Z

@praneethgb Not sure I understand what you are suggesting. I know that "Amsterdam Berlin" is not a city and that it is not ideal to capture it as one entity. The problem is that if we don't merge entities with the same tag when they appear right next to each other, we would not be able to detect "San Francisco", for example: it would always be detected as "San" and "Francisco", two independent entities, which is also not ideal.
What is your idea to support both cases? Please keep in mind that not all users are using BILOU tagging.
Also, if you add a comma in between your ingredients, everything should be extracted as expected. E.g. "my ingredients are eggs, lemon juice, and milk".
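The trade-off can be sketched as follows, assuming token-level tags without BILOU prefixes. `merge_adjacent` is a hypothetical function for illustration and not the actual Rasa implementation.

```python
from typing import Dict, List


def merge_adjacent(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Merge consecutive tokens that share the same non-"O" tag into one entity."""
    entities: List[Dict[str, str]] = []
    previous_tag = "O"
    for token, tag in zip(tokens, tags):
        if tag != "O" and tag == previous_tag:
            # Same tag as the previous token: extend the last entity.
            entities[-1]["value"] += " " + token
        elif tag != "O":
            entities.append({"entity": tag, "value": token})
        previous_tag = tag
    return entities


san = merge_adjacent(["San", "Francisco"], ["city", "city"])
cities = merge_adjacent(
    ["Amsterdam", "Berlin", "and", "London"], ["city", "city", "O", "city"]
)
print(san)
print(cities)
```

The same rule that correctly joins "San Francisco" also joins "Amsterdam Berlin", because with plain tags the two inputs look identical to the merging step.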

AMR-KELEG · 2020-08-10T11:44:06Z

I am starting to bump into similar issues with some queries, so I am sharing my thoughts (though I am not familiar enough with how BILOU tagging is used in DIET / NLU data).

> @praneethgb Not sure I understand what you are suggesting. I know that "Amsterdam Berlin" is not a city and that it is not ideal to capture it as one entity. The problem is that if we don't merge entities with the same tag when they appear right next to each other, we would not be able to detect "San Francisco", for example: it would always be detected as "San" and "Francisco", two independent entities, which is also not ideal.

This might be a bit optimistic but this is how I think the model should behave:

| Input | Correct/expected prediction | Processed prediction (merging BILOU tags) |
| --- | --- | --- |
| Amsterdam Berlin | `[Amsterdam](U-city) [Berlin](U-city)` | `[Amsterdam](city) [Berlin](city)` |
| San Francisco | `[San](B-city) [Francisco](L-city)` | `[San Francisco](city)` |
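The mapping in the table could be sketched like this, assuming the model emits well-formed BILOU tags. `merge_bilou` is a hypothetical function for illustration; it is not taken from Rasa or from any PR.

```python
from typing import Dict, List, Optional


def merge_bilou(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Group tokens into entities using the BILOU prefixes as boundaries."""
    entities: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None  # span opened by a B- tag
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U":
            # Unit tag: always a standalone entity.
            entities.append({"entity": label, "value": token})
            current = None
        elif prefix == "B":
            current = {"entity": label, "value": token}
        elif prefix in ("I", "L") and current and current["entity"] == label:
            current["value"] += " " + token
            if prefix == "L":
                entities.append(current)
                current = None
        else:
            # "O" or an inconsistent sequence: drop any unfinished span.
            current = None
    return entities


two_cities = merge_bilou(["Amsterdam", "Berlin"], ["U-city", "U-city"])
one_city = merge_bilou(["San", "Francisco"], ["B-city", "L-city"])
print(two_cities)
print(one_city)
```

Dropping unfinished spans is just one possible way to handle malformed sequences; a real implementation would need to decide how to treat them, and this approach only works when the model was trained with BILOU tagging.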

> What is your idea to support both cases? Please keep in mind that not all users are using BILOU tagging.

Yes, I still get confused by BILOU tagging myself, but a mapping like the one shown above would be convenient for users.

> Also, if you add a comma in between your ingredients, everything should be extracted as expected. E.g. "my ingredients are eggs, lemon juice, and milk".

We (as developers, engineers, and researchers) can have our own set of guidelines, but it is sometimes frustrating for the users who interact with the chatbot to have to follow a certain format. It would be better to support more ways of writing queries, e.g. both "eggs lemon juice and milk" and "eggs, lemon juice and milk", as long as doing so doesn't hurt the model's performance.

tabergma · 2020-08-10T13:40:13Z

Yeah, I think we can update this for BILOU tagging, but I guess it will not be possible if the model is trained without BILOU tagging. @praneethgb or @AMR-KELEG, would either of you be willing to create a PR for this?

AMR-KELEG · 2020-08-10T17:26:49Z

> Yeah, I think we can update this for BILOU tagging, but I guess it will not be possible if the model is trained without BILOU tagging. @praneethgb or @AMR-KELEG, would either of you be willing to create a PR for this?

I will need to have a look first but yes I am willing to work on it.

praneethgb · 2020-08-10T22:46:50Z

> Also, if you add a comma in between your ingredients, everything should be extracted as expected. E.g. "my ingredients are eggs, lemon juice, and milk".

If we use voice input, commas won't be present in the input at all, because it comes from Automatic Speech Recognition models.

> Yeah, I think we can update this for BILOU tagging, but I guess it will not be possible in case the model is trained without BILOU tagging

Yes.

praneethgb · 2020-08-14T22:12:50Z

Hi @tabergma,

I've created PR: #6423 to support this use case.

praneethgb added the area:rasa-oss and type:bug labels Aug 4, 2020

praneethgb changed the title ~~Unable to recognize multiple entities of same type in a sentence without any separation symbol~~ Unable to recognize multiple entities of same type in a sentence without any separation symbol (or a single space) Aug 4, 2020

praneethgb mentioned this issue Aug 14, 2020 in "fix for 6340 bug" #6423 (merged)

praneethgb closed this as completed Aug 19, 2020
