Is there any option for composite entities? #1334
Comments
What do you mean by a composite entity?
A composite entity is a nested entity type: an entity that has another entity inside it.
Sounds like a good feature. I've just run into a somewhat similar case where nested entities would have helped.
I'm just looking at how to implement this and considering using spaCy. I'll likely train the NER on all the base entities and then use custom code to identify the hierarchies and then merge the span, e.g. Then just use the merged span for further processing (e.g. dependencies). It would be helpful if there was some sort of templating annotation system to do this within spaCy. Anybody else working on something like this?
Has this been implemented somewhere already? Any solution yet?
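The "train on base entities, then merge by template" idea above could be sketched roughly as follows. This is plain Python rather than spaCy code: the (start, end, label) token-offset tuples, the compose function and the "quantity" template are all hypothetical names made up for illustration, and the sketch only handles base entities that sit directly next to each other.

```python
from typing import Dict, List, Sequence, Tuple

# (start_token, end_token_exclusive, label); an assumed representation
Span = Tuple[int, int, str]


def compose(base: List[Span], templates: Dict[str, Sequence[str]]) -> List[Span]:
    """Build composite spans from runs of adjacent base entities.

    A template maps a composite label to the sequence of base labels
    that must appear back to back, with no gap between them.
    """
    ordered = sorted(base)
    composites = []
    for name, pattern in templates.items():
        n = len(pattern)
        for i in range(len(ordered) - n + 1):
            window = ordered[i:i + n]
            labels_match = [s[2] for s in window] == list(pattern)
            # Each entity must start exactly where the previous one ends.
            adjacent = all(window[k][1] == window[k + 1][0] for k in range(n - 1))
            if labels_match and adjacent:
                composites.append((window[0][0], window[-1][1], name))
    return composites


base_entities = [(0, 1, "number"), (1, 2, "mass_unit"), (4, 5, "number")]
templates = {"quantity": ["number", "mass_unit"]}
print(compose(base_entities, templates))  # [(0, 2, 'quantity')]
```

The composite spans are returned alongside the base entities instead of replacing them, so the hierarchy stays available for later processing.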
I have several qualms with this:
Consider the pattern [{'LIKE_NUM': True}, {'ENT_TYPE': 'mass_unit'}]. If I wanted to label every span that matches this pattern as a new entity, I no longer can: the new entity would overlap the existing mass_unit entity and raise an error.
From my reading of the issue discussions, it appears this functionality was implemented in part to solve rendering issues with displaCy and to address some span-mangling issues. @honnibal What are your thoughts?
@AndriyMulyar I agree with this. This new approach forced me to downgrade spaCy, as I can no longer do basic things like tag both "Dan Johnson" and "Dan" as NAME because they overlap. In my case I need an option to keep the longest entity when entities overlap. More generally, though, I think there should be a parameter that lets users specify what should happen in cases of overlap (e.g. raise an error, keep the longest, keep the shortest, or apply a custom rule).
What is the status here? Was there any response regarding nested entities (please, no hacks!)? It's a very important factor in training models to understand context.
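A minimal sketch of the overlap parameter requested above, written in plain Python so it does not depend on any particular spaCy version. The resolve_overlaps function, its strategy argument and the (start, end, label) token-offset tuples are hypothetical; this is not an existing spaCy option.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); an assumed representation
Span = Tuple[int, int, str]


def resolve_overlaps(spans: List[Span], strategy: str = "longest") -> List[Span]:
    """Return non-overlapping spans according to the chosen strategy.

    "longest" and "shortest" greedily keep spans in order of length;
    "error" raises ValueError as soon as two spans share a token.
    """
    if strategy == "error":
        ordered = sorted(spans)
        for previous, current in zip(ordered, ordered[1:]):
            if current[0] < previous[1]:
                raise ValueError(f"overlapping spans: {previous} and {current}")
        return ordered
    if strategy not in ("longest", "shortest"):
        raise ValueError(f"unknown strategy: {strategy}")

    sign = -1 if strategy == "longest" else 1
    kept = []
    taken = set()  # token indices already claimed by a kept span
    # Ties in length are broken by the earlier start offset.
    for span in sorted(spans, key=lambda s: (sign * (s[1] - s[0]), s[0])):
        tokens = range(span[0], span[1])
        if not any(t in taken for t in tokens):
            kept.append(span)
            taken.update(tokens)
    return sorted(kept)


# "Dan Johnson" covers tokens 0-1, "Dan" covers token 0.
names = [(0, 2, "NAME"), (0, 1, "NAME")]
print(resolve_overlaps(names, "longest"))   # [(0, 2, 'NAME')]
print(resolve_overlaps(names, "shortest"))  # [(0, 1, 'NAME')]
```

The filtered list could then be converted back into whatever span objects the pipeline uses before entities are assigned, so that only one entity claims each token.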
@datascienceteam01 In my current project medaCy I got around this by completely ignoring the entity handling functionality of spaCy and writing my own. It still works fast, even at scale (thousands of documents), and is able to interface with spaCy models. Although my project and code are engineered to the NLP domain at hand, there are ways to get around it and I hope it can be used as an example. Unfortunately, this means either not upgrading past spaCy v2.0.13, where the hard error was introduced, or not using the excellent Matcher functionality. I chose the former route.
@AndriyMulyar I'm confused as to how this ever worked. The entities have always been stored on the tokens using two attributes: What you should do if you need nested named entities is add a custom attribute, and store them there. In v2.1 you'll also be able to use the I'm really not sure how your code is working in v2.0.12.
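To illustrate the suggestion of keeping nested entities in a separate custom attribute, here is a rough plain-Python sketch of the data one might store there. The nest function and the (start, end, label) tuples are hypothetical; in spaCy the resulting mapping could presumably live on a custom extension attribute registered with Doc.set_extension, but that wiring is left out here.

```python
from typing import Dict, List, Optional, Tuple

# (start_token, end_token_exclusive, label); an assumed representation
Span = Tuple[int, int, str]


def nest(spans: List[Span]) -> Dict[Span, Optional[Span]]:
    """Map each span to its innermost strictly larger enclosing span.

    Top-level spans map to None. Spans with identical extents are not
    treated as parents of each other.
    """
    parents = {}
    for span in spans:
        length = span[1] - span[0]
        enclosing = [
            other for other in spans
            if other[0] <= span[0] and span[1] <= other[1]
            and (other[1] - other[0]) > length
        ]
        # The smallest enclosing span is the direct parent.
        parents[span] = min(enclosing, key=lambda s: s[1] - s[0], default=None)
    return parents


spans = [(0, 2, "quantity"), (0, 1, "number"), (1, 2, "mass_unit")]
hierarchy = nest(spans)
print(hierarchy[(0, 1, "number")])    # (0, 2, 'quantity')
print(hierarchy[(0, 2, "quantity")])  # None
```

Keeping the hierarchy outside the per-token entity tags means the flat entity list can stay non-overlapping while the nesting information is still available.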
@honnibal The merge I referenced above implemented the throwing of a hard error when attempting to set an entity tag onto a token that already had an entity tag. It appears that what was actually happening was that the entity tag was being overridden (which happened to be the behavior desired) - not that multiple entity tags were being set for a given token. The referenced improved functionality in v2.1 for
@AndriyMulyar if you just want to overwrite the entity tag, you can just reconcile the entities as you want them before assigning to doc.ents? If you don't need actual overlap or nesting there should be no problem.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Info about spaCy
Thanks.