is there any option of composite entity #1334

Closed
ankitarath2011 opened this issue Sep 19, 2017 · 13 comments
Labels
enhancement Feature requests and improvements feat / ner Feature: Named Entity Recognizer

Comments

@ankitarath2011

Info about spaCy

  • Python version: 2.7.13
  • Platform: Windows-10-10.0.14393
  • spaCy version: 1.9.0
  • Installed models: en
  1. Is there anything like a composite entity in spaCy? If so, how can I use it?
  2. How can I get the "from" city and the "to" city separately, instead of a single location?

Thanks.

@honnibal
Member

What do you mean by a composite entity?

@honnibal honnibal added the enhancement Feature requests and improvements label Sep 19, 2017
@ankitarath2011
Author

A composite entity means a nested entity type: an entity that has another entity inside it.
E.g. "from [location]" would be one entity that contains another entity, location, inside it.

@fucking-signup

Sounds like a good feature. I've just run into a somewhat similar case where nested entities would really help.

@thedataist

thedataist commented Jan 18, 2018

I'm just looking at how to implement this and considering using spaCy. I'll likely train the NER on all the base entities and then use custom code to identify the hierarchies and then merge the span, e.g.
starting with base entities:
I'm at Starbucks ORG 789 CARDINAL Mission ST_NAME St ST_TYPE San Fran CITY
⬇️
Starbucks ORG 789 CARDINAL [Mission St] ST_NAMED San Fran CITY
⬇️
[Starbucks 789 Mission St] ST_ADDRESS San Fran CITY
⬇️
[Starbucks 789 Mission St San Fran] GEO_ADDRESS

Then just use the merged span for further processing (e.g. dependencies). It would be helpful if there was some sort of templating annotation system to do this within spaCy.
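
The bottom-up merging described above can be sketched in plain Python, independent of spaCy. This is only an illustration under stated assumptions: spans are hypothetical `(start, end, label)` tuples over token indices, and the `RULES` table is a made-up stand-in for the "templating" idea, not an existing spaCy feature. Each rule rewrites a run of adjacent labels into one composite label.

```python
from typing import List, Sequence, Tuple

# (start_token, end_token_exclusive, label); a hypothetical representation
Span = Tuple[int, int, str]

# Hypothetical templates: a sequence of adjacent labels -> composite label
RULES = [
    (("ST_NAME", "ST_TYPE"), "ST_NAMED"),
    (("ORG", "CARDINAL", "ST_NAMED"), "ST_ADDRESS"),
    (("ST_ADDRESS", "CITY"), "GEO_ADDRESS"),
]


def apply_rule(spans: List[Span], pattern: Sequence[str], new_label: str) -> List[Span]:
    """Replace every contiguous run of spans whose labels equal `pattern`."""
    out: List[Span] = []
    i, n = 0, len(pattern)
    while i < len(spans):
        window = spans[i:i + n]
        labels_match = [s[2] for s in window] == list(pattern)
        # Spans must touch: each one ends where the next one starts
        contiguous = all(a[1] == b[0] for a, b in zip(window, window[1:]))
        if labels_match and contiguous:
            out.append((window[0][0], window[-1][1], new_label))
            i += n
        else:
            out.append(spans[i])
            i += 1
    return out


def compose(spans: List[Span], rules=RULES) -> List[Span]:
    """Apply the rules in order, from the innermost composite to the outermost."""
    current = sorted(spans)
    for pattern, new_label in rules:
        current = apply_rule(current, pattern, new_label)
    return current


# Tokens: Starbucks 789 Mission St San Fran
base = [(0, 1, "ORG"), (1, 2, "CARDINAL"), (2, 3, "ST_NAME"),
        (3, 4, "ST_TYPE"), (4, 6, "CITY")]
print(compose(base))
```

The merged token range could then be turned back into a span on the document for further processing; how to do that depends on the spaCy version in use.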

Anybody else working on something like this?

@ines ines added the feat / ner Feature: Named Entity Recognizer label Mar 27, 2018
@teddius

teddius commented Nov 18, 2018

Has this been implemented somewhere already? Any solution yet?

@AndriyMulyar
Contributor

AndriyMulyar commented Dec 22, 2018

It appears nested (or overlapping) entities were disallowed in spacy via #2880.
EDIT: Nested entities were never implemented. What was disallowed was setting the entity type attribute of a token span that intersected with a token span already containing a set entity type attribute. Such a use case can be met by utilizing the v2.1 Matcher functionality of patterns with custom token attributes (set to the other entities).

I have several qualms with this:

  1. This limits named entity recognition to be a multi-class classification problem as opposed to the more general multi-label classification problem - dependency information between entities (labels) is lost if entities of interest naturally occur inside one another or overlap.
  2. This commit has limited the capabilities of using Matcher with entities. For instance, I have an entity mass_unit that I would like to use as part of a pattern for matching a number followed by a unit like so:
[{'LIKE_NUM': True}, {'ENT_TYPE': 'mass_unit'}],

If I wanted to label all Spans that match this pattern as a new entity - I can no longer do this as the new entity will overlap with the existing entity mass_unit thus raising an error.

  3. Nested entities arise in many NER applications - implicitly disallowing them limits the capabilities of the package. I wrote a fair amount of code to overcome this in my implementations, but in the end just chose not to support versions past the one that had this commit, which is not a long-term solution.
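
To make the use case in item 2 concrete, here is a toy, spaCy-free sketch of "a number followed by a `mass_unit` token". It is not spaCy's Matcher: `like_num` is a simplified heuristic, the parallel `tokens`/`ent_types` lists are an assumed representation, and the `mass_quantity` label is hypothetical. The resulting span overlaps the existing `mass_unit` entity, so it is kept in a separate list rather than written onto the tokens.

```python
from typing import List, Tuple


def like_num(text: str) -> bool:
    """Rough stand-in for a LIKE_NUM check: does the text parse as a number?"""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def find_quantities(tokens: List[str], ent_types: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for each number directly followed by a mass_unit token."""
    return [
        (i, i + 2, "mass_quantity")  # hypothetical composite label
        for i in range(len(tokens) - 1)
        if like_num(tokens[i]) and ent_types[i + 1] == "mass_unit"
    ]


tokens = ["Take", "5", "mg", "daily"]
ent_types = ["", "", "mass_unit", ""]  # one entity type per token, "" if none

# The composite span (1, 3) contains the token already tagged as mass_unit,
# so it lives in its own layer next to the per-token entity types.
nested_layer = find_quantities(tokens, ent_types)
print(nested_layer)
```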

From my reading of the issue discussions, it appears this functionality was implemented in part to solve rendering issues with displaCy, as well as to address some span-mangling issues.
Related issues: #2550

@honnibal What are your thoughts?

@isaacmg

isaacmg commented Dec 23, 2018

@AndriyMulyar I agree with this. This new approach made me downgrade spaCy, as I can no longer do basic things like tag both "Dan Johnson" and "Dan" as NAME due to the overlap. In my case I need an option to tag the longest entity in cases of overlap. More generally, though, I think there should be a parameter users can pass in to specify what they want to happen in cases of overlap (i.e. raise error, longest, shortest, custom option, etc.).
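
A minimal sketch of the overlap policy suggested above, written as standalone Python rather than as a spaCy API. The function name `resolve_overlaps`, the `strategy` parameter and the `(start, end, label)` tuples are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token_exclusive, label)


def resolve_overlaps(spans: List[Span], strategy: str = "longest") -> List[Span]:
    """Return non-overlapping spans according to `strategy`.

    "longest" / "shortest": greedily keep spans in that order of preference.
    "error": raise ValueError if any two spans overlap.
    """
    if strategy not in {"longest", "shortest", "error"}:
        raise ValueError("unknown strategy: %r" % strategy)

    if strategy == "error":
        ordered = sorted(spans)
        for a, b in zip(ordered, ordered[1:]):
            if b[0] < a[1]:  # next span starts before the previous one ends
                raise ValueError("overlapping spans: %r and %r" % (a, b))
        return ordered

    sign = -1 if strategy == "longest" else 1
    # Preferred spans first; ties are broken by the earlier start
    ranked = sorted(spans, key=lambda s: (sign * (s[1] - s[0]), s[0]))
    taken = set()
    kept: List[Span] = []
    for start, end, label in ranked:
        if not any(i in taken for i in range(start, end)):
            kept.append((start, end, label))
            taken.update(range(start, end))
    return sorted(kept)


# Tokens: Dan Johnson joined Acme
candidates = [(0, 2, "NAME"), (0, 1, "NAME"), (3, 4, "ORG")]
print(resolve_overlaps(candidates, "longest"))
print(resolve_overlaps(candidates, "shortest"))
```

A custom policy could be supported by accepting a key function instead of a string, at the cost of a slightly larger interface.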

@alejandrojapkin

What is the status here? Was there any response pertaining to nested entities (please no hacks!)? It's a very important factor in training models to understand context.

@AndriyMulyar
Contributor

@datascienceteam01 In my current project medaCy I got around this by completely ignoring the entity handling functionality of spaCy and writing my own. It still works fast - even at scale (thousands of documents) - and is able to interface with spaCy models. Although my project and code are engineered to the NLP domain at hand, there are ways to get around it and I hope it can be used as an example. Unfortunately, this means either not upgrading past spaCy v2.0.13, where the hard error was introduced, or not using the excellent Matcher functionality. I chose the former route.

@honnibal
Member

@AndriyMulyar I'm confused as to how this ever worked. The entities have always been stored on the tokens using two attributes: ent_iob and ent_type. Each token can only receive one ent_iob value, indicating whether the token begins an entity, is inside one, or is outside any entity, and one ent_type value indicating the type. So the implementation has never had a way of storing nested named entities.

What you should do if you need nested named entities is add a custom attribute, and store them there. In v2.1 you'll also be able to use the Matcher over the values in extension attributes.

I'm really not sure how your code is working in v2.0.12.
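
The one-tag-per-token constraint described above can be illustrated with a small standalone encoder. This is a sketch of the general IOB scheme, not spaCy's internal code; `to_iob` and the tuple layout are assumed names. A second span over an already tagged token has nowhere to go, which is why nesting cannot be represented this way.

```python
from typing import List, Tuple


def to_iob(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode (start, end, label) spans as exactly one IOB tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        # A token holds a single tag, so a span touching a tagged token
        # (nested or overlapping) cannot be stored.
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("span (%d, %d, %s) overlaps an existing entity" % (start, end, label))
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


# Tokens: Take 5 mg daily
print(to_iob(4, [(1, 3, "QUANTITY")]))

try:
    # The inner mass_unit span sits inside QUANTITY: no room for it
    to_iob(4, [(1, 3, "QUANTITY"), (2, 3, "mass_unit")])
except ValueError as err:
    print(err)
```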

@AndriyMulyar
Contributor

AndriyMulyar commented Feb 18, 2019

@honnibal The merge I referenced above implemented the throwing of a hard error when attempting to set an entity tag onto a token that already had an entity tag. It appears that what was actually happening was that the entity tag was being overridden (which happened to be the behavior desired) - not that multiple entity tags were being set for a given token.

The referenced improved functionality in v2.1 for Matcher seems like it will provide a sufficient solution to this use case. This thread can probably be closed.

@syllog1sm
Contributor

@AndriyMulyar if you just want to overwrite the entity tag, you can just reconcile the entities as you want them before assigning to doc.ents? If you don't need actual overlap or nesting there should be no problem.
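
The reconciliation step suggested above can be sketched as follows, assuming "overwrite" means that a new entity replaces any existing entity it overlaps. The helper names and the `(start, end, label)` tuples are hypothetical; with spaCy, the reconciled list would still need to be converted to span objects before being assigned to `doc.ents`.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token_exclusive, label)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def overwrite_entities(existing: List[Span], new: List[Span]) -> List[Span]:
    """Drop existing entities that collide with a new one, then add the new ones.

    Assumes the spans in `new` do not overlap each other.
    """
    kept = [e for e in existing if not any(overlaps(e, n) for n in new)]
    return sorted(kept + list(new))


# Tokens: Aspirin : take 5 mg daily
existing = [(0, 1, "drug"), (4, 5, "mass_unit")]
new = [(3, 5, "mass_quantity")]
print(overwrite_entities(existing, new))
```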

@lock

lock bot commented Mar 20, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Mar 20, 2019

10 participants