Inappropriate CCOMP/NSUBJ dependency labels #905
Comments
Hi,

Thanks, interesting analysis. To give you some insight on this, the current parsing model is "greedy", in that it only considers a single analysis. However, it does have a "repair" mechanism that allows it to alter the partial analysis it's building in light of new information.

I can think of a few ways you might correct this. First, you might want to use the

Assuming all is correct, I would say the best solution would be to do some additional training on the cases you're interested in. Given you've already set up a couple of other parsers, I would recommend "tri-training" as a strategy: have your other parsers analyse a bunch of text, and take the sentences on which they agree. Then designate these as gold-standard sentences, and use them to train spaCy. Specifically, something like this:
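A minimal sketch of the agreement-filtering step described above. This is not the snippet from the original comment. It assumes each external parser can be wrapped as a callable that takes a list of tokens and returns one `(head_index, label)` pair per token; the names `select_agreed`, `stub_a`, and `stub_b` are hypothetical, and the stubs only stand in for real parsers. Converting the agreed sentences into spaCy's training format is left out.

```python
from typing import Callable, Iterable, List, Tuple

# One (head_index, dependency_label) pair per token.
# This common format is an assumption made for illustration.
Parse = List[Tuple[int, str]]
Parser = Callable[[List[str]], Parse]


def select_agreed(
    sentences: Iterable[List[str]], parser_a: Parser, parser_b: Parser
) -> List[Tuple[List[str], Parse]]:
    """Keep only the sentences on which both parsers give identical analyses."""
    agreed = []
    for tokens in sentences:
        parse_a = parser_a(tokens)
        parse_b = parser_b(tokens)
        if parse_a == parse_b:
            # Treat the shared analysis as a "silver" training example.
            agreed.append((tokens, parse_a))
    return agreed


# Stub parsers standing in for two real external parsers (illustrative only).
def stub_a(tokens: List[str]) -> Parse:
    return [(1, "nsubj"), (1, "ROOT")]


def stub_b(tokens: List[str]) -> Parse:
    if tokens[0] == "I":
        return [(1, "nsubj"), (1, "ROOT")]
    return [(1, "dobj"), (1, "ROOT")]


sentences = [["I", "agree"], ["They", "differ"]]
silver = select_agreed(sentences, stub_a, stub_b)
print(len(silver), "of", len(sentences), "sentences kept")
```

In practice the two parsers would need to share a tokenization and label scheme (or be mapped onto one) before their outputs can be compared this way.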
It may take some fiddling to get this right, but the basic concept of this is pretty well established in the literature. Best,
Thanks Matt, will give that a go. This is really the only issue preventing me from selecting spaCy as our parser. Can't beat the speed with anything else we've tried, so will see if I can train it out for this case. FYI the only other issue we noticed was the tendency for the parser to (not unreasonably) mangle locations (addresses, place names, etc.) that should be compound nouns, but solved it with a custom NER for locations. Best,
Closing this and making #1057 the master issue – work in progress for spaCy v2.0!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I've just finished an extensive test of the dependency parser for a social media/messaging use case (1-2 sentence expressions) and have noticed one case where spaCy consistently performs poorly in comparison to other parsers...
For expressions like: "I want a different thing can i have the red one instead", the parser will return:
(grouping verb phrases for "want" and "have" here for brevity)
"I want" -ccomp-> "a different thing can i have the red one instead" (i.e. "a different thing" is attached as the nsubj of "have" instead of to "want")
the correct parse should be:
"I want a different thing" -ccomp-> "can I have the red one instead"
placing a comma after "thing", i.e. "I want a different thing, can i have the red one instead", elicits the correct parse by clearing up the ambiguity.
I assume this is a flaw in the model, but realise this is a slightly ambiguous parse as you could get expressions like "I hurt bob can you get me to the doctor", where the parse could go either way ("i hurt" or "i hurt bob" could be valid as the first verb phrase). In cases where the first verb phrase contains "want", "need", "can i get", "can i have", etc. the correct parse should almost always be the nsubj as a child of the root verb though.
That said, in almost all other cases parser performance was very good and I've found the library to be very well thought out. I'd really like to put spaCy into production, but this may be a show stopper for me. Is there a chance the model will be corrected soon, or should I consider building my own?
macOS Sierra
Python 2.7
spaCy 1.7.2
model en_core_web_md-1.2.1