Fix use_gold_ents behaviour for EntityLinker #13400
Conversation
spacy/pipeline/entity_linker.py
Outdated
```python
def _score_augmented(examples, **kwargs):
    # Because of how spaCy works, we can't just score immediately, because Language.evaluate
    # calls pipe() on the predicted docs, which won't have entities if there is no NER in the pipeline.
    if not self.use_gold_ents:
        return scorer(examples, **kwargs)
    else:
        examples = self._augment_examples(examples)
        docs = self.pipe(
            (eg.predicted for eg in examples),
        )
        for eg, doc in zip(examples, docs):
            eg.predicted = doc
        return scorer(examples, **kwargs)

self.scorer = _score_augmented
```
This whole bit is admittedly pretty hacky, but considering bug 3 as explained in the PR, I don't see a better option short of changing the entire mechanism of how evaluation/scoring of a pipeline works...
Agreed, this is not really satisfying. The workaround makes sense in this context though.
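The shape of the workaround can be illustrated outside spaCy with a minimal, self-contained sketch. Everything here is a simplified stand-in for illustration only: the `Example` class, `base_scorer`, `augment`, and `pipe` are invented mocks, not spaCy's API; only the wrapping pattern mirrors the snippet above.

```python
# Minimal stand-in illustrating the scorer-wrapping pattern: re-run the
# component on the predicted docs before scoring, because the docs handed
# in may lack entities when no NER ran earlier in the pipeline.

class Example:
    def __init__(self, predicted, reference):
        self.predicted = predicted  # doc as the pipeline sees it
        self.reference = reference  # gold annotation

def base_scorer(examples, **kwargs):
    # Toy score: fraction of examples whose prediction matches gold.
    matches = sum(1 for eg in examples if eg.predicted == eg.reference)
    return {"accuracy": matches / len(examples)}

def make_augmented_scorer(scorer, use_gold_ents, augment, pipe):
    def _score_augmented(examples, **kwargs):
        if not use_gold_ents:
            return scorer(examples, **kwargs)
        # Re-predict on (augmented) docs, then score the fresh predictions.
        examples = augment(examples)
        docs = pipe(eg.predicted for eg in examples)
        for eg, doc in zip(examples, docs):
            eg.predicted = doc
        return scorer(examples, **kwargs)
    return _score_augmented

# Toy "component": uppercasing stands in for running self.pipe().
augment = lambda examples: examples
pipe = lambda docs: [doc.upper() for doc in docs]

score = make_augmented_scorer(base_scorer, True, augment, pipe)
examples = [Example("emerson", "EMERSON"), Example("obama", "OBAMA")]
print(score(examples))  # -> {'accuracy': 1.0}
```

Without the wrapper (or with `use_gold_ents=False`), the same examples would score 0.0, since the raw predictions never match the gold references.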
```python
new_examples = []
for eg in examples:
    ents, _ = eg.get_aligned_ents_and_ner()
    new_eg = eg.copy()
```
Making a copy here feels safest? Not 100% about all the possible interactions with all other components in the pipeline, before or after, annotated or not, and frozen or not...
Hm, do we manipulate examples in other components? I'm also unsure about this. Either way 👍 for copying it.
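The shared-mutable-state concern behind the copy can be shown with a tiny stand-alone sketch. The `Example` class and its fields below are invented for illustration; spaCy's real `Example.copy()` serves the same purpose of isolating component-local mutation.

```python
import copy

# Toy Example holding a mutable "predicted" doc. Mutating it in place inside
# one component would be visible to every other component sharing the object.
class Example:
    def __init__(self, predicted, reference):
        self.predicted = predicted
        self.reference = reference

    def copy(self):
        # Deep-copy so downstream mutation can't leak back to the caller.
        return Example(copy.deepcopy(self.predicted),
                       copy.deepcopy(self.reference))

eg = Example(predicted={"ents": []}, reference={"ents": ["Emerson"]})

# Component-local augmentation: work on a copy, transfer gold ents onto it.
new_eg = eg.copy()
new_eg.predicted["ents"] = list(new_eg.reference["ents"])

print(eg.predicted["ents"])      # -> []  (original untouched)
print(new_eg.predicted["ents"])  # -> ['Emerson']
```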
Great spot 👀 Can you elaborate on the cleaning up/restoration from reason 1.? Not sure what you mean by that.
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Description
The `use_gold_ents` flag was introduced to allow the `entity_linker` to train on gold entities, even if there's no (annotating) NER component in the pipeline. I think this behaviour was buggy for a few reasons:

1. In `initialize()`, NER "predictions" from `eg.reference` were added to `eg.predicted` for the first 10 examples, and never cleaned up/restored afterwards.
2. In `update()`, this transfer happened on all examples, but here the ents were "restored" before calling the loss function. In theory, this should have prevented the EL from learning anything at all, except that in the corresponding unit test, this bug was masked by bug 1, which resulted in a few spurious annotations on the first 10 documents.
3. `Language.evaluate()` calls `pipe()` on the predicted docs, which won't have entities if there is no (annotating) NER in the pipeline.

To test some of this behaviour, I used different configs with the EL Emerson example, cf. explosion/projects#207. The "EL only" config would produce all-zero lines with `master`:

Then it would produce actual loss scores after fixing 1 and 2:

And finally, after fixing 3, it would give actual scores:
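To make the failure mode of bug 2 concrete, here is a toy reconstruction in plain Python. All names are illustrative, not spaCy code: gold ents get transferred onto the predictions but are "restored" before the loss sees them, so the loss always runs against the original (empty) predictions and nothing can be learned.

```python
# Toy model of the buggy vs. fixed update() flow described above.
# Invented names; this is not spaCy's implementation.

class Example:
    def __init__(self, predicted_ents, reference_ents):
        self.predicted_ents = predicted_ents
        self.reference_ents = reference_ents

def missing(eg):
    # Stand-in "loss": number of gold ents absent from the predictions.
    return len(set(eg.reference_ents) - set(eg.predicted_ents))

def buggy_update(examples):
    losses = []
    for eg in examples:
        saved = eg.predicted_ents
        eg.predicted_ents = list(eg.reference_ents)  # transfer gold ents...
        eg.predicted_ents = saved  # ...but restore BEFORE the loss (the bug)
        losses.append(missing(eg))
    return losses

def fixed_update(examples):
    losses = []
    for eg in examples:
        saved = eg.predicted_ents
        eg.predicted_ents = list(eg.reference_ents)  # gold ents reach the loss
        losses.append(missing(eg))
        eg.predicted_ents = saved  # restore AFTER the loss
    return losses

examples = [Example([], ["Emerson"]), Example([], ["Obama"])]
print(buggy_update(examples))  # -> [1, 1]  (gold ents never reach the loss)
print(fixed_update(examples))  # -> [0, 0]  (loss computed on gold ents)
```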
Types of change
bug fixes & enhancement
Checklist