Extend score_spans for overlapping & non-labeled spans #7209

Merged


svlandeg
Member

Description

Extends Scorer.score_spans so that it can also be used for non-NE evaluations, such as coreference mentions, which can overlap and do not necessarily have labels.
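This is a minimal, standalone sketch of exact-match span scoring. It assumes spans are compared as (start, end, label) tuples. It is not spaCy's Scorer code: span_prf, SpanTuple and the labeled flag are illustrative names only. In a set-based comparison, overlapping spans need no special handling. Dropping the label turns the score into a boundaries-only match, which suits unlabeled mentions.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical span representation: (start_token, end_token, label),
# with end_token exclusive. label may be None for unlabeled spans.
SpanTuple = Tuple[int, int, Optional[str]]


def span_prf(
    gold: Iterable[SpanTuple], pred: Iterable[SpanTuple], labeled: bool = True
) -> dict:
    """Exact-match precision, recall and F-score over two collections of spans."""

    def key(span: SpanTuple) -> tuple:
        start, end, label = span
        # When labeled=False, only the boundaries have to match.
        return (start, end, label) if labeled else (start, end)

    # Overlapping spans are just distinct set members, so each is scored on its own.
    gold_set = {key(s) for s in gold}
    pred_set = {key(s) for s in pred}
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


# Coref-style mentions: unlabeled, and (0, 2) overlaps (1, 2).
gold = [(0, 2, None), (1, 2, None), (5, 6, None)]
pred = [(0, 2, None), (1, 2, None), (3, 4, None)]
print(span_prf(gold, pred, labeled=False))  # p, r and f are all 2/3

# Same boundaries, different label: only a match when labeled=False.
print(span_prf([(0, 2, "PERSON")], [(0, 2, "ORG")])["f"])  # 0.0
print(span_prf([(0, 2, "PERSON")], [(0, 2, "ORG")], labeled=False)["f"])  # 1.0
```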

Types of change

enhancement

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@svlandeg svlandeg added enhancement Feature requests and improvements feat / doc Feature: Doc, Span and Token objects feat / training Feature: Training utils, Example, Corpus and converters labels Feb 25, 2021
Contributor

@adrianeboyd adrianeboyd left a comment


Just a few minor things. (It's nice to see that it wasn't too hard to extend, or at least I hope it wasn't!)

As a side note, I was wondering in general whether we should switch the whole method (and also the NER PRF method) to use character offsets without referring to any token alignments. I think the scorer originally used token alignments because we didn't necessarily have a character representation of the text in an unspaced GoldParse, but I don't think that's an issue now.
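To illustrate the side note, compare spans by character offsets instead of token indices. Then the predicted and reference docs don't need to share a tokenization. This is a minimal sketch. It assumes a toy token list of (offset, text) pairs, and token_span_to_chars is a hypothetical helper. Real spaCy Span objects already expose start_char and end_char.

```python
from typing import List, Tuple

# Toy token representation (hypothetical): (character offset, token text).
Token = Tuple[int, str]


def token_span_to_chars(tokens: List[Token], start: int, end: int) -> Tuple[int, int]:
    """Convert a token span [start, end) into character offsets [start_char, end_char)."""
    first_offset, _ = tokens[start]
    last_offset, last_text = tokens[end - 1]
    return first_offset, last_offset + len(last_text)


text = "New York-based firm"
# Two different tokenizations of the same text.
gold_tokens = [(0, "New"), (4, "York"), (8, "-"), (9, "based"), (15, "firm")]
pred_tokens = [(0, "New"), (4, "York-based"), (15, "firm")]

# "firm" is token 4 in one doc and token 2 in the other, but its character
# offsets agree, so no token alignment is needed to compare the two spans.
gold_chars = token_span_to_chars(gold_tokens, 4, 5)
pred_chars = token_span_to_chars(pred_tokens, 2, 3)
print(gold_chars, pred_chars, gold_chars == pred_chars)  # (15, 19) (15, 19) True
print(text[gold_chars[0]:gold_chars[1]])  # firm
```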

Two review comments on spacy/scorer.py (outdated, resolved)
@svlandeg
Member Author

> As a side note, I was wondering in general whether we should switch the whole method (and also the NER PRF method) to use character offsets without referring to any token alignments. I think the scorer originally used token alignments because we didn't necessarily have a character representation of the text in an unspaced GoldParse, but I don't think that's an issue now.

That's a good point. Also I think doc.spans could hold any span, they shouldn't necessarily align to tokens, right?

Let's perhaps do it in a follow-up PR though?

@adrianeboyd
Contributor

No, I meant the "side note" part! It just came up a few times recently and I mentioned it since we were both looking at the details.

I don't think you can have spans that don't line up with tokens, though?

@svlandeg
Member Author

Oh right, of course, they're just going to be None :-)

This was referenced Mar 1, 2021
@honnibal
Member

honnibal commented Mar 9, 2021

> That's a good point. Also I think doc.spans could hold any span, they shouldn't necessarily align to tokens, right?

doc.spans has to hold Span objects, which do have to be token-aligned.
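A Span is defined over whole tokens, so character offsets that cut through a token cannot become one. For example, spaCy's Doc.char_span returns None in its default strict alignment mode. The sketch below reproduces that strict check in plain Python. It assumes a toy token list of (offset, text) pairs, and char_span_to_tokens is a hypothetical helper, not a spaCy function.

```python
from typing import List, Optional, Tuple

# Toy token representation (hypothetical): (character offset, token text).
Token = Tuple[int, str]


def char_span_to_tokens(
    tokens: List[Token], start_char: int, end_char: int
) -> Optional[Tuple[int, int]]:
    """Return the token span [start, end) covering exactly [start_char, end_char),
    or None if either boundary falls inside a token (strict alignment)."""
    starts = {offset: i for i, (offset, _) in enumerate(tokens)}
    ends = {offset + len(text): i + 1 for i, (offset, text) in enumerate(tokens)}
    if start_char in starts and end_char in ends and start_char < end_char:
        return starts[start_char], ends[end_char]
    return None


gold_tokens = [(0, "New"), (4, "York"), (8, "-"), (9, "based"), (15, "firm")]
pred_tokens = [(0, "New"), (4, "York-based"), (15, "firm")]

# "New York" covers characters 0-8.
print(char_span_to_tokens(gold_tokens, 0, 8))  # (0, 2)
print(char_span_to_tokens(pred_tokens, 0, 8))  # None: 8 falls inside "York-based"
```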

@svlandeg
Member Author

svlandeg commented Mar 9, 2021

Yeah, I don't know where my brain was when I typed that ;-)

Anyway, Matt, if you agree with the naming, I think this is good to merge?

@honnibal
Member

@svlandeg About the naming, I'd suggest just labelled. I think it's too hard to guess the name if it's verb_label.

@adrianeboyd
Contributor

Ooh, a double-l conundrum! I think our official position would be labeled. (As a result, I helpfully don't like either!)

@svlandeg
Member Author

> I think it's too hard to guess the name if it's verb_label

I see your point, but an existing parameter is called has_annotation, so perhaps it would be consistent to call this one has_label? The other parameter I added is currently called allow_overlap ... :|

@adrianeboyd adrianeboyd merged commit 204c2f1 into explosion:master Apr 8, 2021
@svlandeg svlandeg deleted the feature/score_overlapping_spans branch April 8, 2021 10:21
3 participants