Extend score_spans for overlapping & non-labeled spans #7209

svlandeg · 2021-02-25T16:35:51Z

Description

Extends Scorer.score_spans so that it also applies to non-NE evaluations such as coref mentions, which can overlap and do not necessarily have labels.
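
To make the two options concrete, here is a minimal, self-contained sketch of span precision/recall/F-score with a `labeled` and an `allow_overlap` switch. It is not the code in spacy/scorer.py: spans are modelled as plain `(start, end, label)` tuples over token indices, and the names `score_span_tuples` and `drop_overlaps` are hypothetical. Treating `allow_overlap=False` as "keep the first span, skip any later span that shares a token with it" is a simplifying assumption made for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (start_token, end_token, label); end is exclusive. Illustrative type only.
SpanTuple = Tuple[int, int, str]


def drop_overlaps(spans: Iterable[SpanTuple]) -> List[SpanTuple]:
    """Keep spans in order, skipping any span that shares a token with a kept one."""
    seen: Set[int] = set()
    kept: List[SpanTuple] = []
    for start, end, label in spans:
        tokens = set(range(start, end))
        if tokens & seen:
            continue
        seen |= tokens
        kept.append((start, end, label))
    return kept


def score_span_tuples(
    gold: Iterable[SpanTuple],
    pred: Iterable[SpanTuple],
    *,
    labeled: bool = True,
    allow_overlap: bool = False,
) -> Dict[str, float]:
    """Exact-match PRF over spans; labels are ignored when labeled=False."""
    if not allow_overlap:
        gold = drop_overlaps(gold)
        pred = drop_overlaps(pred)

    def key(span: SpanTuple) -> tuple:
        start, end, label = span
        return (start, end, label) if labeled else (start, end)

    gold_set = {key(s) for s in gold}
    pred_set = {key(s) for s in pred}
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"p": p, "r": r, "f": f}


# Two overlapping gold mentions; the prediction gets one label wrong.
gold = [(0, 2, "A"), (1, 3, "B")]
pred = [(0, 2, "X"), (1, 3, "B")]
labeled_scores = score_span_tuples(gold, pred, labeled=True, allow_overlap=True)
unlabeled_scores = score_span_tuples(gold, pred, labeled=False, allow_overlap=True)
```

With overlap allowed, the labeled comparison counts only the second span as correct, while the unlabeled comparison counts both, which is the behaviour wanted for mentions that carry no label.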

Types of change

enhancement

Checklist

I have submitted the spaCy Contributor Agreement.
I ran the tests, and all new and existing tests passed.
My changes don't require a change to the documentation, or if they do, I've added all required information.

adrianeboyd

Just a few minor things. (It's nice to see that it wasn't too hard to extend, or at least I hope it wasn't!)

As a side note, I was wondering in general whether we should switch the whole method (and also the NER PRF method) to use character offsets without referring to any token alignments. I think the scorer originally used token alignments because we didn't necessarily have a character representation of the text in an unspaced GoldParse, but I don't think that's an issue now.

spacy/scorer.py

svlandeg · 2021-02-25T18:53:30Z

> As a side note, I was wondering in general whether we should switch the whole method (and also the NER PRF method) to use character offsets without referring to any token alignments. I think the scorer originally used token alignments because we didn't necessarily have a character representation of the text in an unspaced GoldParse, but I don't think that's an issue now.

That's a good point. Also I think doc.spans could hold any span, they shouldn't necessarily align to tokens, right?

Let's perhaps do it in a follow-up PR though?

adrianeboyd · 2021-02-26T08:39:07Z

No, I meant the "side note" part! It just came up a few times recently and I mentioned it since we were both looking at the details.

I don't think you can have spans that don't line up with tokens, though?

svlandeg · 2021-02-26T08:44:20Z

Oh right, of course, they're just going to be None :-)

honnibal · 2021-03-09T03:05:00Z

> That's a good point. Also I think doc.spans could hold any span, they shouldn't necessarily align to tokens, right?

doc.spans has to hold Span objects, which do have to be token-aligned.
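
The token-alignment constraint, and the "they're just going to be None" remark above, can be illustrated without spaCy. The sketch below is a simplified stand-in rather than spaCy's own logic: the helper name `char_to_token_span` and the hard-coded token offsets are assumptions made for the example. It maps character offsets to a token range only when both ends fall exactly on token boundaries, and returns None otherwise.

```python
from typing import List, Optional, Tuple


def char_to_token_span(
    token_offsets: List[Tuple[int, int]], start_char: int, end_char: int
) -> Optional[Tuple[int, int]]:
    """Return (start_token, end_token) with exclusive end, or None if misaligned."""
    # Index token boundaries by character position.
    starts = {s: i for i, (s, _) in enumerate(token_offsets)}
    ends = {e: i for i, (_, e) in enumerate(token_offsets)}
    if start_char not in starts or end_char not in ends:
        return None  # an offset falls inside a token or in whitespace
    start_token = starts[start_char]
    end_token = ends[end_char] + 1
    if start_token >= end_token:
        return None  # empty or reversed range
    return (start_token, end_token)


# Character offsets of the tokens in "New York City" (whitespace tokenization).
offsets = [(0, 3), (4, 8), (9, 13)]
aligned = char_to_token_span(offsets, 0, 8)      # covers "New York"
misaligned = char_to_token_span(offsets, 2, 8)   # starts inside "New"
```

Here `aligned` is a valid token range and `misaligned` is None, which is why a character-offset span that does not line up with tokens never becomes a Span in the first place.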

svlandeg · 2021-03-09T08:28:51Z

Yeah, I don't know where my brain was when I typed that ;-)

Anyway, Matt, if you agree with the naming, I think this is good to merge?

honnibal · 2021-03-10T08:00:42Z

@svlandeg About the naming, I think I'd suggest just labelled. I think it's too hard to guess the name if it's verb_label.

adrianeboyd · 2021-03-10T08:05:05Z

Ooh, a double-l conundrum! I think our official position would be labeled. (As a result, I helpfully don't like either!)

svlandeg · 2021-03-24T19:26:03Z

> I think it's too hard to guess the name if it's verb_label

I see your point, but an existing parameter is called has_annotation, so perhaps it would be consistent to call this one has_label? The other parameter I added is currently called allow_overlap ... :|

svlandeg added 4 commits February 25, 2021 16:40

extend span scorer with consider_label and allow_overlap

011b0a1

unit test for spans y2x overlap

b90486f

add score_spans unit test

064c49a

docs for new fields in scorer.score_spans

e47fcdb

svlandeg added the enhancement, feat / doc, and feat / training labels Feb 25, 2021

adrianeboyd reviewed Feb 25, 2021

spacy/scorer.py (outdated, resolved)

spacy/scorer.py (outdated, resolved)

rename to include_label

1a43b2c

svlandeg added 2 commits February 26, 2021 10:00

spell out if-else for clarity

2a5a1be

Merge remote-tracking branch 'upstream/master' into feature/score_ove…

3860b19

…rlapping_spans

This was referenced Mar 1, 2021

Native coref component #7243

Merged

Native coref component #7264

Closed

svlandeg and others added 2 commits April 1, 2021 21:33

rename to 'labeled'

c15fddb

Merge branch 'master' into feature/score_overlapping_spans

b353a2b

adrianeboyd merged commit 204c2f1 into explosion:master Apr 8, 2021

svlandeg deleted the feature/score_overlapping_spans branch April 8, 2021 10:21
