Native coref component #7264

Closed
215 commits
e0c45c6
Native coref component (#7243)
svlandeg Mar 3, 2021
3608b7b
Merge branch 'master' into feature/coref
polm May 15, 2021
7c42a8c
Migrate coref code
polm May 15, 2021
91b1114
Minor fixes
polm May 17, 2021
e303628
Attempt to use registry correctly
polm May 17, 2021
a33d294
Merge remote-tracking branch 'upstream/develop' into feature/coref
polm May 18, 2021
0517155
Fiddle with get_mentions definition
polm May 18, 2021
883c137
Add basic tuplify init
polm May 18, 2021
a7d9c81
Make get_sentence_map work with init
polm May 18, 2021
0620820
Deal with generators in tuplify
polm May 18, 2021
2486b8a
Fix pipeline initialize
polm May 18, 2021
d22acee
Fix backprop
polm May 18, 2021
fa92daf
Break pairwise operations into pseudolayers
polm May 20, 2021
8c5df62
Help out python gc in coref backprop
polm May 20, 2021
ff3fed0
Catch a stray reference
polm May 20, 2021
e1b4a85
Fix loss
polm May 21, 2021
f6652c9
Add new coref scoring
polm May 21, 2021
0942a0b
Remove coref_er.py
polm May 21, 2021
d6fd5fe
Minor cleanup
polm May 24, 2021
d6389b1
Don't use a generator for no reason
polm May 24, 2021
a484245
Remove references to coref_er
polm May 24, 2021
ba2e491
Merge remote-tracking branch 'upstream/master' into feature/coref
svlandeg May 27, 2021
2e3c0e2
delete outdated tests
svlandeg May 27, 2021
9100265
set versions to v1 instead of v0
svlandeg May 27, 2021
04b55bf
removing unused imports
svlandeg May 27, 2021
391b512
fix types of fwd functions
svlandeg May 27, 2021
0f5c586
add basic tests for debugging
svlandeg May 28, 2021
0d81bce
add failing test for too short a sentence
svlandeg May 28, 2021
0aa1083
avoid repetitive entities in the output
svlandeg May 28, 2021
4a4ef72
Clean up unused functions
polm May 28, 2021
18444fc
Remove old comment
polm Jun 3, 2021
67d9ebc
Transpose before calculating loss
polm Jun 3, 2021
7efbc72
Don't use is_sentenced
polm Jun 12, 2021
e728b0e
Silence warning
polm Jun 12, 2021
d71198e
Replace squeeze with flatten
polm Jun 12, 2021
96be7e8
Change topk to sort descending
polm Jun 13, 2021
8452d11
Fix typo, remove old comment
polm Jun 13, 2021
cb2364c
Fix type of mask
polm Jun 17, 2021
fce804a
Minor optimization
polm Jun 17, 2021
848fd10
Small fix
polm Jun 17, 2021
a62121e
Expose more hyperparameters
polm Jun 17, 2021
ccf5611
Remove old comments
polm Jun 17, 2021
5c98c4c
Probably fix pw prod backprop
polm Jun 17, 2021
2334485
Remove unused function
polm Jun 28, 2021
4f377d8
Fix bug in crossing span detection
polm Jun 28, 2021
b02df61
Add test for crossing spans
polm Jun 28, 2021
3f66e18
Clean up pw_prod loss
polm Jul 3, 2021
f2e0e9d
Move placeholder handling into model code
polm Jul 3, 2021
d74fa82
Fix axis handling in topk
polm Jul 3, 2021
865caed
Remove XXX comment
polm Jul 3, 2021
251a5b4
Minor fix in crossing spans code
polm Jul 3, 2021
2d3c559
On initialize, use just two samples
polm Jul 3, 2021
5db28ec
Tweak mention limit calculation
polm Jul 3, 2021
8f66176
Fix loss?
polm Jul 5, 2021
13bef2d
Add width prior feature
polm Jul 5, 2021
eb5820b
Improve take_vecs implementation
polm Jul 5, 2021
d0b041a
Switch to using Thinc tuplify
polm Jul 8, 2021
f34915c
Use scatter_add to speed up span embed backprop
polm Jul 10, 2021
dc1f974
Merge branch 'master' into feature/coref
polm Jul 10, 2021
d7d317a
Clean up span embedding code
polm Jul 10, 2021
e00bd42
Fix span embeds
polm Jul 10, 2021
c25ec29
Cleanup
polm Jul 10, 2021
447c707
Fix loss
polm Jul 10, 2021
80a1707
Remove unused code
polm Jul 11, 2021
f1796e4
Fix mention list bug
polm Jul 14, 2021
3684f7f
Remove comment from fixed test
polm Jul 14, 2021
4a9dc00
Use relative indices for mentions
polm Jul 14, 2021
e9626e3
Fix serialization test
polm Jul 14, 2021
9b63cbb
Add extract spans import
polm Jul 15, 2021
a4531be
Add simple mention test
polm Jul 18, 2021
bc081c2
Add full traditional scoring
polm Jul 18, 2021
8bd0474
Run black
polm Jul 18, 2021
3ed0fae
Add multi-sentence mention test
polm Jul 19, 2021
a151c62
Add sentence map test
polm Jul 19, 2021
1d1679d
Minor speedup
polm Jul 21, 2021
56803d3
Change mention limit to match reference implementations
polm Aug 8, 2021
00d481d
Stack the mention scorer
polm Aug 9, 2021
230698d
Fix bug in scorer
polm Aug 12, 2021
c7f586c
Merge branch 'master' into feature/coref
polm Feb 3, 2022
0c15ab7
remove irrelevant unit test (behaviour clarified by new error msgs ar…
svlandeg Feb 7, 2022
c0cd502
Start bringing in wl-coref
polm Mar 6, 2022
1c697b4
Remove references to config
polm Mar 8, 2022
35cc2b1
Add span predictor code
polm Mar 8, 2022
c4f9c24
The coref model is able to be loaded
polm Mar 9, 2022
d22a002
Forward/backward pass works
polm Mar 14, 2022
8eadf37
Training runs now
polm Mar 14, 2022
dfec699
Training works now
polm Mar 14, 2022
e6917d8
Add util functions for wl-coref
polm Mar 14, 2022
0522a43
Make span2head component
polm Mar 15, 2022
17d017a
Remove span2head
polm Mar 15, 2022
55039a6
Remove old default config
polm Mar 15, 2022
abdc7d8
Clean up util code
polm Mar 15, 2022
d0ae259
Delete all the coref-hoi code
polm Mar 15, 2022
5650853
Remove unused functions
polm Mar 16, 2022
7811a11
Change architecture
polm Mar 16, 2022
6974f55
Hack for transformer listener size
polm Mar 16, 2022
0275ae2
Remove stale comment
polm Mar 16, 2022
6855df0
Skeleton for span predictor component
polm Mar 16, 2022
1a79d18
Formatting
polm Mar 16, 2022
a098849
Add fake batching
polm Mar 18, 2022
db422ab
remove unnecessary .device
Mar 18, 2022
2190cbc
Add progress on SpanPredictor component
polm Mar 19, 2022
eec00ce
Fix various sizes in SpanPredictor FFNN
polm Mar 23, 2022
1eaf8fb
span predictor debug start
Mar 23, 2022
150e7c4
conflict
Mar 23, 2022
706b2e6
gearing up SpanPredictor for gold-heads
Mar 24, 2022
a872c69
merge
Mar 24, 2022
1c5dabc
merge SpanPredictor attributes
Mar 24, 2022
83ac047
remove useless extra prefix and device from spanpredictor
Mar 24, 2022
7304604
make sure predicted and reference keeps aligned
Mar 25, 2022
4fc4034
handle empty head_ids
Mar 28, 2022
e4b4b67
handle empty clusters
Mar 28, 2022
06d680b
addressing suggestions by @polm
Mar 28, 2022
7ff99a3
nicer restore
Mar 28, 2022
63a41ba
fix score overwriting bug
Mar 30, 2022
a1d0219
prepare for aligned heads-spans training
Apr 4, 2022
ef141ad
span accuracy score
Apr 4, 2022
3ba9131
update with eg.predicted as other components
Apr 7, 2022
2a1ad4c
add backprop callback to spanpredictor
Apr 8, 2022
7a239f2
report start- and end-accuracies separately
Apr 8, 2022
6aedd98
fixing scorer
Apr 11, 2022
b53113e
Preparing span predictor for predicting from gold (#10547)
kadarakos Apr 13, 2022
d470fa0
Adjust end indices
polm Apr 13, 2022
2300f4d
Fix span score logging
polm Apr 13, 2022
e8af027
Remove all coref scoring except LEA
polm Apr 13, 2022
8181d45
Multiply accuracy by 100
polm Apr 14, 2022
08729e0
Remove end adjustment
polm Apr 14, 2022
afd255c
Undo multiply by 100
polm Apr 14, 2022
683f470
Merge branch 'master' into feature/coref
polm Apr 18, 2022
6b51258
clean up unused imports + black formatting
svlandeg May 9, 2022
117a9ef
Initial coref docs
polm May 10, 2022
f852c5c
Split span predictor component into its own file
polm May 10, 2022
41fc092
Split span predictor model into its own file
polm May 10, 2022
33f4f90
Formatting
polm May 10, 2022
e512874
small refactor and docs
kadarakos May 10, 2022
7cf6bcc
merge misery
kadarakos May 10, 2022
57165f9
Merge pull request #10782 from kadarakos/feature/coref
polm May 11, 2022
b7ac4b3
fixing arguments
kadarakos May 11, 2022
14eb20f
Add span predictor docs
polm May 12, 2022
6a8625e
First draft for architecture docs
polm May 13, 2022
13481fb
Remove unused param, add TODOs about typing
polm May 13, 2022
2e8f0e9
Rename coref params
polm May 16, 2022
403fb95
merge
kadarakos May 17, 2022
1dc3894
new parameters
kadarakos May 17, 2022
e38e84a
Merge pull request #10812 from kadarakos/feature/coref
polm May 19, 2022
9da16df
Add guards around torch import
polm May 24, 2022
b1118ce
Move epsilon
polm May 24, 2022
5cbc9f4
Use thinc.util.has_torch
polm May 24, 2022
c9233a5
Import torch from thinc
polm May 24, 2022
3807a1b
Merge pull request #10844 from polm/feature/coref-torch-guard
polm May 25, 2022
303269c
Skip coref test if no torch
polm May 25, 2022
6999436
Fix coref tests
polm May 25, 2022
6087da9
Suggestions from code review, cleanup, typing
polm May 25, 2022
e721c7b
Import cleanup
polm May 25, 2022
2a8efda
Code review suggestions, cleanup
polm May 25, 2022
838f501
Black formatting
polm May 25, 2022
015050f
Merge branch 'master' into feature/coref
svlandeg May 25, 2022
f75a528
Update spacy/ml/models/spancat.py
adrianeboyd May 25, 2022
b8bdf99
fix types in scorer + black
svlandeg May 25, 2022
3fee693
Merge branch 'feature/coref' of https://github.com/explosion/spacy in…
svlandeg May 25, 2022
cea40c9
fix types + black formatting
svlandeg May 25, 2022
aa2eb27
small type fixes
svlandeg May 25, 2022
196886b
Fix coref size inference (#10916)
polm Jun 8, 2022
16894e6
Refactor Coval Scoring code (#10875)
polm Jun 22, 2022
af6d5ae
Initial test of mismatched tokenization
polm Jun 28, 2022
ef5762d
Bad hack to get tests to run
polm Jun 28, 2022
d1ff933
Test works
polm Jun 28, 2022
9f94538
Merge branch 'master' into feature/coref
kadarakos Jun 28, 2022
1a78259
make sure same device
kadarakos Jun 28, 2022
0076f0f
span predictor device fix
kadarakos Jun 29, 2022
dd812ca
Handle case with nothing to score in span predictor
polm Jun 29, 2022
c59aeeb
Merge pull request #11043 from kadarakos/feature/coref
polm Jul 1, 2022
7972088
Merge branch 'feature/coref' into fix/coref-alignment
polm Jul 1, 2022
5192ac1
Clean tests.
polm Jul 3, 2022
1dacecb
Run black
polm Jul 3, 2022
201731d
Move spans2ints to util
polm Jul 3, 2022
1a4dbb7
Add basic span predictor tests
polm Jul 3, 2022
619b110
Use config to specify tok2vec_size
polm Jul 3, 2022
a46bc03
Add failing test with tokenization mismatch
polm Jul 3, 2022
fd574a8
Update overfitting test
polm Jul 3, 2022
cf33b48
Update tests
polm Jul 3, 2022
b09bbc7
Fix alignment issues
polm Jul 3, 2022
c7f333d
Rename spans2ints > _spans_to_offsets
polm Jul 4, 2022
178feae
Add tests to give up with whitespace differences
polm Jul 4, 2022
63e27b5
Update spacy/ml/models/coref_util.py
polm Jul 6, 2022
8f598d7
Feedback from code review
polm Jul 6, 2022
6f5cf83
Remove _spans_to_offsets
polm Jul 6, 2022
da9c379
Update docs
polm Jul 6, 2022
c4de3e5
Remove old TODOs
polm Jul 6, 2022
5e40573
Update span predictor docstrings
polm Jul 6, 2022
ce49136
Update NotImplementedError for coref component
polm Jul 6, 2022
ba1bf8a
First take at dimension inference
polm Jul 6, 2022
bd17c38
It works!
polm Jul 6, 2022
f67c173
Remove tok2vec_size from coref
polm Jul 6, 2022
b59b924
Use normal PyTorchWrapper in coref
polm Jul 6, 2022
b0800ea
Do dimension inference in span predictor
polm Jul 6, 2022
da81a90
Span predictor leftovers
polm Jul 6, 2022
2eee0d2
Fix types
polm Jul 8, 2022
2c2791d
Merge pull request #11087 from polm/coref/doc-update
polm Jul 11, 2022
1b3db14
Merge branch 'fix/coref-alignment' into feature/coref
polm Jul 11, 2022
6d9eafe
Merge branch 'feature/coref' into fix/coref-alignment
polm Jul 11, 2022
9cbb970
Merge pull request #11042 from polm/fix/coref-alignment
polm Jul 11, 2022
4d03239
Merge branch 'feature/coref' into coref/dimension-inference
polm Jul 11, 2022
baeb35f
Add type annotations for internal models
polm Jul 11, 2022
5969634
Merge branch 'master' into coref/dimension-inference
polm Jul 11, 2022
f9c82e2
Update error number
polm Jul 11, 2022
7792229
Merge branch 'master' into feature/coref
polm Jul 11, 2022
0f3c456
Update error number
polm Jul 11, 2022
64a0bf4
Merge branch 'feature/coref' into coref/dimension-inference
polm Jul 12, 2022
1baa334
Make get_clusters_from_doc return spans in order
polm Jul 12, 2022
07e8556
Remove config from coref tests
polm Jul 12, 2022
90973fa
Merge pull request #11089 from polm/coref/dimension-inference
polm Jul 12, 2022
2e9dadf
Remove orphaned function
polm Jul 12, 2022
3a7658e
Update docs to mark experimental, rename SpanPredictor to SpanResolver
polm Aug 4, 2022
62ffddd
Update architectures
polm Aug 4, 2022
33 changes: 33 additions & 0 deletions licenses/3rd_party_licenses.txt
@@ -127,3 +127,36 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


coval
-----

* Files: scorer.py

The implementations of ClusterEvaluator, lea, get_cluster_info, and
get_markable_assignments are adapted from coval, which is distributed
under the following license:

The MIT License (MIT)

Copyright 2018 Nafise Sadat Moosavi (ns.moosavi at gmail dot com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

1 change: 1 addition & 0 deletions spacy/errors.py
@@ -939,6 +939,7 @@ class Errors(metaclass=ErrorsWithCodes):
             "`{arg2}`={arg2_values} but these arguments are conflicting.")
    E1043 = ("Expected None or a value in range [{range_start}, {range_end}] for entity linker threshold, but got "
             "{value}.")
    E1044 = ("Misalignment in coref. Head token has no match in training doc.")


# Deprecated model shortcuts, only used in errors and warnings
7 changes: 7 additions & 0 deletions spacy/ml/models/__init__.py
@@ -5,3 +5,10 @@
from .tagger import * # noqa
from .textcat import * # noqa
from .tok2vec import * # noqa

# some models require Torch
from thinc.util import has_torch
if has_torch:
    from .coref import * #noqa
    from .span_predictor import * #noqa
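
The guard above only imports the coref and span predictor models when Torch is available. As a rough, standalone illustration of that optional-dependency pattern (not the code in this PR), the sketch below uses only the standard library. The helper `has_module` and the list `AVAILABLE_MODELS` are hypothetical names invented for this example; the PR itself relies on `thinc.util.has_torch`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it.

    Hypothetical helper for illustration; the PR uses thinc.util.has_torch.
    """
    return importlib.util.find_spec(name) is not None


# Hypothetical list standing in for the models a package would expose.
AVAILABLE_MODELS = ["tagger", "textcat", "tok2vec"]

# Only expose the Torch-backed models when the optional dependency is present,
# mirroring the `if has_torch:` guard in spacy/ml/models/__init__.py.
if has_module("torch"):
    AVAILABLE_MODELS += ["coref", "span_predictor"]
```

Checking for the module up front means the package still imports cleanly on installations without Torch, and the Torch-backed models are simply absent rather than raising an ImportError at import time.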
