Native coref component #7264
Commits on Mar 3, 2021
Native coref component (#7243)
* initial coref_er pipe
* matcher more flexible
* base coref component without actual model
* initial setup of coref_er.score
* rename to include_label
* preliminary score_clusters method
* apply scoring in coref component
* IO fix
* return None loss for now
* rename to CoreferenceResolver
* some preliminary unit tests
* use registry as callable
Commit e0c45c6
Commits on May 15, 2021
Commit 3608b7b
This includes the coref code that was being tested separately, modified to work in spaCy. It hasn't been tested yet and presumably still needs fixes. In particular, the evaluation code is currently omitted. It's unclear at the moment whether we want to use a complex scorer similar to the official one, or a simpler scorer using more modern evaluation methods.
Commit 7c42a8c
Commits on May 17, 2021
Commit 91b1114
Commit e303628
Commits on May 18, 2021
Commit a33d294
Fiddle with get_mentions definition
Ended up not making a difference, but oh well.
Commit 0517155
Commit 883c137
Make get_sentence_map work with init
When sentences are not available, just treat the whole doc as one sentence. A reasonable general fallback, but important due to the init call, where upstream components aren't run.
Commit a7d9c81
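The sentence-map fallback from the get_sentence_map commit can be sketched roughly as below. This is a minimal pure-Python illustration, not the code from the PR: `sentence_map` and its `sent_starts` argument are hypothetical stand-ins for per-token sentence-start flags, where `None` means no upstream component has set boundaries.

```python
from typing import List, Optional


def sentence_map(sent_starts: List[Optional[bool]]) -> List[int]:
    """Map each token position to the index of its sentence.

    `sent_starts` is a hypothetical per-token flag list: True marks the
    first token of a sentence, None means the boundary is unknown.
    """
    # Fallback: with no boundary information at all (e.g. during
    # initialization, when upstream components have not run), treat the
    # whole doc as a single sentence.
    if all(flag is None for flag in sent_starts):
        return [0] * len(sent_starts)

    mapping = []
    current = -1
    for i, flag in enumerate(sent_starts):
        # The first token always opens a sentence, even if unflagged.
        if flag or i == 0:
            current += 1
        mapping.append(current)
    return mapping
```

With flags available, `sentence_map([True, False, False, True, False])` assigns the first three tokens to sentence 0 and the rest to sentence 1; with only `None` values every token lands in sentence 0.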
Commit 0620820
Commit 2486b8a
Commit d22acee
Commits on May 20, 2021
Break pairwise operations into pseudolayers
This makes their scope tighter and more contained, and has the nice side effect that fewer things need to be passed around for backprop.
Commit fa92daf
Commit 8c5df62
Commit ff3fed0
Commits on May 21, 2021
The loss was being returned as a single element array, which caused training to die when it attempted to turn it into JSON.
Commit e1b4a85
This is closer to the traditional evaluation method. That uses an average of three scores, this is just using the bcubed metric for now (nothing special about bcubed, just picked one). The scoring implementation comes from the coval project. It relies on scipy, which is one issue, and is rather involved, which is another. Besides being comparable with traditional evaluations, this scoring is relatively fast.
Commit f6652c9
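For readers unfamiliar with the bcubed metric mentioned in that commit, here is a simplified sketch. It is not the coval implementation: it assumes every mention appears in exactly one gold cluster and one predicted cluster, and the function name `b_cubed` is made up for the example.

```python
from typing import Hashable, List, Set, Tuple


def b_cubed(
    gold: List[Set[Hashable]], pred: List[Set[Hashable]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) under a simplified B-cubed definition."""
    gold_of = {m: cluster for cluster in gold for m in cluster}
    pred_of = {m: cluster for cluster in pred for m in cluster}
    # Assumption: only mentions present on both sides are scored.
    mentions = set(gold_of) & set(pred_of)
    if not mentions:
        return 0.0, 0.0, 0.0

    precision = 0.0
    recall = 0.0
    for m in mentions:
        overlap = len(gold_of[m] & pred_of[m])
        precision += overlap / len(pred_of[m])
        recall += overlap / len(gold_of[m])
    precision /= len(mentions)
    recall /= len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Each mention contributes the fraction of its predicted cluster that is correct (precision) and the fraction of its gold cluster that was recovered (recall), so no optimal cluster alignment is needed.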
The intent of this was that it would be a component pipeline that used entities as input, but that's now covered by the get_mentions function as a pipeline arg.
Commit 0942a0b
Commits on May 24, 2021
Commit d6fd5fe
Commit d6389b1
Commit a484245
Commits on May 27, 2021
Commit ba2e491
Commit 2e3c0e2
Commit 9100265
Commit 04b55bf
Commit 391b512
Commits on May 28, 2021
Commit 0f5c586
Commit 0d81bce
Commit 0aa1083
Commits on Jun 2, 2021
`make_clean_doc` is not needed and was removed. `logsumexp` may be needed if I misunderstood the loss calculation, so I left it in for now with a note.
Commit 4a4ef72
Commits on Jun 4, 2021
Commit 18444fc
Commit 67d9ebc
Commits on Jun 12, 2021
Commit 7efbc72
Commit e728b0e
At a few points in the code it's normal to get a "2d" array where each row is a single entry. Calling squeeze will make that a proper 1d array... unless it's just one entry, in which case it turns into a 0d scalar. That's not what we want; flatten() provides the desired behavior.
Commit d71198e
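The squeeze versus flatten behavior described in that commit is easy to reproduce with plain NumPy. The array values below are made up for illustration.

```python
import numpy as np

many = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1): three single-entry rows
one = np.array([[1.0]])                 # shape (1, 1): just one entry

squeezed_many = many.squeeze()  # shape (3,), a proper 1d array
squeezed_one = one.squeeze()    # shape (), a 0d array that cannot be iterated

flat_many = many.flatten()      # shape (3,)
flat_one = one.flatten()        # shape (1,), still 1d as desired
```

`flatten()` always returns a 1d copy, so the single-entry case keeps the same rank as the general case.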
Commits on Jun 13, 2021
Change topk to sort descending
Shouldn't change correctness but is a little clearer
Commit 96be7e8
Commit 8452d11
Commits on Jun 17, 2021
The call here was creating a float64 array, which was turning many downstream scores into float64s. Later on these values were assigned to a float32 array in backprop, and numerical underflow caused things to go to zero. That's almost certainly not the only reason things go to zero, but it is incorrect.
Commit cb2364c
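The dtype problem in that commit can be illustrated with plain NumPy. This is a sketch of the general mechanism only; the specific call that was fixed in the PR is not shown here, and the values are made up.

```python
import numpy as np

default = np.zeros(3)                    # float64 unless a dtype is given
explicit = np.zeros(3, dtype="float32")

scores = np.ones(3, dtype="float32")
mixed = scores + default    # promoted to float64
kept = scores + explicit    # stays float32

# A value that is representable in float64 but below the float32 range.
tiny = np.array([1e-50])
target = np.zeros(1, dtype="float32")
target[:] = tiny            # cast on assignment: underflows to 0.0
```

One float64 array is enough to promote everything computed from it, and the loss of precision only shows up later when the result is written back into a float32 buffer.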
Commit fce804a
Commit 848fd10
Commit a62121e
Commit ccf5611
I think this change is correct, but intuition doesn't really help here...
Commit 5c98c4c
Commits on Jun 28, 2021
Commit 2334485
Commit 4f377d8
Commit b02df61
Commits on Jul 3, 2021
This doesn't change the math but makes the transposes slightly easier to understand (maybe?).
Commit 3f66e18
Commit f2e0e9d
In practice this is only ever used with axis=1, so it wasn't causing issues, even though it was wrong.
Commit d74fa82
Comment wondered if there should be some subtraction to avoid double counting, but it probably doesn't matter because the diagonal is 0.
Commit 865caed
Minor fix in crossing spans code
I think this was technically incorrect but harmless. The reason the code here is different than the reference in coref-hoi is that the indices there are such that they get +1 at the end of processing, while the code here handles indices directly.
Commit 251a5b4
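A crossing-spans check of the kind discussed in that commit might look like the sketch below. The helper name `spans_cross` and the inclusive-end convention are assumptions for the example; with exclusive ends the comparisons shift by one, which is the sort of index difference the commit message describes.

```python
from typing import Tuple

Span = Tuple[int, int]


def spans_cross(a: Span, b: Span) -> bool:
    """Return True if two spans partially overlap without one nesting the other.

    Spans are (start, end) pairs with an inclusive end index; this
    convention is an assumption made for the example.
    """
    (s1, e1), (s2, e2) = a, b
    return (s1 < s2 <= e1 < e2) or (s2 < s1 <= e2 < e1)
```

Nested spans such as `(0, 3)` and `(1, 2)` and disjoint spans such as `(0, 1)` and `(2, 3)` do not count as crossing.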
On initialize, use just two samples
Coref docs are kind of long, and using 10 samples on a smallish GPU can cause OOMs.
Commit 2d3c559
Tweak mention limit calculation
The calculation of this in the coref-hoi code is hard to follow. Based on comments and variable names it sounds like it's using the doc length, but it might actually be the number of mentions? Number of mentions should be much larger and seems more correct, but might want to revisit this.
Commit 5db28ec
Commits on Jul 5, 2021
This rewrites the loss to not use the Thinc crossentropy code at all. The main difference here is that the negative predictions are being masked out (= marginalized over), but negative gradient is still being reflected. I'm still not sure this is exactly right but models seem to train reliably now.
Commit 8f66176
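One common way to marginalize over antecedents in coreference models is the marginal log-likelihood loss sketched below. Whether the rewritten loss in this commit matches it exactly is not confirmed by the message, so treat this as an illustration under assumptions: `marginal_nll` is a hypothetical name, scores are raw per-candidate scores for a single mention, and at least one candidate is marked gold.

```python
import math
from typing import List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    if not values:
        return float("-inf")
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def marginal_nll(scores: List[float], is_gold: List[bool]) -> float:
    """Negative log of the softmax probability mass on gold antecedents.

    Non-gold candidates only enter through the normalizer, so they are
    marginalized over rather than given individual targets.
    """
    gold_scores = [s for s, gold in zip(scores, is_gold) if gold]
    return logsumexp(scores) - logsumexp(gold_scores)
```

When every candidate is gold the loss is zero; with two equally scored candidates and one gold, it is `log(2)`.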
Not necessary for convergence, but in coref-hoi this seems to add a few f1 points. Note that there are two width-related features in coref-hoi. This is a "prior" that is added to mention scores. The other width related feature is appended to the span embedding representation for other layers to reference.
Commit 13bef2d
Improve take_vecs implementation
This pulls out references to needed bits so that other parts (the larger embeddings) can be freed before backprop.
Commit eb5820b
Commits on Jul 8, 2021
The tuplify code here was added to Thinc proper and that's been released, so no need to have it here any more.
Commit d0b041a
Commits on Jul 10, 2021
Use scatter_add to speed up span embed backprop
This was the slowest part of the code, and using scatter_add here probably reduces the runtime by 50%.
Commit f34915c
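The scatter-add operation behind that speedup accumulates gradients for rows that were gathered more than once. The sketch below uses NumPy's `np.add.at` as a stand-in for it; the array names and shapes are made up, and the actual backprop code in the PR is not reproduced.

```python
import numpy as np

# Gradient for a table of 4 token vectors of width 2.
d_tokens = np.zeros((4, 2), dtype="float32")
# Rows that were gathered in the forward pass; row 2 was used twice.
indices = np.array([0, 2, 2, 3])
# Incoming gradient, one row per gathered vector.
d_gathered = np.ones((4, 2), dtype="float32")

# Unbuffered in-place add: repeated indices accumulate.
np.add.at(d_tokens, indices, d_gathered)

# For comparison, fancy-index `+=` applies only one update per repeated index.
naive = np.zeros((4, 2), dtype="float32")
naive[indices] += d_gathered
```

After the call, row 2 of `d_tokens` holds 2.0 while the same row in `naive` holds 1.0, which is why a plain indexed add cannot replace a Python loop here.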
Commit dc1f974
This is now cleaner and significantly faster. There's still some messy parts in the code (particularly variable names), will get to that later.
Commit d7d317a
Some of the lengths and backprop weren't right. Also various cleanup.
Commit e00bd42
Commit c25ec29
Commit 447c707
Commits on Jul 11, 2021
Commit 80a1707
Commits on Jul 14, 2021
There was an off-by-one error in how mentions are generated that would affect mentions at the end of a sentence. This was pretty nasty.
Commit f1796e4
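The commit message does not show the exact bug, but the class of error is easy to picture with a candidate-span enumerator. The sketch below is hypothetical: `candidate_spans`, the exclusive-end convention, and the `sent_bounds` input are assumptions for the example.

```python
from typing import List, Tuple


def candidate_spans(
    sent_bounds: List[Tuple[int, int]], max_width: int
) -> List[Tuple[int, int]]:
    """Enumerate (start, end) spans, end exclusive, inside each sentence."""
    spans = []
    for sent_start, sent_end in sent_bounds:  # sent_end is exclusive
        for start in range(sent_start, sent_end):
            last_end = min(start + max_width, sent_end)
            # The `+ 1` makes `end == sent_end` reachable. Stopping at
            # `last_end` itself would drop every span that ends on the
            # last token of the sentence.
            for end in range(start + 1, last_end + 1):
                spans.append((start, end))
    return spans
```

For a single three-token sentence and `max_width=2`, the output includes `(2, 3)` and `(1, 3)`, the spans that cover the final token.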
Commit 3684f7f
Use relative indices for mentions
Was using batch absolute indices to manage mentions, but extract_spans expects doc-relative ones.
Commit 4a9dc00
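Converting batch-absolute token indices to doc-relative ones, as that commit requires, can be sketched as follows. `to_doc_relative` is a hypothetical helper, not the function used in the PR, and it assumes every span starts inside one of the docs.

```python
from bisect import bisect_right
from typing import List, Tuple


def to_doc_relative(
    spans: List[Tuple[int, int]], doc_lengths: List[int]
) -> List[Tuple[int, int, int]]:
    """Turn batch-absolute (start, end) spans into (doc_index, start, end)."""
    # offsets[i] is the batch-absolute index of the first token of doc i.
    offsets = []
    total = 0
    for length in doc_lengths:
        offsets.append(total)
        total += length

    relative = []
    for start, end in spans:
        # Last doc whose offset is <= start.
        doc_i = bisect_right(offsets, start) - 1
        relative.append((doc_i, start - offsets[doc_i], end - offsets[doc_i]))
    return relative
```

With docs of length 3 and 4, the batch-absolute span `(3, 5)` becomes `(1, 0, 2)`: doc 1, tokens 0 to 2.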
This test was failing not because the thing it was testing wasn't working, but because of the way span equality works. Span equality relies on doc equality, and doc equality is object identity, so spans from different docs will never be equal.
Commit e9626e3
Commits on Jul 15, 2021
Commit 9b63cbb
Commits on Jul 18, 2021
Commit a4531be
This calculates scores as an average of three metrics. As noted in the code, these metrics all have issues, but we want to use them to match up with prior work. This should be replaced with some simpler default scoring and the scorer here should be moved to an external project to be passed in just for generating the traditional scores.
Commit bc081c2
Commit 8bd0474
Commits on Jul 19, 2021
Commit 3ed0fae
Commit a151c62
Commits on Jul 21, 2021
This continue should be a break. The current form doesn't cause errors but using a break will be a bit faster.
Commit 1d1679d
Commits on Aug 8, 2021
Change mention limit to match reference implementations
This generally means fewer spans are considered, which makes individual steps in training faster but can make training take longer to find the good spans.
Commit 56803d3
Commits on Aug 9, 2021
In the reference implementations, there's usually a function to build a ffnn of arbitrary depth, consisting of a stack of Linear >> Relu >> Dropout. In practice the depth is always 1 in coref-hoi, but in earlier iterations of the model, which are more similar to our model here (since we aren't using attention or even necessarily BERT), using a small depth like 2 was common. This hard-codes a stack of 2. In brief tests this allows similar performance to the unstacked version with much smaller embedding sizes. The depth of the stack could be made into a hyperparameter.
Commit 00d481d
Commits on Aug 12, 2021
Scoring code was just using one metric, not all three of interest.
Commit 230698d
Commits on Feb 3, 2022
Merge branch 'master' into feature/coref
This brings coref up to date, in particular giving access to 3.2 features.
Commit c7f586c
Commits on Feb 7, 2022
Commit 0c15ab7
Commits on Mar 6, 2022
This absolutely does not work. First step here is getting over most of the code in roughly the files we want it in. After the code has been pulled over it can be restructured to match spaCy and cleaned up.
Commit c0cd502
Commits on Mar 8, 2022
Commit 1c697b4
Commit 35cc2b1
Commits on Mar 9, 2022
The coref model is able to be loaded
The span predictor component is initialized but not used at all now. Plan is to work on it after the word level clustering part is trainable end-to-end.
Commit c4f9c24
Commits on Mar 14, 2022
Evaluate does not work - predict hasn't been updated
Commit d22a002
Evaluation needs fixing, and code still needs cleanup.
Commit 8eadf37
Commit dfec699
Commit e6917d8
Commits on Mar 15, 2022
Commit 0522a43
This doesn't work as a component because it needs to modify gold data, so instead it's a conversion script (in another repo).
Commit 17d017a
Commit 55039a6
Moved everything into coref_util.py, deleted wl-specific file.
Commit abdc7d8
Commit d0ae259
Commits on Mar 16, 2022
Commit 5650853
Commit 7811a11
Commit 6974f55
Commit 0275ae2
Skeleton for span predictor component
This should be moved into its own file, but for now just stubbing out the methods.
Commit 6855df0
Commit 1a79d18
Commits on Mar 18, 2022
The way fake batching works is that the pipeline component calls the model repeatedly in a loop internally. It feels like this should break something, but it worked in testing. Another issue is that this changes the signature of some of the pipeline functions, though I don't think that's an issue. Tested with batch size of 2, so more testing is needed, but this is a start.
Commit a098849
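The "fake batching" idea from that commit, where the component loops over a model that handles one doc at a time, can be sketched generically. The function names and the callable signatures below are hypothetical and do not reflect the pipeline methods in the PR.

```python
from typing import Callable, List, Sequence, TypeVar

DocT = TypeVar("DocT")
PredT = TypeVar("PredT")


def predict_batch(model: Callable[[DocT], PredT], docs: Sequence[DocT]) -> List[PredT]:
    """Present a batch interface over a model that accepts a single doc."""
    # One model call per doc; results keep the input order.
    return [model(doc) for doc in docs]


def update_batch(step: Callable[[DocT], float], examples: Sequence[DocT]) -> float:
    """Run one training step per example and return the summed loss."""
    total_loss = 0.0
    for example in examples:
        total_loss += step(example)
    return total_loss
```

Callers see a normal batch API, while memory use and model shapes stay those of the single-doc case.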
Kádár Ákos committed Mar 18, 2022
Commit db422ab
Commits on Mar 19, 2022
Add progress on SpanPredictor component
This isn't working. There is a CUDA error in the torch code during initialization and it's not clear why.
Commit 2190cbc
Commits on Mar 23, 2022
Commit eec00ce
Kádár Ákos committed Mar 23, 2022
Commit 1eaf8fb
Kádár Ákos committed Mar 23, 2022
Commit 150e7c4
Commits on Mar 24, 2022
gearing up SpanPredictor for gold-heads
Kádár Ákos committed Mar 24, 2022
Commit 706b2e6
Kádár Ákos committed Mar 24, 2022
Commit a872c69
merge SpanPredictor attributes
Kádár Ákos committed Mar 24, 2022
Commit 1c5dabc
remove useless extra prefix and device from spanpredictor
Kádár Ákos committed Mar 24, 2022
Commit 83ac047
Commits on Mar 25, 2022
make sure predicted and reference keeps aligned
Kádár Ákos committed Mar 25, 2022
Commit 7304604
Commits on Mar 28, 2022
Kádár Ákos committed Mar 28, 2022
Commit 4fc4034
Kádár Ákos committed Mar 28, 2022
Commit e4b4b67
addressing suggestions by @polm
Kádár Ákos committed Mar 28, 2022
Commit 06d680b
Kádár Ákos committed Mar 28, 2022
Commit 7ff99a3
Commits on Mar 30, 2022
Kádár Ákos committed Mar 30, 2022
Commit 63a41ba
Commits on Apr 4, 2022
prepare for aligned heads-spans training
Kádár Ákos committed Apr 4, 2022
Commit a1d0219
Kádár Ákos committed Apr 4, 2022
Commit ef141ad
Commits on Apr 7, 2022
update with eg.predicted as other components
Kádár Ákos committed Apr 7, 2022
Commit 3ba9131
Commits on Apr 8, 2022
add backprop callback to spanpredictor
Kádár Ákos committed Apr 8, 2022
Commit 2a1ad4c
report start- and end-accuracies separately
Kádár Ákos committed Apr 8, 2022
Commit 7a239f2
Commits on Apr 11, 2022
Kádár Ákos committed Apr 11, 2022
Commit 6aedd98
Commits on Apr 13, 2022
Preparing span predictor for predicting from gold (#10547)
Note this is squashed because rebasing had conflicts.
* remove unnecessary .device
* span predictor debug start
* gearing up SpanPredictor for gold-heads
* merge SpanPredictor attributes
* remove useless extra prefix and device from spanpredictor
* make sure predicted and reference keeps aligned
* handle empty head_ids
* handle empty clusters
* addressing suggestions by @polm
* nicer restore
* fix score overwriting bug
* prepare for aligned heads-spans training
* span accuracy score
* update with eg.predicted as other components
* add backprop callback to spanpredictor
* report start- and end-accuracies separately
* fixing scorer
Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
Commit b53113e
It's not clear if this is technically correct or not but it won't run without it for me.
Commit d470fa0
Commit 2300f4d
Remove all coref scoring except LEA
This is necessary because one of the three old methods relied on scipy for some complex problem solving. LEA is generally better for evaluations. The downside is that this means evaluations aren't comparable with many papers, but canonical scoring can be supported using external eval scripts or other methods.
Commit e8af027
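For reference, the core of the LEA metric kept by that commit can be sketched in a few lines. This is a simplified illustration, not the scorer in the PR: it assumes every cluster has at least two mentions (singleton handling is omitted), and the function names are made up.

```python
from typing import Hashable, List, Set, Tuple

Cluster = Set[Hashable]


def _links(size: int) -> int:
    """Number of unordered mention pairs in a cluster of the given size."""
    return size * (size - 1) // 2


def _lea_side(key: List[Cluster], response: List[Cluster]) -> float:
    """Size-weighted fraction of each key cluster's links found in the response."""
    numerator = 0.0
    denominator = 0
    for k in key:
        denominator += len(k)
        resolved = sum(_links(len(k & r)) for r in response)
        numerator += len(k) * resolved / _links(len(k))
    return numerator / denominator if denominator else 0.0


def lea(gold: List[Cluster], pred: List[Cluster]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1); assumes no singleton clusters."""
    recall = _lea_side(gold, pred)
    precision = _lea_side(pred, gold)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Only set intersections and pair counts are needed, so unlike alignment-based metrics this requires no optimization routine.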
Commits on Apr 14, 2022
This seems to match with the scorer expectations better
Commit 8181d45
The difference in environments was due to a change in Thinc, the code here is fine.
Commit 08729e0
This was mistaken, not sure why my score seemed to be off before.
Commit afd255c
Commits on Apr 18, 2022
Commit 683f470
Commits on May 9, 2022
Commit 6b51258
Commits on May 10, 2022
A few unresolved points:
- SpanPredictor should probably get its own file
- What's the right way to document MentionClusters?
Commit 117a9ef
Split span predictor component into its own file
This runs. The imports in both of the split files could probably use a close check to remove extras.
Commit f852c5c
Commit 41fc092
Commit 33f4f90
Commit e512874
Commit 7cf6bcc
Commits on May 11, 2022
Commit 57165f9
Commit b7ac4b3
Commits on May 12, 2022
Commit 14eb20f
Commits on May 13, 2022
First draft for architecture docs
These parameters are probably going to be renamed or have their defaults adjusted. Also, the Model types are off.
Commit 6a8625e
Commit 13481fb
Commits on May 16, 2022
Commit 2e8f0e9
Commits on May 17, 2022
Commit 403fb95
Commit 1dc3894
Commits on May 19, 2022
Commit e38e84a
Commits on May 24, 2022
Add guards around torch import
Torch is required for the coref/spanpred models but shouldn't be required for spaCy in general. The one tricky part of this is that one function in coref_util relied on torch, but that file was imported in several places. Since the function was only used in one place I moved it there.
Commit 9da16df
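The guard described above can be sketched roughly as follows. This is an illustration of the general optional-import pattern, not the code from the commit: `has_torch` and `to_tensor` are hypothetical names chosen for this example.

```python
# Import torch if it is installed, but do not fail at import time if it is not.
try:
    import torch
except ImportError:
    torch = None


def has_torch() -> bool:
    """Report whether the optional PyTorch dependency is available."""
    return torch is not None


def to_tensor(values):
    """Hypothetical helper that only works when PyTorch is installed."""
    if torch is None:
        # Fail late, with a clear message, and only for code paths that need torch.
        raise ImportError("This function requires PyTorch, which is not installed.")
    return torch.tensor(values, dtype=torch.float32)
```

Keeping torch-dependent functions next to their single caller, as the commit message describes, means modules that are imported widely never have to touch the guarded import at all.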
Commit b1118ce
Commit 5cbc9f4
Commit c9233a5
Commits on May 25, 2022
Merge pull request #10844 from polm/feature/coref-torch-guard
Add guards around torch import for coref
Commit 3807a1b
Commit 303269c
Commit 6999436
Commit 6087da9
Commit e721c7b
Commit 2a8efda
Commit 838f501
Commit 015050f
Commit f75a528
Commit b8bdf99
Merge branch 'feature/coref' of https://github.com/explosion/spacy into feature/coref
Commit 3fee693
Commit cea40c9
Commit aa2eb27
Commits on Jun 8, 2022
Fix coref size inference (#10916)
* Add explicit tok2vec_size parameter in clusterer
* Add tok2vec size to span predictor config
* Minor fixes
Commit 196886b
Commits on Jun 22, 2022
Refactor Coval Scoring code (#10875)
* Move coref scoring code to scorer.py
  Includes some renames to make names less generic.
* Refactor coval code to remove ternary expressions
* Black formatting
* Add header
* Make scorers into registered scorers
* Small test fixes
* Skip coref tests when torch not present
  Coref can't be loaded without Torch, so nothing works.
* Fix remaining type issues
  Some of this just involves ignoring types in thorny areas. Two main issues:
  1. Some things have weird types due to indirection/argskwargs
  2. xp2torch return type seems to have changed at some point
* Update spacy/scorer.py
  Co-authored-by: kadarakos <kadar.akos@gmail.com>
* Small changes from review
* Be specific about the ValueError
* Type fix
Co-authored-by: kadarakos <kadar.akos@gmail.com>
Commit 16894e6
Commits on Jun 28, 2022
Initial test of mismatched tokenization
This runs, but the results are nonsense because the indices are off.
Commit af6d5ae
This changes the tok2vec size in coref to a hardcoded 64 to get tests to run. This should be reverted and hopefully replaced with proper shape inference.
Commit ef5762d
This may not be done yet, as the test only checks consistency and is not overfitting correctly yet.
Commit d1ff933
Commit 9f94538
Commit 1a78259
Commits on Jun 29, 2022
Commit 0076f0f
Handle case with nothing to score in span predictor
This case was not handled correctly. It may be desirable to make changes in the coref component to make sure this doesn't happen, but the span predictor should also handle this kind of data intelligently internally. Note that something is still weird because the span predictor seems to not be learning.
Commit dd812ca
Commits on Jul 1, 2022
Merge pull request #11043 from kadarakos/feature/coref
Merging master into Feature/coref
Commit c59aeeb
Merge branch 'feature/coref' into fix/coref-alignment
Had to renumber error message.
Commit 7972088
Commits on Jul 3, 2022
Commit 5192ac1
Commit 1dacecb
Commit 201731d
Commit 1a4dbb7
Commit 619b110
Add failing test with tokenization mismatch
This test only fails due to the explicit assert False at the moment, but the debug output shows that the learned spans are all off by one due to misalignment. So the code still needs fixing.
Commit a46bc03
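To illustrate the kind of off-by-one problem a tokenization mismatch causes, here is a small sketch that maps a token span from one tokenization to another via character offsets. It is one common way to approach the problem and is not claimed to be the fix used in this PR. The helper names are hypothetical, and spans are assumed to be (start, end) token indices with an exclusive end.

```python
from typing import List, Optional, Tuple


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Character (start, end) of each token, located left to right in `text`."""
    offsets = []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    return offsets


def realign_span(
    span: Tuple[int, int],
    src_offsets: List[Tuple[int, int]],
    dst_offsets: List[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Map a token span from the source to the destination tokenization.

    Returns None when the span's character boundaries do not coincide
    with token boundaries in the destination tokenization.
    """
    start_char = src_offsets[span[0]][0]
    end_char = src_offsets[span[1] - 1][1]
    starts = {s: i for i, (s, _) in enumerate(dst_offsets)}
    ends = {e: i + 1 for i, (_, e) in enumerate(dst_offsets)}
    if start_char in starts and end_char in ends:
        return (starts[start_char], ends[end_char])
    return None
```

Reusing token indices directly across two tokenizations of the same text shifts every span that follows the first differing token, which matches the off-by-one spans described in the commit message.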
Commit fd574a8
Commit cf33b48
I believe this resolves issues with tokenization mismatches.
Commit b09bbc7
Commits on Jul 4, 2022
Commit c7f333d
Add tests to give up with whitespace differences
Docs in Examples are allowed to have arbitrarily different whitespace. Handling that properly would be nice but isn't required, so for now check for it and blow up.
Commit 178feae
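A check of this kind might look like the sketch below. It is an assumption about the general shape of such a guard, not the code added in this commit: the function name and error messages are invented for illustration, and the inputs are taken to be the raw texts of the predicted and reference docs.

```python
def ensure_same_text(predicted_text: str, reference_text: str) -> None:
    """Raise if the two texts differ, singling out whitespace-only differences."""
    if predicted_text == reference_text:
        return
    # Compare the texts with all whitespace removed to classify the mismatch.
    if "".join(predicted_text.split()) == "".join(reference_text.split()):
        raise ValueError(
            "Predicted and reference texts differ only in whitespace, "
            "which is not supported."
        )
    raise ValueError("Predicted and reference texts differ in content.")
```

Failing loudly here avoids silently training on spans whose offsets no longer point at the intended tokens.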
Commits on Jul 6, 2022
Update spacy/ml/models/coref_util.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
Commit 63e27b5
Commit 8f598d7
Commit 6f5cf83
Parameter names in architecture docs were not updated after parameters were renamed.
Commit da9c379
Commit c4de3e5
Commit 5e40573
Commit ce49136
First take at dimension inference
This follows the pattern used in the Biaffine Parser, which uses an init function to get the size only after the tok2vec is available. This works at first, but serialization fails with an error.
Commit ba1bf8a
Commit bd17c38
Commit f67c173
Commit b59b924
Commit b0800ea
Commit da81a90
Commits on Jul 8, 2022
mypy now exits without an error, except for two apparently unrelated ones about setup.py.
Commit 2eee0d2
Commits on Jul 11, 2022
Commit 2c2791d
Commit 1b3db14
Commit 6d9eafe
Merge pull request #11042 from polm/fix/coref-alignment
Fix tokenization mismatch handling in coref
Commit 9cbb970
Commit 4d03239
Commit baeb35f
Commit 5969634
Commit f9c82e2
Commit 7792229
Commit 0f3c456
Commits on Jul 12, 2022
Commit 64a0bf4
Make get_clusters_from_doc return spans in order
There's no guarantee about the order in which SpanGroup keys will come out, so access them in sorted order when doing comparisons.
Commit 1baa334
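The idea of reading span groups in sorted key order can be sketched with a plain dictionary standing in for the doc's span groups. This is only an illustration: the function name, the `coref_clusters` prefix, and the use of (start, end) tuples in place of real spans are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # stand-in for a real span: (start, end) token indices


def clusters_in_key_order(
    span_groups: Dict[str, List[Span]], prefix: str
) -> List[List[Span]]:
    """Collect the groups whose key starts with `prefix`, ordered by key."""
    keys = sorted(key for key in span_groups if key.startswith(prefix))
    return [span_groups[key] for key in keys]
```

Note that `sorted` on strings is lexicographic, so a key ending in `_10` comes before one ending in `_2`. That is still deterministic, which is what a comparison between two docs needs.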
Remove config from coref tests
This was needed while the tok2vec_size option was required.
Commit 07e8556
Merge pull request #11089 from polm/coref/dimension-inference
Dimension inference in Coref
Commit 90973fa
This was probably used in the prototyping stage, left as a reference, and then forgotten. Nothing uses it any more.
Commit 2e9dadf
Commits on Aug 4, 2022
Commit 3a7658e
Commit 62ffddd