Support lazy, recursive sentence splitting #7
Commits on Feb 24, 2023
Commit 563edcf
Support lazy, recursive sentence splitting
We use sentence splitting in the biaffine parser to keep the O(n^2) biaffine attention model tractable. However, since the sentence splitter makes errors, the parser may not have the correct head available.

This change adds another splitting strategy. The goal of this strategy is to use the highest-probability splits to partition a doc until each partition is smaller than or equal to a maximum length. This reduces the number of attachment errors as a result of incorrect sentence splits, while providing an upper bound on complexity.

The algorithm works as follows:

* If the length |d| > max_length:
  - Find the highest-probability split in d according to senter.
  - Split d into d_1 and d_2 using the highest-probability split.
  - Recursively apply this algorithm to d_1 and d_2.
* Otherwise: do nothing.
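
The recursive strategy above can be sketched in plain Python. This is an illustration only, not the code from this PR: the function name `split_recursive`, the `start_probs` list (assumed to hold one sentence-start probability per token, as a senter-like model might produce), and the choice to never split at the first token of a partition are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def split_recursive(
    start_probs: Sequence[float],
    max_length: int,
    start: int = 0,
    end: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Partition tokens [start, end) into half-open spans of at most max_length.

    start_probs[i] is assumed to be the probability that token i starts a
    sentence (hypothetical input; the PR takes these scores from senter).
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    if end is None:
        end = len(start_probs)
    if end <= start:
        return []
    # Otherwise-branch of the algorithm: the partition is short enough.
    if end - start <= max_length:
        return [(start, end)]
    # Highest-probability split point. The first token of the partition is
    # excluded so that both halves are non-empty and the recursion terminates.
    best = max(range(start + 1, end), key=lambda i: start_probs[i])
    # Recurse on d_1 = [start, best) and d_2 = [best, end).
    left = split_recursive(start_probs, max_length, start, best)
    right = split_recursive(start_probs, max_length, best, end)
    return left + right


probs = [0.9, 0.1, 0.2, 0.8, 0.1, 0.3, 0.6, 0.1]
spans = split_recursive(probs, max_length=3)
# spans == [(0, 3), (3, 6), (6, 8)]
```

Each returned span is at most `max_length` tokens long, so the quadratic attention cost per span is bounded by `max_length` squared, while spans are only cut where the splitter is most confident.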
Commit e152e86
We use a back-off when the first token is the best splitting point, to avoid infinite recursion. Previously, the back-off was simply to use the second token; this change refines it to choose the second-most probable splitting point instead.
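
A minimal sketch of this back-off, assuming per-token sentence-start probabilities in a hypothetical `start_probs` list. The helper name `pick_split` is invented for illustration and is not taken from the PR.

```python
from typing import Sequence


def pick_split(start_probs: Sequence[float], start: int, end: int) -> int:
    """Choose a split point inside the partition [start, end).

    If the most probable split point is the first token of the partition,
    splitting there would leave one half empty and the recursion would never
    make progress, so we back off to the second-most probable point.
    """
    if end - start < 2:
        raise ValueError("need at least two tokens to split")
    # Token indices ranked by descending probability (ties keep token order).
    ranked = sorted(range(start, end), key=lambda i: start_probs[i], reverse=True)
    best = ranked[0]
    if best == start:
        # Back-off: take the second-most probable splitting point.
        best = ranked[1]
    return best


probs = [0.9, 0.1, 0.7, 0.2]
split_at = pick_split(probs, 0, len(probs))
# Token 0 scores highest, so the back-off selects token 2 instead.
```

Compared with always falling back to the second token, this keeps the split at a position the model considers likely, which matches the motivation stated in the commit message.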
Commit d1ac35b
Commit 65eb57c
Commit d8ade4c
Commit 54710b2
Remove biaffine parser scorer from setup.cfg
We now use the spaCy parser scorer.
Commit de6b600
Commit d61951f
PairwiseBilinearModel: correctly set device for auxiliary arrays
Now that Thinc doesn't set the Tensor type globally anymore, we have to make sure that Tensors are placed on the correct device.
Commit bb64977
Use activations stored by the senter pipe
Before this change, we'd use the senter pipe directly. However, this did not work with the transformer model without modifications (because it clears tensors after backprop). By using the functionality proposed in explosion/spaCy#11002 we can use the activations that are stored by the senter pipe in `Doc`.
Commit 56e63b1
Commit 85fae1f
Commit 93ca4c7
Commit 9a23bb4
Commit 46be511
Commit c43f204
Example: add evaluate-dev target
Also make evaluation targets depend on the corpus they use.
Commit d2a0c4c
Commit 996df6c
Commit 7e8b69c
Commits on Feb 27, 2023
Commit 47e1b21