Support lazy, recursive sentence splitting #7

Closed
wants to merge 19 commits into from

Commits on Feb 24, 2023

  1. 563edcf
  2. Support lazy, recursive sentence splitting

    We use sentence splitting in the biaffine parser to keep the O(n^2)
    biaffine attention model tractable. However, since the sentence splitter
    makes errors, a token's correct head may fall outside its split and be
    unavailable to the parser.
    
    This change adds another splitting strategy. The goal of this strategy
    is to use the highest-probability splits to partition a doc until each
    partition is smaller than or equal to a maximum length. This reduces
    the number of attachment errors as a result of incorrect sentence
    splits, while providing an upper bound on complexity.
    
    The algorithm works as follows:
    
    * If the length |d| > max_length:
      - Find the highest-probability split in d according to senter.
      - Split d into d_1 and d_2 using the highest probability split.
      - Recursively apply this algorithm to d_1 and d_2.
    * Otherwise: do nothing
    danieldk committed Feb 24, 2023
    e152e86
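
The splitting strategy described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `split_lazily` and the `start_probs` argument (one sentence-start probability per token, as a senter-like model might produce) are assumptions made for the example. Candidate split points exclude the first token of a partition so that both halves are non-empty and the recursion terminates.

```python
from typing import List, Sequence, Tuple


def split_lazily(
    start_probs: Sequence[float], max_length: int
) -> List[Tuple[int, int]]:
    """Partition token indices into (start, end) spans of at most max_length.

    start_probs[i] is a hypothetical probability that token i starts a
    sentence. Spans are half-open: tokens start..end-1.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")

    spans: List[Tuple[int, int]] = []

    def recurse(start: int, end: int) -> None:
        # Short enough: keep this partition as-is.
        if end - start <= max_length:
            spans.append((start, end))
            return
        # Pick the highest-probability split, skipping the first token so
        # that neither half is empty.
        split = max(range(start + 1, end), key=lambda i: start_probs[i])
        recurse(start, split)
        recurse(split, end)

    if len(start_probs) > 0:
        recurse(0, len(start_probs))
    return spans


probs = [1.0, 0.1, 0.2, 0.9, 0.1, 0.3, 0.8, 0.1]
print(split_lazily(probs, max_length=3))
```

A doc that already fits within `max_length` is returned as a single span, so sentence boundaries are only used where they are needed to bound the partition size. For very long docs with unfavourable split points the recursion can get deep; an explicit stack would avoid that.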
  3. ArcPredicter: better back-off

    We use a back-off when the first token is the best splitting point, to
    avoid infinite recursion. The back-off was simply to use the second
    token; refine this to choose the second-most probable splitting point.
    danieldk committed Feb 24, 2023
    d1ac35b
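
One way to picture the refined back-off is the sketch below. It is an assumption-laden illustration rather than the `ArcPredicter` code itself: `best_split` and `start_probs` are hypothetical names, and the probabilities stand in for whatever scores the sentence splitter provides. Splitting at the first token would leave an empty left half, so when the first token scores highest the function falls back to the most probable of the remaining positions, which is the second-most probable split overall.

```python
from typing import Sequence


def best_split(start_probs: Sequence[float]) -> int:
    """Return the index at which to split a partition of two or more tokens."""
    if len(start_probs) < 2:
        raise ValueError("need at least two tokens to split")

    best = max(range(len(start_probs)), key=lambda i: start_probs[i])
    if best != 0:
        return best

    # Back-off: the first token is the best split, which would not shrink
    # the partition. Use the best split among the remaining tokens instead.
    return max(range(1, len(start_probs)), key=lambda i: start_probs[i])


print(best_split([0.9, 0.1, 0.6, 0.2]))
```

Compared with always falling back to the second token, this keeps the split at a position the splitter actually considers likely.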
  4. 65eb57c
  5. d8ade4c
  6. ArcLabeler: simplify loop

    danieldk committed Feb 24, 2023
    54710b2
  7. Remove biaffine parser scorer from setup.cfg

    We now use the spaCy parser scorer.
    danieldk committed Feb 24, 2023
    de6b600
  8. d61951f
  9. PairwiseBilinearModel: correctly set device for auxiliary arrays

    Now that Thinc doesn't set the Tensor type globally anymore, we have to
    make sure that Tensors are placed on the correct device.
    danieldk committed Feb 24, 2023
    bb64977
  10. Use activations stored by the senter pipe

    Before this change, we'd use the senter pipe directly. However, this did
    not work with the transformer model without modifications (because it
    clears tensors after backprop). By using the functionality proposed in
    
    explosion/spaCy#11002
    
    we can use the activations that are stored by the senter pipe in `Doc`.
    danieldk committed Feb 24, 2023
    56e63b1
  11. 85fae1f
  12. Typing fixes

    danieldk committed Feb 24, 2023
    93ca4c7
  13. Remove bound_las/uas from the base config

    This measure was removed.
    danieldk committed Feb 24, 2023
    9a23bb4
  14. 46be511
  15. c43f204
  16. Example: add evaluate-dev target

    Also make evaluation targets depend on the corpus they use.
    danieldk committed Feb 24, 2023
    d2a0c4c
  17. Simplify split search

    Suggested by @kadarakos
    danieldk committed Feb 24, 2023
    996df6c
  18. 7e8b69c

Commits on Feb 27, 2023

  1. 47e1b21