Native coref component #7264

Closed
wants to merge 215 commits into from

Commits on Mar 3, 2021

  1. Native coref component (#7243)

    * initial coref_er pipe
    
    * matcher more flexible
    
    * base coref component without actual model
    
    * initial setup of coref_er.score
    
    * rename to include_label
    
    * preliminary score_clusters method
    
    * apply scoring in coref component
    
    * IO fix
    
    * return None loss for now
    
    * rename to CoreferenceResolver
    
    * some preliminary unit tests
    
    * use registry as callable
    svlandeg committed Mar 3, 2021
    e0c45c6

Commits on May 15, 2021

  1. 3608b7b
  2. Migrate coref code

    This includes the coref code that was being tested separately, modified
    to work in spaCy. It hasn't been tested yet and presumably still needs
    fixes.
    
    In particular, the evaluation code is currently omitted. It's unclear at
    the moment whether we want to use a complex scorer similar to the
    official one, or a simpler scorer using more modern evaluation methods.
    polm committed May 15, 2021
    7c42a8c

Commits on May 17, 2021

  1. Minor fixes

    polm committed May 17, 2021
    91b1114
  2. e303628

Commits on May 18, 2021

  1. a33d294
  2. Fiddle with get_mentions definition

    Ended up not making a difference, but oh well.
    polm committed May 18, 2021
    0517155
  3. Add basic tuplify init

    polm committed May 18, 2021
    883c137
  4. Make get_sentence_map work with init

    When sentences are not available, just treat the whole doc as one
    sentence. A reasonable general fallback, but important due to the init
    call, where upstream components aren't run.
    polm committed May 18, 2021
    a7d9c81
  5. Deal with generators in tuplify

    polm committed May 18, 2021
    0620820
  6. Fix pipeline initialize

    polm committed May 18, 2021
    2486b8a
  7. Fix backprop

    Training seems to actually run now!
    polm committed May 18, 2021
    d22acee
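
The fallback described in "Make get_sentence_map work with init" can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function name `sentence_map` and the list-of-flags input are hypothetical stand-ins for a spaCy `Doc` and its sentence boundaries.

```python
from typing import List, Optional


def sentence_map(n_tokens: int, sent_starts: Optional[List[bool]] = None) -> List[int]:
    """Map each token index to the index of the sentence that contains it.

    `sent_starts` is a hypothetical stand-in for sentence annotation: one
    flag per token, True where a sentence begins. If it is None (no upstream
    component has set boundaries, as during initialization), the whole
    document is treated as a single sentence.
    """
    if sent_starts is None:
        return [0] * n_tokens
    out = []
    sent_id = -1
    for i in range(n_tokens):
        # The first token always opens a sentence, even if it is not flagged.
        if sent_starts[i] or i == 0:
            sent_id += 1
        out.append(sent_id)
    return out


print(sentence_map(5))                                     # [0, 0, 0, 0, 0]
print(sentence_map(5, [True, False, False, True, False]))  # [0, 0, 0, 1, 1]
```

With the fallback in place, code that relies on the map can run on unannotated docs without a special case.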

Commits on May 20, 2021

  1. Break pairwise operations into pseudolayers

    This makes their scope tighter and more contained, and has the nice side
    effect that fewer things need to be passed around for backprop.
    polm committed May 20, 2021
    fa92daf
  2. 8c5df62
  3. Catch a stray reference

    polm committed May 20, 2021
    ff3fed0

Commits on May 21, 2021

  1. Fix loss

    The loss was being returned as a single element array, which caused
    training to die when it attempted to turn it into JSON.
    polm committed May 21, 2021
    e1b4a85
  2. Add new coref scoring

    This is closer to the traditional evaluation method. That method uses an
    average of three scores; this uses only the bcubed metric for now
    (nothing special about bcubed, it was just the one picked).
    
    The scoring implementation comes from the coval project. It relies on
    scipy, which is one issue, and is rather involved, which is another.
    
    Besides being comparable with traditional evaluations, this scoring is
    relatively fast.
    polm committed May 21, 2021
    f6652c9
  3. Remove coref_er.py

    The intent of this was that it would be a component pipeline that used
    entities as input, but that's now covered by the get_mentions function
    as a pipeline arg.
    polm committed May 21, 2021
    0942a0b
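
For readers unfamiliar with the bcubed metric mentioned in "Add new coref scoring", here is a minimal sketch of its textbook definition. It is not the coval implementation the commit refers to; the function names and the use of strings as mention identifiers are assumptions made for illustration, and it needs no scipy.

```python
from collections import Counter


def b_cubed_side(clusters, other):
    """Numerator and denominator of B-cubed in one direction.

    Call with (predicted, gold) for precision and (gold, predicted) for recall.
    """
    mention_to_other = {m: i for i, c in enumerate(other) for m in c}
    num, den = 0.0, 0
    for cluster in clusters:
        if not cluster:
            continue
        # Count how many mentions of this cluster land in each cluster on
        # the other side; mentions missing on the other side count as zero.
        counts = Counter(mention_to_other[m] for m in cluster if m in mention_to_other)
        num += sum(n * n for n in counts.values()) / len(cluster)
        den += len(cluster)
    return num, den


def b_cubed(pred, gold):
    p_num, p_den = b_cubed_side(pred, gold)
    r_num, r_den = b_cubed_side(gold, pred)
    p = p_num / p_den if p_den else 0.0
    r = r_num / r_den if r_den else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = [["a", "b", "c"], ["d", "e"]]
pred = [["a", "b"], ["c", "d", "e"]]
p, r, f = b_cubed(pred, gold)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.7333 0.7333 0.7333
```

In a real pipeline the mention identifiers would more likely be (start, end) token offsets than strings.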

Commits on May 24, 2021

  1. Minor cleanup

    polm committed May 24, 2021
    d6fd5fe
  2. d6389b1
  3. Remove references to coref_er

    polm committed May 24, 2021
    a484245

Commits on May 27, 2021

  1. ba2e491
  2. delete outdated tests

    svlandeg committed May 27, 2021
    2e3c0e2
  3. 9100265
  4. removing unused imports

    svlandeg committed May 27, 2021
    04b55bf
  5. fix types of fwd functions

    svlandeg committed May 27, 2021
    391b512

Commits on May 28, 2021

  1. 0f5c586
  2. 0d81bce
  3. 0aa1083

Commits on Jun 2, 2021

  1. Clean up unused functions

    `make_clean_doc` is not needed and was removed.
    
    `logsumexp` may be needed if I misunderstood the loss calculation, so I
    left it in for now with a note.
    polm committed Jun 2, 2021
    4a4ef72

Commits on Jun 4, 2021

  1. Remove old comment

    polm committed Jun 4, 2021
    18444fc
  2. 67d9ebc

Commits on Jun 12, 2021

  1. Don't use is_sentenced

    polm committed Jun 12, 2021
    7efbc72
  2. Silence warning

    polm committed Jun 12, 2021
    e728b0e
  3. Replace squeeze with flatten

    At a few points in the code it's normal to get a "2d" array where each
    row is a single entry. Calling squeeze will make that a proper 1d
    array... unless it's just one entry, in which case it turns into a 0d
    scalar. That's not what we want; flatten() provides the desired
    behavior.
    polm committed Jun 12, 2021
    d71198e
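
The difference described in "Replace squeeze with flatten" is easy to reproduce with plain NumPy. The array names below are made up for the demonstration; the behavior itself is standard NumPy.

```python
import numpy as np

many = np.zeros((3, 1), dtype="float32")    # three rows, one entry each
single = np.zeros((1, 1), dtype="float32")  # one row, one entry

print(many.squeeze().shape)    # (3,)
print(single.squeeze().shape)  # ()    a 0d array, usually not what is wanted
print(many.flatten().shape)    # (3,)
print(single.flatten().shape)  # (1,)  still a proper 1d array
```

Note that `flatten()` always copies; `reshape(-1)` gives the same shape and avoids the copy when it can, if that ever matters.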

Commits on Jun 13, 2021

  1. Change topk to sort descending

    Shouldn't change correctness but is a little clearer
    polm committed Jun 13, 2021
    96be7e8
  2. Fix typo, remove old comment

    polm committed Jun 13, 2021
    8452d11
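
As a reference point for "Change topk to sort descending", a top-k helper that returns results in descending order might look like the sketch below. The signature (including the `axis` parameter) is an assumption, not the PR's actual function, and it assumes float scores since it sorts on the negated array.

```python
import numpy as np


def topk(scores, k, axis=-1):
    """Return the k largest values along `axis` and their indices, largest first."""
    # Argsort the negated scores so that position 0 holds the largest score.
    order = np.argsort(-scores, axis=axis)
    # Keep only the first k positions along the chosen axis.
    idx = np.take(order, np.arange(k), axis=axis)
    vals = np.take_along_axis(scores, idx, axis=axis)
    return vals, idx


scores = np.array([[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]])
vals, idx = topk(scores, 2, axis=1)
print(vals)  # [[0.9 0.5] [0.7 0.3]]
print(idx)   # [[1 2] [0 2]]
```

The order among tied scores is not guaranteed here, because the default `argsort` is not a stable sort.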

Commits on Jun 17, 2021

  1. Fix type of mask

    The call here was creating a float64 array, which was turning many
    downstream scores into float64s. Later on these values were assigned to
    a float32 array in backprop, and numerical underflow caused things to go
    to zero.
    
    That's almost certainly not the only reason things go to zero, but it is
    incorrect.
    polm committed Jun 17, 2021
    cb2364c
  2. Minor optimization

    polm committed Jun 17, 2021
    fce804a
  3. Small fix

    polm committed Jun 17, 2021
    848fd10
  4. Expose more hyperparameters

    polm committed Jun 17, 2021
    a62121e
  5. Remove old comments

    polm committed Jun 17, 2021
    ccf5611
  6. Probably fix pw prod backprop

    I think this change is correct, but intuition doesn't really help
    here...
    polm committed Jun 17, 2021
    5c98c4c
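
The dtype problem described in "Fix type of mask" can be reproduced in a few lines. The commit does not say which call created the mask, so `np.eye` below is only an assumed example of an array constructor that defaults to float64.

```python
import numpy as np

scores = np.full((2, 2), 1e-30, dtype=np.float32)

# Without an explicit dtype, NumPy constructors return float64 arrays.
mask64 = np.eye(2)
mask32 = np.eye(2, dtype=np.float32)

# Mixing a float32 array with a float64 array upcasts the result.
print((scores * mask64).dtype)  # float64
print((scores * mask32).dtype)  # float32

# A value that is fine in float64 can underflow to zero in float32.
tiny = np.array([1e-60])              # float64
grad = np.zeros(1, dtype=np.float32)
grad[:] = tiny                        # silently cast on assignment
print(grad[0])                        # 0.0
```

Passing the dtype explicitly when building masks keeps the whole computation in float32.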

Commits on Jun 28, 2021

  1. Remove unused function

    polm committed Jun 28, 2021
    2334485
  2. 4f377d8
  3. Add test for crossing spans

    This should maybe go elsewhere?
    polm committed Jun 28, 2021
    b02df61

Commits on Jul 3, 2021

  1. Clean up pw_prod loss

    This doesn't change the math but makes the transposes slightly easier to
    understand (maybe?).
    polm committed Jul 3, 2021
    3f66e18
  2. f2e0e9d
  3. Fix axis handling in topk

    In practice this is only ever used with axis=1, so it wasn't causing
    issues, even though it was wrong.
    polm committed Jul 3, 2021
    d74fa82
  4. Remove XXX comment

    Comment wondered if there should be some subtraction to avoid double
    counting, but it probably doesn't matter because the diagonal is 0.
    polm committed Jul 3, 2021
    865caed
  5. Minor fix in crossing spans code

    I think this was technically incorrect but harmless. The reason the code
    here is different than the reference in coref-hoi is that the indices
    there are such that they get +1 at the end of processing, while the code
    here handles indices directly.
    polm committed Jul 3, 2021
    251a5b4
  6. On initialize, use just two samples

    Coref docs are kind of long, and using 10 samples on a smallish GPU can
    cause OOMs.
    polm committed Jul 3, 2021
    2d3c559
  7. Tweak mention limit calculation

    The calculation of this in the coref-hoi code is hard to follow. Based
    on comments and variable names it sounds like it's using the doc length,
    but it might actually be the number of mentions? Number of mentions
    should be much larger and seems more correct, but might want to revisit
    this.
    polm committed Jul 3, 2021
    5db28ec

Commits on Jul 5, 2021

  1. Fix loss?

    This rewrites the loss to not use the Thinc crossentropy code at all.
    The main difference here is that the negative predictions are being
    masked out (= marginalized over), but negative gradient is still being
    reflected.
    
    I'm still not sure this is exactly right but models seem to train
    reliably now.
    polm committed Jul 5, 2021
    8f66176
  2. Add width prior feature

    Not necessary for convergence, but in coref-hoi this seems to add a few
    f1 points.
    
    Note that there are two width-related features in coref-hoi. This is a
    "prior" that is added to mention scores. The other width related feature
    is appended to the span embedding representation for other layers to
    reference.
    polm committed Jul 5, 2021
    13bef2d
  3. Improve take_vecs implementation

    This pulls out references to needed bits so that other parts (the larger
    embeddings) can be freed before backprop.
    polm committed Jul 5, 2021
    eb5820b
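
The "Fix loss?" commit describes masking out (marginalizing over) the non-gold predictions while still passing gradient to them. One common way to write such an objective is the marginal log-likelihood used in span-based coreference models, sketched below. Whether this matches the PR's loss exactly is not confirmed by the commit message; the function names and array layout are assumptions.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def marginal_loss(scores, gold_mask):
    """Negative marginal log-likelihood of the gold antecedents.

    scores:    (n_mentions, n_candidates) float array of antecedent scores.
    gold_mask: boolean array of the same shape, True for correct antecedents.
               Every row needs at least one True (e.g. a dummy antecedent).
    Returns the summed loss and its gradient with respect to `scores`.
    """
    log_probs = log_softmax(scores, axis=1)
    # Non-gold candidates are masked out of the marginal with -inf.
    gold_log_probs = np.where(gold_mask, log_probs, -np.inf)
    row_max = gold_log_probs.max(axis=1, keepdims=True)
    log_marginal = row_max + np.log(
        np.exp(gold_log_probs - row_max).sum(axis=1, keepdims=True)
    )
    loss = float(-log_marginal.sum())
    # Gradient: softmax over all candidates minus softmax over gold only.
    # Non-gold candidates still receive the (positive) first term.
    d_scores = np.exp(log_probs) - np.exp(gold_log_probs - log_marginal)
    return loss, d_scores


scores = np.array([[1.0, 2.0, 0.5], [0.0, 0.0, 0.0]], dtype=np.float32)
gold = np.array([[True, False, False], [False, True, True]])
loss, d_scores = marginal_loss(scores, gold)
print(loss)
print(d_scores)
```

Each row of the gradient sums to zero: probability mass is pushed away from the non-gold candidates and toward the gold ones.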

Commits on Jul 8, 2021

  1. Switch to using Thinc tuplify

    The tuplify code here was added to Thinc proper and that's been
    released, so no need to have it here any more.
    polm committed Jul 8, 2021
    d0b041a

Commits on Jul 10, 2021

  1. Use scatter_add to speed up span embed backprop

    This was the slowest part of the code, and using scatter_add here
    probably reduces the runtime by 50%.
    polm committed Jul 10, 2021
    f34915c
  2. dc1f974
  3. Clean up span embedding code

    This is now cleaner and significantly faster. There's still some messy
    parts in the code (particularly variable names), will get to that later.
    polm committed Jul 10, 2021
    d7d317a
  4. Fix span embeds

    Some of the lengths and backprop weren't right.
    
    Also various cleanup.
    polm committed Jul 10, 2021
    e00bd42
  5. Cleanup

    polm committed Jul 10, 2021
    c25ec29
  6. Fix loss

    Accidentally deleted it
    polm committed Jul 10, 2021
    447c707
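
To illustrate the idea behind "Use scatter_add to speed up span embed backprop": when the forward pass gathers rows by index, the backward pass has to add each output gradient back into the row it came from. The sketch below uses NumPy's `np.add.at` as a stand-in for a scatter-add operation; the function names and shapes are hypothetical, not taken from the PR.

```python
import numpy as np


def gather_backprop_loop(d_out, indices, n_rows):
    """Reference version: one Python-level add per gathered row."""
    d_in = np.zeros((n_rows, d_out.shape[1]), dtype=d_out.dtype)
    for row, idx in zip(d_out, indices):
        d_in[idx] += row
    return d_in


def gather_backprop_scatter(d_out, indices, n_rows):
    """Same result with a single scatter-add call."""
    d_in = np.zeros((n_rows, d_out.shape[1]), dtype=d_out.dtype)
    # Unbuffered in-place add: repeated indices accumulate rather than overwrite.
    np.add.at(d_in, indices, d_out)
    return d_in


d_out = np.arange(6, dtype=np.float32).reshape(3, 2)
indices = np.array([0, 2, 0])  # row 0 was gathered twice
print(gather_backprop_scatter(d_out, indices, n_rows=4))
```

A plain `d_in[indices] += d_out` would be wrong here, because with repeated indices only one of the contributions survives.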

Commits on Jul 11, 2021

  1. Remove unused code

    polm committed Jul 11, 2021
    80a1707

Commits on Jul 14, 2021

  1. Fix mention list bug

    There was an off-by-one error in how mentions are generated that would
    affect mentions at the end of a sentence. This was pretty nasty.
    polm committed Jul 14, 2021
    f1796e4
  2. Remove comment from fixed test

    polm committed Jul 14, 2021
    3684f7f
  3. Use relative indices for mentions

    Was using batch absolute indices to manage mentions, but extract_spans
    expects doc-relative ones.
    polm committed Jul 14, 2021
    4a9dc00
  4. Fix serialization test

    This test was failing not because the thing it was testing wasn't
    working, but because of the way span equality works. Span equality
    relies on doc equality, and doc equality is object identity, so spans
    from different docs will never be equal.
    polm committed Jul 14, 2021
    e9626e3
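
"Fix mention list bug" does not say where the off-by-one was, but the class of bug is easy to show. The sketch below enumerates candidate mentions with exclusive end offsets and never crosses a sentence boundary; the function name, the `sent_ids` input, and `max_width` are hypothetical and not the PR's `get_mentions` signature.

```python
def candidate_spans(sent_ids, max_width):
    """List (start, end) candidate spans with exclusive `end`.

    `sent_ids[i]` is the sentence index of token i. Spans are at most
    `max_width` tokens long and stay inside one sentence.
    """
    n = len(sent_ids)
    spans = []
    for start in range(n):
        # `end` is exclusive, so the upper bound needs the + 1. Without it,
        # spans that finish on the last token are silently dropped.
        for end in range(start + 1, min(start + max_width, n) + 1):
            if sent_ids[end - 1] != sent_ids[start]:
                # Longer spans from this start would also cross the boundary.
                break
            spans.append((start, end))
    return spans


print(candidate_spans([0, 0, 1], max_width=2))
# [(0, 1), (0, 2), (1, 2), (2, 3)]
```

Because the offsets are relative to a single document, they can be passed on without any batch-level adjustment.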

Commits on Jul 15, 2021

  1. Add extract spans import

    polm committed Jul 15, 2021
    9b63cbb

Commits on Jul 18, 2021

  1. Add simple mention test

    polm committed Jul 18, 2021
    a4531be
  2. Add full traditional scoring

    This calculates scores as an average of three metrics. As noted in the
    code, these metrics all have issues, but we want to use them to match up
    with prior work.
    
    This should be replaced with some simpler default scoring and the scorer
    here should be moved to an external project to be passed in just for
    generating the traditional scores.
    polm committed Jul 18, 2021
    bc081c2
  3. Run black

    polm committed Jul 18, 2021
    8bd0474

Commits on Jul 19, 2021

  1. Add multi-sentence mention test

    Also formatting.
    polm committed Jul 19, 2021
    3ed0fae
  2. Add sentence map test

    polm committed Jul 19, 2021
    a151c62

Commits on Jul 21, 2021

  1. Minor speedup

    This continue should be a break. The current form doesn't cause errors
    but using a break will be a bit faster.
    polm committed Jul 21, 2021
    1d1679d

Commits on Aug 8, 2021

  1. Change mention limit to match reference implementations

    This generally means fewer spans are considered, which makes individual
    steps in training faster but can make training take longer to find the
    good spans.
    polm committed Aug 8, 2021
    56803d3

Commits on Aug 9, 2021

  1. Stack the mention scorer

    In the reference implementations, there's usually a function to build a
    ffnn of arbitrary depth, consisting of a stack of Linear >> Relu >>
    Dropout. In practice the depth is always 1 in coref-hoi, but in earlier
    iterations of the model, which are more similar to our model here (since
    we aren't using attention or even necessarily BERT), using a small depth
    like 2 was common. This hard-codes a stack of 2.
    
    In brief tests this allows similar performance to the unstacked version
    with much smaller embedding sizes.
    
    The depth of the stack could be made into a hyperparameter.
    polm committed Aug 9, 2021
    00d481d
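
The Linear >> Relu >> Dropout stack described in "Stack the mention scorer" can be sketched in plain NumPy as below, with the depth exposed as a parameter in the way the commit suggests it could be. This is not the Thinc model from the PR; the function names, initialization, and the final single-output layer are assumptions made for illustration.

```python
import numpy as np


def init_ffnn(rng, n_in, n_hidden, depth=2):
    """Create weights for `depth` hidden layers plus one scoring layer."""
    sizes = [n_in] + [n_hidden] * depth
    hidden = [
        (rng.normal(0, 0.1, (i, o)).astype("float32"), np.zeros(o, dtype="float32"))
        for i, o in zip(sizes[:-1], sizes[1:])
    ]
    out = (rng.normal(0, 0.1, (n_hidden, 1)).astype("float32"), np.zeros(1, dtype="float32"))
    return hidden, out


def score_mentions(X, hidden, out, dropout=0.0, rng=None):
    """Forward pass: (Linear >> Relu >> Dropout) * depth, then a linear scorer."""
    for W, b in hidden:
        X = np.maximum(X @ W + b, 0.0)  # Linear >> Relu
        if dropout > 0.0 and rng is not None:
            # Inverted dropout, only applied when an rng is passed (training).
            keep = rng.random(X.shape) >= dropout
            X = X * keep / (1.0 - dropout)
    W, b = out
    return (X @ W + b)[:, 0]  # one score per candidate span


rng = np.random.default_rng(0)
hidden, out = init_ffnn(rng, n_in=8, n_hidden=4, depth=2)
span_vectors = rng.normal(size=(5, 8)).astype("float32")
print(score_mentions(span_vectors, hidden, out).shape)  # (5,)
```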

Commits on Aug 12, 2021

  1. Fix bug in scorer

    Scoring code was just using one metric, not all three of interest.
    polm committed Aug 12, 2021
    230698d

Commits on Feb 3, 2022

  1. Merge branch 'master' into feature/coref

    This brings coref up to date, in particular giving access to 3.2
    features.
    polm committed Feb 3, 2022
    c7f586c

Commits on Feb 7, 2022

  1. 0c15ab7

Commits on Mar 6, 2022

  1. Start bringing in wl-coref

    This absolutely does not work. First step here is getting over most of
    the code in roughly the files we want it in. After the code has been
    pulled over it can be restructured to match spaCy and cleaned up.
    polm committed Mar 6, 2022
    c0cd502

Commits on Mar 8, 2022

  1. Remove references to config

    Replaced with model arguments
    polm committed Mar 8, 2022
    1c697b4
  2. Add span predictor code

    Accidentally omitted before
    polm committed Mar 8, 2022
    35cc2b1

Commits on Mar 9, 2022

  1. The coref model is able to be loaded

    The span predictor component is initialized but not used at all now.
    Plan is to work on it after the word level clustering part is trainable
    end-to-end.
    polm committed Mar 9, 2022
    c4f9c24

Commits on Mar 14, 2022

  1. Forward/backward pass works

    Evaluate does not work - predict hasn't been updated
    polm committed Mar 14, 2022
    d22a002
  2. Training runs now

    Evaluation needs fixing, and code still needs cleanup.
    polm committed Mar 14, 2022
    8eadf37
  3. Training works now

    polm committed Mar 14, 2022
    dfec699
  4. Add util functions for wl-coref

    polm committed Mar 14, 2022
    e6917d8

Commits on Mar 15, 2022

  1. Make span2head component

    polm committed Mar 15, 2022
    0522a43
  2. Remove span2head

    This doesn't work as a component because it needs to modify gold data,
    so instead it's a conversion script (in another repo).
    polm committed Mar 15, 2022
    17d017a
  3. Remove old default config

    polm committed Mar 15, 2022
    55039a6
  4. Clean up util code

    Moved everything into coref_util.py, deleted wl-specific file.
    polm committed Mar 15, 2022
    abdc7d8
  5. Delete all the coref-hoi code

    polm committed Mar 15, 2022
    d0ae259

Commits on Mar 16, 2022

  1. Remove unused functions

    polm committed Mar 16, 2022
    5650853
  2. Change architecture

    polm committed Mar 16, 2022
    7811a11
  3. 6974f55
  4. Remove stale comment

    polm committed Mar 16, 2022
    0275ae2
  5. Skeleton for span predictor component

    This should be moved into its own file, but for now just stubbing out
    the methods.
    polm committed Mar 16, 2022
    6855df0
  6. Formatting

    polm committed Mar 16, 2022
    1a79d18

Commits on Mar 18, 2022

  1. Add fake batching

    The way fake batching works is that the pipeline component calls the
    model repeatedly in a loop internally. It feels like this should break
    something, but it worked in testing.
    
    Another issue is that this changes the signature of some of the pipeline
    functions, though I don't think that's an issue.
    
    Tested with batch size of 2, so more testing is needed, but this is a
    start.
    polm committed Mar 18, 2022
    a098849
  2. remove unnecessary .device

    Kádár Ákos committed Mar 18, 2022
    db422ab
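
The "fake batching" described in "Add fake batching" amounts to the component looping over the batch and calling a one-document-at-a-time model for each item. A toy version of that loop is sketched below. `ToyModel` and `update_fake_batched` are hypothetical names and the squared-error loss is arbitrary; the point is only that gradients from each call accumulate before any optimizer step.

```python
class ToyModel:
    """Hypothetical model that can only process one document per call."""

    def __init__(self, weight=0.0):
        self.weight = weight
        self.grad = 0.0

    def begin_update(self, xs):
        ys = [self.weight * x for x in xs]

        def backprop(d_ys):
            # Accumulate, so gradients from several documents add up.
            self.grad += sum(d * x for d, x in zip(d_ys, xs))

        return ys, backprop


def update_fake_batched(model, batch):
    """Run forward and backward once per document; return the summed loss."""
    total_loss = 0.0
    for xs, gold in batch:
        ys, backprop = model.begin_update(xs)
        d_ys = [y - g for y, g in zip(ys, gold)]
        total_loss += 0.5 * sum(d * d for d in d_ys)
        backprop(d_ys)
    return total_loss


model = ToyModel(weight=0.0)
batch = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])]
print(update_fake_batched(model, batch))  # 28.0
print(model.grad)                         # -28.0
```

From the outside the component still receives a batch, so callers do not need to know that the model is invoked once per document.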

Commits on Mar 19, 2022

  1. Add progress on SpanPredictor component

    This isn't working. There is a CUDA error in the torch code during
    initialization and it's not clear why.
    polm committed Mar 19, 2022
    2190cbc

Commits on Mar 23, 2022

  1. eec00ce
  2. span predictor debug start

    Kádár Ákos committed Mar 23, 2022
    1eaf8fb
  3. conflict

    Kádár Ákos committed Mar 23, 2022
    150e7c4

Commits on Mar 24, 2022

  1. gearing up SpanPredictor for gold-heads

    Kádár Ákos committed Mar 24, 2022
    706b2e6
  2. merge

    Kádár Ákos committed Mar 24, 2022
    a872c69
  3. merge SpanPredictor attributes

    Kádár Ákos committed Mar 24, 2022
    1c5dabc
  4. remove useless extra prefix and device from spanpredictor

    Kádár Ákos committed Mar 24, 2022
    83ac047

Commits on Mar 25, 2022

  1. make sure predicted and reference stay aligned

    Kádár Ákos committed Mar 25, 2022
    7304604

Commits on Mar 28, 2022

  1. handle empty head_ids

    Kádár Ákos committed Mar 28, 2022
    4fc4034
  2. handle empty clusters

    Kádár Ákos committed Mar 28, 2022
    e4b4b67
  3. addressing suggestions by @polm

    Kádár Ákos committed Mar 28, 2022
    06d680b
  4. nicer restore

    Kádár Ákos committed Mar 28, 2022
    7ff99a3

Commits on Mar 30, 2022

  1. fix score overwriting bug

    Kádár Ákos committed Mar 30, 2022
    63a41ba

Commits on Apr 4, 2022

  1. prepare for aligned heads-spans training

    Kádár Ákos committed Apr 4, 2022
    a1d0219
  2. span accuracy score

    Kádár Ákos committed Apr 4, 2022
    ef141ad

Commits on Apr 7, 2022

  1. update with eg.predicted as other components

    Kádár Ákos committed Apr 7, 2022
    3ba9131

Commits on Apr 8, 2022

  1. add backprop callback to spanpredictor

    Kádár Ákos committed Apr 8, 2022
    2a1ad4c
  2. report start- and end-accuracies separately

    Kádár Ákos committed Apr 8, 2022
    7a239f2
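The "span accuracy score" and "report start- and end-accuracies separately" commits above concern scoring predicted span boundaries. A minimal sketch of such a score, assuming spans are `(start, end)` token-offset pairs that are already aligned one-to-one between prediction and gold. The name `boundary_accuracies` and the `0.0` fallback for empty input are illustrative assumptions, not code from this PR:

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token offsets; hypothetical representation


def boundary_accuracies(
    predicted: Sequence[Span], gold: Sequence[Span]
) -> Tuple[float, float]:
    """Return (start_accuracy, end_accuracy) as fractions in [0, 1]."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of spans")
    if not gold:
        # Assumption: with nothing to score, report zero rather than dividing by zero.
        return 0.0, 0.0
    start_hits = sum(p[0] == g[0] for p, g in zip(predicted, gold))
    end_hits = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    total = len(gold)
    return start_hits / total, end_hits / total


predicted_spans: List[Span] = [(0, 2), (3, 5), (7, 9)]
gold_spans: List[Span] = [(0, 2), (3, 6), (7, 10)]
start_acc, end_acc = boundary_accuracies(predicted_spans, gold_spans)
```

Reporting the two numbers separately makes it visible when only one boundary is being learned, which a single exact-match span accuracy would hide.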

Commits on Apr 11, 2022

  1. fixing scorer

    Kádár Ákos committed Apr 11, 2022
    6aedd98

Commits on Apr 13, 2022

  1. Preparing span predictor for predicting from gold (#10547)

    Note this is squashed because rebasing had conflicts.
    
    * remove unnecessary .device
    
    * span predictor debug start
    
    * gearing up SpanPredictor for gold-heads
    
    * merge SpanPredictor attributes
    
    * remove useless extra prefix and device from spanpredictor
    
    * make sure predicted and reference stay aligned
    
    * handle empty head_ids
    
    * handle empty clusters
    
    * addressing suggestions by @polm
    
    * nicer restore
    
    * fix score overwriting bug
    
    * prepare for aligned heads-spans training
    
    * span accuracy score
    
    * update with eg.predicted as other components
    
    * add backprop callback to spanpredictor
    
    * report start- and end-accuracies separately
    
    * fixing scorer
    
    Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
    kadarakos and Kádár Ákos committed Apr 13, 2022
    b53113e
  2. Adjust end indices

    It's not clear if this is technically correct or not but it won't run
    without it for me.
    polm committed Apr 13, 2022
    d470fa0
  3. Fix span score logging

    polm committed Apr 13, 2022
    2300f4d
  4. Remove all coref scoring except LEA

    This is necessary because one of the three old methods relied on scipy
    for some complex problem solving. LEA is generally better for
    evaluations.
    
    The downside is that this means evaluations aren't comparable with many
    papers, but canonical scoring can be supported using external eval
    scripts or other methods.
    polm committed Apr 13, 2022
    e8af027
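For readers unfamiliar with LEA (the link-based entity-aware metric kept by the "Remove all coref scoring except LEA" commit), here is a rough pure-Python sketch of the idea: each entity is weighted by its size, and credited with the fraction of its coreference links that the other side also contains. This is an illustration only, not the scorer in this PR. The names are hypothetical, and singleton entities are simply skipped, which is a simplification of the published metric.

```python
from typing import FrozenSet, List, Tuple

Mention = Tuple[int, int]  # (start, end) offsets; hypothetical representation
Entity = FrozenSet[Mention]


def link_count(n: int) -> int:
    """Number of pairwise links among n mentions."""
    return n * (n - 1) // 2


def lea_one_side(entities: List[Entity], others: List[Entity]) -> float:
    """Size-weighted fraction of links in `entities` that `others` also has."""
    numerator = 0.0
    denominator = 0
    for entity in entities:
        size = len(entity)
        if size < 2:
            continue  # simplification: singletons carry no links in this sketch
        resolved = sum(link_count(len(entity & other)) for other in others)
        numerator += size * resolved / link_count(size)
        denominator += size
    return numerator / denominator if denominator else 0.0


def lea(key: List[Entity], response: List[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for gold `key` and predicted `response`."""
    recall = lea_one_side(key, response)
    precision = lea_one_side(response, key)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


a, b, c, d, e = (0, 1), (2, 3), (4, 5), (6, 7), (8, 9)
gold = [frozenset({a, b, c}), frozenset({d, e})]
pred = [frozenset({a, b}), frozenset({c, d, e})]
precision, recall, f1 = lea(gold, pred)
```

Because the computation only needs set intersections, it has no dependency on an assignment solver, which is consistent with the motivation given in the commit message for dropping the scipy-based method.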

Commits on Apr 14, 2022

  1. Multiply accuracy by 100

    This seems to match with the scorer expectations better
    polm committed Apr 14, 2022
    8181d45
  2. Remove end adjustment

    The difference in environments was due to a change in Thinc, the code
    here is fine.
    polm committed Apr 14, 2022
    08729e0
  3. Undo multiply by 100

    This was mistaken, not sure why my score seemed to be off before.
    polm committed Apr 14, 2022
    afd255c

Commits on Apr 18, 2022

  1. 683f470

Commits on May 9, 2022

  1. 6b51258

Commits on May 10, 2022

  1. Initial coref docs

    A few unresolved points:
    
    - SpanPredictor should probably get its own file
    - What's the right way to document MentionClusters?
    polm committed May 10, 2022
    117a9ef
  2. Split span predictor component into its own file

    This runs. The imports in both of the split files could probably use a
    close check to remove extras.
    polm committed May 10, 2022
    f852c5c
  3. 41fc092
  4. Formatting

    polm committed May 10, 2022
    33f4f90
  5. small refactor and docs

    kadarakos committed May 10, 2022
    e512874
  6. merge misery

    kadarakos committed May 10, 2022
    7cf6bcc

Commits on May 11, 2022

  1. Merge pull request #10782 from kadarakos/feature/coref

    Feature/coref
    polm committed May 11, 2022
    57165f9
  2. fixing arguments

    kadarakos committed May 11, 2022
    b7ac4b3

Commits on May 12, 2022

  1. Add span predictor docs

    polm committed May 12, 2022
    14eb20f

Commits on May 13, 2022

  1. First draft for architecture docs

    These parameters are probably going to be renamed / have defaults
    adjusted. Also Model types are off.
    polm committed May 13, 2022
    6a8625e
  2. 13481fb

Commits on May 16, 2022

  1. Rename coref params

    polm committed May 16, 2022
    2e8f0e9

Commits on May 17, 2022

  1. merge

    kadarakos committed May 17, 2022
    403fb95
  2. new parameters

    kadarakos committed May 17, 2022
    1dc3894

Commits on May 19, 2022

  1. Merge pull request #10812 from kadarakos/feature/coref

    Feature/coref
    polm committed May 19, 2022
    e38e84a

Commits on May 24, 2022

  1. Add guards around torch import

    Torch is required for the coref/spanpred models but shouldn't be
    required for spaCy in general.
    
    The one tricky part of this is that one function in coref_util relied on
    torch, but that file was imported in several places. Since the function
    was only used in one place I moved it there.
    polm committed May 24, 2022
    9da16df
  2. Move epsilon

    polm committed May 24, 2022
    b1118ce
  3. Use thinc.util.has_torch

    polm committed May 24, 2022
    5cbc9f4
  4. Import torch from thinc

    polm committed May 24, 2022
    c9233a5
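The May 24 commits make torch an optional dependency: the coref and span predictor models need it, but importing spaCy should not. The commits themselves rely on Thinc's own helper for this. The sketch below shows the same general pattern using only the standard library, so the helper names `module_available` and `require` are hypothetical stand-ins rather than spaCy or Thinc API:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Check whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def require(name: str, feature: str) -> None:
    """Raise a descriptive ImportError if an optional dependency is missing."""
    if not module_available(name):
        raise ImportError(
            f"{feature} requires the optional dependency '{name}', "
            "which is not installed."
        )


# Evaluated once at import time; cheap because nothing heavy is imported here.
HAS_TORCH = module_available("torch")


def build_coref_model():
    """Hypothetical factory: only the code path that needs torch imports it."""
    require("torch", "The coref model")
    import torch  # deferred import keeps the rest of the package torch-free

    return torch
```

Keeping the check in one place also lets tests be skipped when the dependency is absent, as the later "Skip coref test if no torch" commit does.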

Commits on May 25, 2022

  1. Merge pull request #10844 from polm/feature/coref-torch-guard

    Add guards around torch import for coref
    polm committed May 25, 2022
    3807a1b
  2. Skip coref test if no torch

    polm committed May 25, 2022
    303269c
  3. Fix coref tests

    polm committed May 25, 2022
    6999436
  4. 6087da9
  5. Import cleanup

    polm committed May 25, 2022
    e721c7b
  6. Code review suggestions, cleanup

    polm committed May 25, 2022
    2a8efda
  7. Black formatting

    polm committed May 25, 2022
    838f501
  8. 015050f
  9. f75a528
  10. fix types in scorer + black

    svlandeg committed May 25, 2022
    b8bdf99
  11. 3fee693
  12. fix types + black formatting

    svlandeg committed May 25, 2022
    cea40c9
  13. small type fixes

    svlandeg committed May 25, 2022
    aa2eb27

Commits on Jun 8, 2022

  1. Fix coref size inference (#10916)

    * Add explicit tok2vec_size parameter in clusterer
    
    * Add tok2vec size to span predictor config
    
    * Minor fixes
    polm committed Jun 8, 2022
    196886b

Commits on Jun 22, 2022

  1. Refactor Coval Scoring code (#10875)

    * Move coref scoring code to scorer.py
    
    Includes some renames to make names less generic.
    
    * Refactor coval code to remove ternary expressions
    
    * Black formatting
    
    * Add header
    
    * Make scorers into registered scorers
    
    * Small test fixes
    
    * Skip coref tests when torch not present
    
    Coref can't be loaded without Torch, so nothing works.
    
    * Fix remaining type issues
    
    Some of this just involves ignoring types in thorny areas. Two main
    issues:
    
    1. Some things have weird types due to indirection/ argskwargs
    2. xp2torch return type seems to have changed at some point
    
    * Update spacy/scorer.py
    
    Co-authored-by: kadarakos <kadar.akos@gmail.com>
    
    * Small changes from review
    
    * Be specific about the ValueError
    
    * Type fix
    
    Co-authored-by: kadarakos <kadar.akos@gmail.com>
    polm and kadarakos committed Jun 22, 2022
    16894e6

Commits on Jun 28, 2022

  1. Initial test of mismatched tokenization

    This runs, but the results are nonsense because the indices are off.
    polm committed Jun 28, 2022
    af6d5ae
  2. Bad hack to get tests to run

    This changes the tok2vec size in coref to hardcoded 64 to get tests to
    run. This should be reverted and hopefully replaced with proper shape
    inference.
    polm committed Jun 28, 2022
    ef5762d
  3. Test works

    This may not be done yet, as the test is just for consistency, and not
    overfitting correctly yet.
    polm committed Jun 28, 2022
    d1ff933
  4. 9f94538
  5. make sure same device

    kadarakos committed Jun 28, 2022
    1a78259

Commits on Jun 29, 2022

  1. span predictor device fix

    kadarakos committed Jun 29, 2022
    0076f0f
  2. Handle case with nothing to score in span predictor

    This case was not handled correctly. It may be desirable to make changes
    in the coref component to make sure this doesn't happen, but the span
    predictor should also handle this kind of data intelligently internally.
    
    Note that something is still weird because the span predictor seems to
    not be learning.
    polm committed Jun 29, 2022
    dd812ca

Commits on Jul 1, 2022

  1. Merge pull request #11043 from kadarakos/feature/coref

    Merging master into Feature/coref
    polm committed Jul 1, 2022
    c59aeeb
  2. Merge branch 'feature/coref' into fix/coref-alignment

    Had to renumber error message.
    polm committed Jul 1, 2022
    7972088

Commits on Jul 3, 2022

  1. Clean tests.

    polm committed Jul 3, 2022
    5192ac1
  2. Run black

    polm committed Jul 3, 2022
    1dacecb
  3. Move spans2ints to util

    polm committed Jul 3, 2022
    201731d
  4. Add basic span predictor tests

    polm committed Jul 3, 2022
    1a4dbb7
  5. 619b110
  6. Add failing test with tokenization mismatch

    This test only fails due to the explicit assert False at the moment,
    but the debug output shows that the learned spans are all off by one due
    to misalignment. So the code still needs fixing.
    polm committed Jul 3, 2022
    a46bc03
  7. Update overfitting test

    polm committed Jul 3, 2022
    fd574a8
  8. Update tests

    polm committed Jul 3, 2022
    cf33b48
  9. Fix alignment issues

    I believe this resolves issues with tokenization mismatches.
    polm committed Jul 3, 2022
    b09bbc7

Commits on Jul 4, 2022

  1. c7f333d
  2. Add tests to give up with whitespace differences

    Docs in Examples are allowed to have arbitrarily different whitespace.
    Handling that properly would be nice but isn't required, but for now
    check for it and blow up.
    polm committed Jul 4, 2022
    178feae
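The "give up with whitespace differences" commit describes detecting a case the alignment code does not support and failing loudly rather than producing misaligned spans. A minimal sketch of that kind of guard, assuming it is enough to compare the raw text of the predicted and reference documents. The function names and error messages are hypothetical and do not reproduce the check added in the PR:

```python
def strip_whitespace(text: str) -> str:
    """Remove every whitespace character so only token content remains."""
    return "".join(text.split())


def ensure_alignable(predicted_text: str, reference_text: str) -> None:
    """Raise ValueError when the two texts cannot be aligned by this sketch."""
    if predicted_text == reference_text:
        return  # identical text: differing tokenizations can still be aligned
    if strip_whitespace(predicted_text) == strip_whitespace(reference_text):
        raise ValueError(
            "Predicted and reference texts differ only in whitespace; "
            "this case is not supported."
        )
    raise ValueError("Predicted and reference texts contain different content.")


ensure_alignable("I like cats.", "I like cats.")  # passes silently
```

Failing early keeps a whitespace mismatch from showing up later as off-by-one spans, the symptom described in the earlier tokenization-mismatch commits.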

Commits on Jul 6, 2022

  1. Update spacy/ml/models/coref_util.py

    Co-authored-by: kadarakos <kadar.akos@gmail.com>
    polm and kadarakos committed Jul 6, 2022
    63e27b5
  2. Feedback from code review

    polm committed Jul 6, 2022
    8f598d7
  3. Remove _spans_to_offsets

    Basically the same as get_clusters_from_doc
    polm committed Jul 6, 2022
    6f5cf83
  4. Update docs

    Parameter names in architecture docs were not updated after parameters
    were renamed.
    polm committed Jul 6, 2022
    da9c379
  5. Remove old TODOs

    polm committed Jul 6, 2022
    c4de3e5
  6. 5e40573
  7. ce49136
  8. First take at dimension inference

    This follows the pattern used in the Biaffine Parser, which uses an init
    function to get the size only after the tok2vec is available.
    
    This works at first, but serialization fails with an error.
    polm committed Jul 6, 2022
    ba1bf8a
  9. It works!

    Was missing the serialization-related code from biaffine.
    polm committed Jul 6, 2022
    bd17c38
  10. Remove tok2vec_size from coref

    polm committed Jul 6, 2022
    f67c173
  11. b59b924
  12. b0800ea
  13. Span predictor leftovers

    polm committed Jul 6, 2022
    da81a90

Commits on Jul 8, 2022

  1. Fix types

    mypy now exits without an error, except for two apparently unrelated
    ones about setup.py.
    polm committed Jul 8, 2022
    2eee0d2

Commits on Jul 11, 2022

  1. Merge pull request #11087 from polm/coref/doc-update

    Update Coref Docs
    polm committed Jul 11, 2022
    2c2791d
  2. 1b3db14
  3. 6d9eafe
  4. Merge pull request #11042 from polm/fix/coref-alignment

    Fix tokenization mismatch handling in coref
    polm committed Jul 11, 2022
    9cbb970
  5. 4d03239
  6. baeb35f
  7. 5969634
  8. Update error number

    This was changed by merge
    polm committed Jul 11, 2022
    f9c82e2
  9. 7792229
  10. Update error number

    This was changed by merge
    polm committed Jul 11, 2022
    0f3c456

Commits on Jul 12, 2022

  1. 64a0bf4
  2. Make get_clusters_from_doc return spans in order

    There's no guarantee about the order in which SpanGroup keys will come
    out, so access them in sorted order when doing comparisons.
    polm committed Jul 12, 2022
    1baa334
  3. Remove config from coref tests

    This was necessary when the tok2vec_size option was necessary.
    polm committed Jul 12, 2022
    07e8556
  4. Merge pull request #11089 from polm/coref/dimension-inference

    Dimension inference in Coref
    polm committed Jul 12, 2022
    90973fa
  5. Remove orphaned function

    This was probably used in the prototyping stage, left as a reference,
    and then forgotten. Nothing uses it any more.
    polm committed Jul 12, 2022
    2e9dadf
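The "Make get_clusters_from_doc return spans in order" commit addresses a subtle comparison problem: if cluster groups are read back in whatever order the container yields its keys, two equivalent documents can produce differently ordered cluster lists. A rough sketch of the fix using a plain dict as a stand-in for the span-group container. The function name, the `coref_clusters` prefix, and the tuple span representation are assumptions for illustration:

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets; hypothetical representation


def clusters_in_key_order(
    span_groups: Dict[str, List[Span]], prefix: str
) -> List[List[Span]]:
    """Collect groups whose key starts with `prefix`, visiting keys in sorted order."""
    keys = sorted(key for key in span_groups if key.startswith(prefix))
    return [list(span_groups[key]) for key in keys]


groups = {
    "coref_clusters_2": [(5, 6), (9, 10)],
    "unrelated": [(0, 1)],
    "coref_clusters_1": [(0, 2), (3, 4)],
}
clusters = clusters_in_key_order(groups, "coref_clusters")
```

Note that plain string sorting orders `coref_clusters_10` before `coref_clusters_2`. That is still deterministic, which is all a comparison needs, but it is not numeric order.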

Commits on Aug 4, 2022

  1. 3a7658e
  2. Update architectures

    polm committed Aug 4, 2022
    62ffddd