
Fix handling of small docs in coref #28

Merged: 4 commits, Nov 2, 2022

Conversation

polm (Contributor) commented Oct 31, 2022

Docs with one or zero tokens fail in the coref component. This doesn't have a fix yet, just a failing test. (There is also a test for the span resolver, which does not fail.)

It might be better to include this optionally? On the other hand, since
it should just be ignored in training, having it always there is more
thorough.
There can be no coref prediction for docs with one token (or no tokens).
Attempting to treat docs like that normally causes a mess with size
inference, so instead they're skipped.

In training, this just involves skipping the docs in the update step.
This is simple due to the fake batching structure, since the batch
doesn't have to be maintained.
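The training-side skip can be sketched as a simple filter over the incoming batch. This is an illustrative sketch, not the component's actual code; names like `filter_trainable` and `MIN_TOKENS`, and the list-of-tokens doc representation, are assumptions:

```python
# Sketch: drop docs too small for coref before the update step.
# MIN_TOKENS and filter_trainable are hypothetical names for illustration;
# the real component works on spaCy Example objects.
MIN_TOKENS = 2  # a doc with 0 or 1 tokens cannot contain a coref pair

def filter_trainable(docs):
    """Keep only docs long enough to have a coreference prediction."""
    return [doc for doc in docs if len(doc) >= MIN_TOKENS]

batch = [["Hi"], [], ["She", "saw", "her", "reflection", "."]]
trainable = filter_trainable(batch)  # only the five-token doc survives
```

Because the fake batching structure rebuilds batches anyway, dropping docs this way does not require any bookkeeping to restore the original batch afterwards.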

In inference, this just involves short-circuiting to an empty
prediction.
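The inference short-circuit amounts to returning an empty prediction before the model is ever invoked, which avoids the size-inference problems entirely. A minimal sketch, assuming a hypothetical `predict_clusters` entry point and a plain list-of-tokens doc (not the component's real API):

```python
def predict_clusters(doc, model=None):
    """Return coref clusters for doc, short-circuiting tiny docs.

    doc: a sequence of tokens (illustrative stand-in for a spaCy Doc).
    model: callable producing clusters; only called for docs with 2+ tokens.
    """
    # Docs with one or zero tokens can have no coreference prediction,
    # so return an empty result instead of running the model on them.
    if len(doc) < 2:
        return []
    return model(doc)
```

With this guard in place, the model only ever sees docs whose sizes it can handle, and callers get a well-formed (empty) prediction for the rest.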
The retokenization test is hard-coded to the training example because it
manually merges some tokens, to make sure that the prediction and merge
line up. It would probably be better to separate out the training data
from the general example here, but for now narrowing the training data
works.
polm (Contributor, Author) commented Oct 31, 2022

This should be fixed now, but the tests are failing - it seems to be in a non-coref test, so either a dependency was updated or it's a fluke.

Separately, while this works, I think the tests need some cleanup, so I'll take care of that before bringing this out of draft.

polm (Contributor, Author) commented Nov 1, 2022

It looks like the test failure was transient - it was in coref tests, but only in a part unaffected by this PR, and it went away on a re-run.

On looking at the test code, it seems like factoring data out would only make things messier at this point, so this should be ready as-is if there are no lurking issues.

@polm polm marked this pull request as ready for review November 1, 2022 05:50
@polm polm added the bug Something isn't working label Nov 1, 2022
@polm polm merged commit 98b00ea into explosion:master Nov 2, 2022