FastNLPProcessor annotation is not stable #402
Comments
Can you please paste an example of differences here? |
This is a Stanford SUTime bug... Maybe it should be filed in the Stanford CoreNLP github? |
That was almost my conclusion. I'd like to make sure that we are not misusing SUTime by, for instance, not calling some reset method between documents. I haven't yet found the line that makes the change that needs to be undone, though. I'll check what remedies Stanford might offer. |
Good point. Thanks! |
The problem does seem to be with SUTime and I will file an issue there shortly. This here is for practice. The rules for dealing with time are encoded in
arrange for a value tag (VTag) to be added to the environment. The tag's key is "resolveTo" and the value depends on the matching pattern. This ends up happening in ValueFunctions.java, where I can observe the change take place. The problem is that the environment influences other operations, which is the whole point of it, but it cannot easily be reset. The first document is annotated without a resolveTo tag and SUTime acts in one way. The second document is annotated and, as a side effect, the resolveTo tag is added. When the first document is read again, the side effect influences behavior and a different result is produced. I see no support anywhere for restoring the environment to its initial condition between documents, short of throwing everything away and starting with a new object, which would be very expensive on a per-document basis. Opinions to the contrary are very welcome. |
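To make the failure mode concrete, here is a minimal sketch that does not use CoreNLP at all. The class `SharedEnvDemo`, its map-based environment, and the rule "documents containing `next` set the tag" are all hypothetical stand-ins, not the SUTime `Env` API. It only illustrates how a tag left behind by one document can change the result for another, and what a snapshot-and-restore remedy would look like if the real environment allowed one, which this thread does not establish.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a long-lived rule environment; NOT the SUTime Env API.
final class SharedEnvDemo {
    static final int DEFAULT_FLAGS = 0;

    // Mutable state shared by every annotate() call.
    private final Map<String, Integer> env = new HashMap<>();

    // Pretend rule (an assumption for illustration): a document containing
    // "next" adds the resolveTo tag to the environment as a side effect.
    String annotate(String document) {
        if (document.contains("next")) {
            env.put("resolveTo", 1);
        }
        int flags = env.getOrDefault("resolveTo", DEFAULT_FLAGS);
        return document + " [flags=" + flags + "]";
    }

    // Remedy sketch: snapshot the environment and restore it after each document.
    String annotateIsolated(String document) {
        Map<String, Integer> snapshot = new HashMap<>(env);
        try {
            return annotate(document);
        } finally {
            env.clear();
            env.putAll(snapshot);
        }
    }

    public static void main(String[] args) {
        SharedEnvDemo leaky = new SharedEnvDemo();
        String first = leaky.annotate("last week");
        leaky.annotate("next week");            // leaves resolveTo behind
        String again = leaky.annotate("last week");
        System.out.println(first.equals(again)); // prints false

        SharedEnvDemo isolated = new SharedEnvDemo();
        String a = isolated.annotateIsolated("last week");
        isolated.annotateIsolated("next week");
        String b = isolated.annotateIsolated("last week");
        System.out.println(a.equals(b));         // prints true
    }
}
```

Copying a small map per document is cheap in this toy; whether the real environment can be copied or restored at comparable cost is exactly the open question.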
Nice catch! |
Other related code is GenericTimeExpressionPatterns.java.determineRelFlags:

    public int determineRelFlags(CoreMap annotation, TimeExpression te)
    {
      int flags = 0;
      boolean flagsSet = false;
      if (te.value.getTags() != null) {
        Value v = te.value.getTags().getTag("resolveTo");
        if (v != null && v.get() instanceof Number) {
          flags = ((Number) v.get()).intValue();
          flagsSet = true;
        }
      }
      if (!flagsSet) {
        if (te.getTemporal() instanceof SUTime.PartialTime) {
          flags = SUTime.RESOLVE_TO_CLOSEST;
        }
      }
      return flags;
    }

and SUTime.PartialTime.resolve(). I'll reference them as well. |
Submitted as stanfordnlp/CoreNLP#1061... |
We just recently read 40,000 documents twice and the same phenomenon was observed. The reading changes. |
Do you have examples of what changes? The SUTime output? |
It's the exact same as before, which shouldn't be surprising. The week wanders around, and we have lots and lots of examples. It wasn't somehow a temporary problem. The repeatability consequence is troubling. Aside from results reported in papers, Eidos with a given version is supposed to report the same results downstream for the same document. |
See also clulab/eidos#861. A different answer is computed from the same document. This is now a unit test in https://github.com/clulab/processors/blob/kwalcock-envBug/corenlp/src/test/scala/org/clulab/processors/TestFastNLPProcessorEnv.scala. Moving it from Eidos to Processors gets me a little closer to the cause and helps with debugging.
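For anyone who wants to check their own corpus, the two-pass comparison can be sketched roughly as follows. This is not the linked Scala test; `RepeatabilityCheck` and `unstableDocuments` are hypothetical names, and the annotator is abstracted as a plain `Function<String, String>` so the sketch stays independent of Processors and CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration; the actual test is TestFastNLPProcessorEnv.scala.
final class RepeatabilityCheck {
    // Annotates every document once, then a second time, and returns the
    // indices of documents whose second result differs from the first.
    static List<Integer> unstableDocuments(Function<String, String> annotator,
                                           List<String> documents) {
        List<String> firstPass = new ArrayList<>();
        for (String doc : documents) {
            firstPass.add(annotator.apply(doc));
        }
        List<Integer> unstable = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            String second = annotator.apply(documents.get(i));
            if (!second.equals(firstPass.get(i))) {
                unstable.add(i);
            }
        }
        return unstable;
    }

    public static void main(String[] args) {
        // Toy annotator with leaked state, standing in for the real pipeline.
        boolean[] tagged = {false};
        Function<String, String> leaky = doc -> {
            if (doc.contains("next")) {
                tagged[0] = true;
            }
            return doc + " [flags=" + (tagged[0] ? 1 : 0) + "]";
        };
        List<String> docs = Arrays.asList("last week", "next week");
        System.out.println(unstableDocuments(leaky, docs)); // prints [0]
    }
}
```

With a real processor, the function would wrap the annotate call and serialize whatever output matters downstream, for example the normalized time expressions.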