Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FastNLPProcessor annotatation is not stable #402

Open
kwalcock opened this issue Jun 23, 2020 · 12 comments
Open

FastNLPProcessor annotatation is not stable #402

kwalcock opened this issue Jun 23, 2020 · 12 comments
Assignees

Comments

@kwalcock
Copy link
Member

See also clulab/eidos#861. A different answer is computed from the same document. This is now a unit test in https://github.com/clulab/processors/blob/kwalcock-envBug/corenlp/src/test/scala/org/clulab/processors/TestFastNLPProcessorEnv.scala. Moving it from Eidos to Processors gets me a little closer to the cause and helps with debugging.

@kwalcock kwalcock self-assigned this Jun 23, 2020
@MihaiSurdeanu
Copy link
Contributor

Can you please paste an example of differences here?

@kwalcock
Copy link
Member Author

First run Second run
O O
O O
O O
>=2.0 >=2.0
O O
O O
O O
O O
2012-09-25 2012-09-18
******18 ******18
O O
2012-09-19 2012-09-19
******19 ******19
2012-W37 2012-W37
2012-W37 2012-W37
O O
O O
3 3
O O
O O
O O
2012-09 2012-09
2012-09 2012-09
2012-09 2012-09
O O

@MihaiSurdeanu
Copy link
Contributor

This is a Stanford SUTime bug... Maybe it should be filed in the Stanford CoreNLP github?

@kwalcock
Copy link
Member Author

That was almost my conclusion. I'd like to make sure that we are not misusing SUTime by, for instance, not calling some reset method between documents. I haven't yet found the line that makes the change that needs to be undone, though. I'll check what remedies Stanford might offer.

@MihaiSurdeanu
Copy link
Contributor

Good point. Thanks!

@kwalcock
Copy link
Member Author

kwalcock commented Jul 2, 2020

The problem does seem to be with SUTime and I will file an issue there shortly. This here is for practice.

The rules for dealing with time are encoded in src/edu/stanford/nlp/time/rules/english.sutime.txt. The rules

  ENV.defaults["stage"] = 4
  ...
  {  pattern: ( [ { tag:/VBD/ } | /have/ ] []{0,2} [ $hasTemporal ] ),
     action: VTag( $0[-1].temporal.value, "resolveTo", RESOLVE_TO_PAST )
  }
  {  pattern: ( [ $hasTemporal ] []{0,2} [ { tag:/VBD/ } | /have/ ] ),
     action: VTag( $0[0].temporal.value, "resolveTo", RESOLVE_TO_PAST )
  }
  {  pattern: ( (/would/ | /could/ | /should/ | /will/ | /going/ /to/ | /'/ /ll/ | /'ll/ )
                []{0,2} [ $hasTemporal ]
              ),
     action: VTag( $0[-1].temporal.value, "resolveTo", RESOLVE_TO_FUTURE )
  }
  {  pattern: ( [ $hasTemporal ] []{0,2}
                (/would/ | /could/ | /should/ | /will/ | /going/ /to/ | /'/ /ll/ | /'ll/ ) ),
     action: VTag( $0[0].temporal.value, "resolveTo", RESOLVE_TO_FUTURE )
  }

arrange for a value tag (VTag) to be added to the environment. The tag's key is "resolveTo" and the value will depend on the matching pattern. This ends up happening in ValueFunctions.java where I can observe the change take place. The problem is that the environment influences other operations, the whole point of it, but that it cannot easily be reset. The first document is annotated without a resolveTo tag and SUTime acts in one way. The second document is annotated with a side effect of the resolveTo tag being added. The first document gets read again, but the side effect influences behavior and a different result gets produced. I see no support anywhere for restoring the environment to its initial condition between documents short of doing something like throwing everything away and starting with a new object, which would be very expensive on a per document basis.

Opinions to the contrary are very welcome.

@MihaiSurdeanu
Copy link
Contributor

Nice catch!
I think you're correct.

@kwalcock
Copy link
Member Author

kwalcock commented Jul 2, 2020

Other related code is GenericTimeExpressionPatterns.java.determineRelFlags:

  public int determineRelFlags(CoreMap annotation, TimeExpression te)
  {
    int flags = 0;
    boolean flagsSet = false;
    if (te.value.getTags() != null) {
      Value v = te.value.getTags().getTag("resolveTo");
      if (v != null && v.get() instanceof Number) {
        flags = ((Number) v.get()).intValue();
        flagsSet = true;
      }
    }
    if (!flagsSet) {
      if (te.getTemporal() instanceof SUTime.PartialTime) {
        flags = SUTime.RESOLVE_TO_CLOSEST;
      }
    }
    return flags;
  }
}

and SUTime.PartialTime.resolve(). I'll reference them as well.

@kwalcock
Copy link
Member Author

kwalcock commented Jul 6, 2020

Submitted as stanfordnlp/CoreNLP#1061...

@kwalcock
Copy link
Member Author

We just recently read 40,000 documents twice and the same phenomenon was observed. The reading changes.

@MihaiSurdeanu
Copy link
Contributor

do you have examples of what changes? The SUTime output?

@kwalcock
Copy link
Member Author

It's the exact same as before, which shouldn't be surprising. The week wanders around, and we have lots and lots of examples. It wasn't somehow a temporary problem. The repeatability consequence is troubling. Aside from results reported in papers, Eidos with a given version is supposed to report the same results downstream for the same document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants