Time value changes #861

kwalcock · 2020-05-18T17:42:08Z

I read the same set of documents on Thursday (May 14) and again on Monday (May 18), and the normalized times in the output have changed. For the sentence "February 21, 2015 (ADDIS ABABA) - South Sudan peace talks aimed at ending the more than 14-month-long conflict in the young East African nation have been postponed until Monday." the first reading is

        "@type" : "Word",
        "@id" : "_:Word_36",
        "text" : "Monday",
        "tag" : "NNP",
        "entity" : "DATE",
        "startOffset" : 191,
        "endOffset" : 197,
        "lemma" : "Monday",
        "chunk" : "B-NP",
        "norm" : "2015-02-23"
      }, {

and the second reading is

        "@type" : "Word",
        "@id" : "_:Word_36",
        "text" : "Monday",
        "tag" : "NNP",
        "entity" : "DATE",
        "startOffset" : 191,
        "endOffset" : 197,
        "lemma" : "Monday",
        "chunk" : "B-NP",
        "norm" : "2015-02-16"
      }, {

The document does have a DCT:

    "dct" : {
      "@type" : "DCT",
      "@id" : "_:DCT_1",
      "text" : "2015-02-22",
      "start" : "2015-02-22T00:00",
      "end" : "2015-02-23T00:00"
    },

This is with useNeuralParser = false. I don't think that anything has been changed in the configuration. Any idea what might cause this?


kwalcock · 2020-05-18T18:29:26Z

In case it helps, here is another example:

        "@type" : "Word",
        "@id" : "_:Word_65",
        "text" : "Tuesday",
        "tag" : "NNP",
        "entity" : "DATE",
        "startOffset" : 359,
        "endOffset" : 366,
        "lemma" : "Tuesday",
        "chunk" : "B-NP",
        "norm" : "2012-09-25"
      }, {

becomes

        "@type" : "Word",
        "@id" : "_:Word_65",
        "text" : "Tuesday",
        "tag" : "NNP",
        "entity" : "DATE",
        "startOffset" : 359,
        "endOffset" : 366,
        "lemma" : "Tuesday",
        "chunk" : "B-NP",
        "norm" : "2012-09-18"
      }, {

despite a DCT of

    "dct" : {
      "@type" : "DCT",
      "@id" : "_:DCT_1",
      "text" : "2012-09-22",
      "start" : "2012-09-22T00:00",
      "end" : "2012-09-23T00:00"
    },

The sentence is "The flight, which was funded by a donation from the Netherlands, follows two other Dutch-funded charters on Tuesday 18 and Wednesday 19 last week carrying another 551 migrants."

kwalcock · 2020-05-18T18:34:58Z

        "@type" : "Word",
        "@id" : "_:Word_68",
        "text" : "Friday",
        "tag" : "NNP",
        "entity" : "DATE",
        "startOffset" : 363,
        "endOffset" : 369,
        "lemma" : "Friday",
        "chunk" : "B-NP",
        "norm" : "2010-07-16"
      }, {

becomes

        "@type" : "Word",
        "@id" : "_:Word_68",
        "text" : "Friday",
        "tag" : "NNP",
        "entity" : "DATE",
        "startOffset" : 363,
        "endOffset" : 369,
        "lemma" : "Friday",
        "chunk" : "B-NP",
        "norm" : "2010-07-23"
      }, {

for the text "According to a statement by the Administration for Refugees and Returnees Affairs (ARRA) seen by Sudan tribune, some 122 Eritrean refugees were flown to the United States on Friday to lead a new life in there after being exiled for years in different camps in the northern Ethiopia not far from the borders to Eritrea."
given the DCT

    "dct" : {
      "@type" : "DCT",
      "@id" : "_:DCT_1",
      "text" : "2010-07-22",
      "start" : "2010-07-22T00:00",
      "end" : "2010-07-23T00:00"
    },

kwalcock · 2020-05-18T18:37:49Z

So it always seems to involve a day of the week, with possible confusion about whether the previous occurrence or the next occurrence is chosen. I don't recall checking in any recent code changes in this area, especially not since last Thursday.
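
To make the "previous or next" observation concrete, here is a small Python sketch that computes both candidates for the three examples above. It is only date arithmetic on the reported DCTs, not anything taken from SUTime or Processors; the helper name `previous_and_next` is made up for illustration.

```python
from datetime import date, timedelta
from typing import Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def previous_and_next(weekday: str, dct: date) -> Tuple[date, date]:
    """Return the last `weekday` strictly before `dct` and the first strictly after it."""
    target = WEEKDAYS.index(weekday)
    back = (dct.weekday() - target) % 7 or 7      # days back to the previous occurrence
    forward = (target - dct.weekday()) % 7 or 7   # days forward to the next occurrence
    return dct - timedelta(days=back), dct + timedelta(days=forward)


# (weekday mention, DCT) pairs copied from the examples in this issue.
EXAMPLES = [
    ("Monday", date(2015, 2, 22)),
    ("Tuesday", date(2012, 9, 22)),
    ("Friday", date(2010, 7, 22)),
]

for weekday, dct in EXAMPLES:
    prev, nxt = previous_and_next(weekday, dct)
    print(f"{weekday} with DCT {dct}: previous={prev}, next={nxt}")
```

In each case the two norm values reported above are exactly the previous and the next occurrence of that weekday relative to the DCT, which supports the idea that only the resolution direction is flipping.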

EgoLaparra · 2020-05-18T20:20:56Z

@kwalcock, I've moved the issue here since the problem is not caused by the neural parser.
I will do some tests, but, since norm values come from processors, this seems to be caused by SUTime.

MihaiSurdeanu · 2020-05-18T20:22:38Z

Fwiw, it seems to me that the second reading should be the correct one, since it references a time before publication.

I think this is related to the heuristic in SUTime that resolves days of the week such as "Monday". But I can't see why this would change, if we didn't change CoreNLP versions...
@BeckySharp : do you know?

kwalcock · 2020-05-19T02:24:10Z

Thanks for moving it to the right place. I'll try to see if it can be reproduced, perhaps on the same day, so that I'm absolutely certain that the code hasn't changed.

kwalcock · 2020-05-20T19:36:38Z

This phenomenon does appear to be repeatable. I'm trying to isolate the situation.

kwalcock · 2020-05-21T01:20:37Z

If Eidos reads, serially, the files 1742d787c22e9873c4bf9558e456ddd2, then 73f374515fed56aac5979d847591a7f8, and again 1742d787c22e9873c4bf9558e456ddd2, the two reads of the first file produce different output. Something must be keeping state around. The last time something like this happened, a Stanford component ran into an unknown word, recorded it, and then no longer treated it as unknown the next time around, so it behaved differently. I think that problem would show up when the same file was read twice in a row. That's not the case here.

1742d787c22e9873c4bf9558e456ddd2.json.txt
73f374515fed56aac5979d847591a7f8.json.txt
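
The A, B, A read order described above can be turned into a generic stability check. The following Python sketch is an illustration only: `unstable_reads` and the toy annotator are hypothetical names, and the toy annotator merely imitates a component that keeps state between calls. A real check would pass in a function that runs the actual pipeline and returns its serialized output.

```python
from typing import Callable, Dict, List


def unstable_reads(annotate: Callable[[str], str],
                   order: List[str]) -> Dict[str, List[str]]:
    """Annotate each name in `order` and return the names whose outputs differ across repeats."""
    seen: Dict[str, List[str]] = {}
    for name in order:
        seen.setdefault(name, []).append(annotate(name))
    return {name: outs for name, outs in seen.items() if len(set(outs)) > 1}


def make_toy_annotator() -> Callable[[str], str]:
    """Build a stand-in annotator whose behavior changes after it has seen document "B"."""
    state = {"sticky": False}  # hidden state that survives between calls

    def annotate(name: str) -> str:
        result = f"{name}:{'past' if state['sticky'] else 'closest'}"
        if name == "B":
            state["sticky"] = True  # reading B leaves something behind
        return result

    return annotate


report = unstable_reads(make_toy_annotator(), ["A", "B", "A"])
print(report)  # {'A': ['A:closest', 'A:past']}
```

Reading the same document twice in a row would not expose this kind of leak; interleaving a second document does.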

kwalcock · 2020-06-18T02:20:07Z

These two texts are enough to reproduce the problem:

The flight follows two other Dutch-funded charters on Tuesday 18 and Wednesday 19 last week. A third charter left Yemen earlier this month.

and

Libya has collapsed, a UNHCR spokeswoman said on Tuesday.

They do not need to go through Eidos. A pass through Processors is enough. Only these stages are necessary:

tagPartsOfSpeech(doc)
lemmatize(doc)
recognizeNamedEntities(doc)

kwalcock · 2020-06-18T15:15:04Z

It looks like an edu.stanford.nlp.ling.tokensregex.Env is being kept alive between documents. It has a variable for TUESDAY whose value in turn carries tags. The resolveTo tag is initially missing, so the default value of SUTime.RESOLVE_TO_CLOSEST is used. Sometime later in execution, that gets changed to RESOLVE_TO_PAST. It seems the new value is being incorporated into the environment and then not being reset or cleared properly.
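
The following Python sketch models that suspected mechanism. It is a toy, not SUTime's implementation: `SharedEnv` and `resolve` are hypothetical, the two constants only borrow the names mentioned above, and treating "past" as "on or before the DCT" is an arbitrary choice made for the sketch.

```python
from datetime import date, timedelta

# Stand-ins named after the constants mentioned above; the values are arbitrary.
RESOLVE_TO_CLOSEST = "closest"
RESOLVE_TO_PAST = "past"

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


class SharedEnv:
    """Toy long-lived environment holding per-weekday tags."""

    def __init__(self) -> None:
        self.tags = {}  # e.g. {"TUESDAY": {"resolveTo": "past"}}


def resolve(weekday: str, dct: date, env: SharedEnv) -> date:
    """Resolve a weekday mention against the DCT using the policy stored in `env`."""
    # A missing resolveTo tag falls back to the "closest" default.
    policy = env.tags.get(weekday.upper(), {}).get("resolveTo", RESOLVE_TO_CLOSEST)
    target = WEEKDAYS.index(weekday)
    back = (dct.weekday() - target) % 7      # days back to the most recent occurrence
    forward = (target - dct.weekday()) % 7   # days forward to the upcoming occurrence
    if policy == RESOLVE_TO_PAST or back <= forward:
        return dct - timedelta(days=back)
    return dct + timedelta(days=forward)


env = SharedEnv()
dct = date(2012, 9, 22)

first = resolve("Tuesday", dct, env)  # tag missing, default policy applies
# Later processing writes the tag into the shared environment and never clears it.
env.tags.setdefault("TUESDAY", {})["resolveTo"] = RESOLVE_TO_PAST
second = resolve("Tuesday", dct, env)  # same input, different policy

print(first, second)  # 2012-09-25 2012-09-18
```

With a fresh `SharedEnv()` per document the first result comes back, which is why resetting or clearing the environment looks like the direction for a fix in this model.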

kwalcock assigned EgoLaparra May 18, 2020

EgoLaparra transferred this issue from clulab/timenorm May 18, 2020

kwalcock mentioned this issue Jun 23, 2020

FastNLPProcessor annotatation is not stable clulab/processors#402

Open
