Overlapping errors cause bad suggestions #29

snomos · 2019-07-13T11:23:15Z

$ echo "Ii oktage dieđe gean lea ovddasvástadus ." | divvun-checker -a tools/grammarcheckers/se.zcheck | jq .
{
  "errs": [
    [
      "ovddasvástadus",
      25,
      39,
      "typo",
      "Ii leat sátnelisttus",
      [
        "ovddasfástádus",
        "ovddasvástádus"
      ],
      "Čállinmeattáhusat"
    ],
    [
      ".",
      40,
      41,
      "space-before-punct-mark",
      "Lea gaska \".\" ovddas",
      [
        "ovddasvástadus."
      ],
      "Sátnegaskameattáhusat"
    ]
  ],
  "text": "Ii oktage dieđe gean lea ovddasvástadus ."
}

The punctuation error contains the preceding word (uncorrected) as part of the correction suggestion, while the spelling error corrects the same word independently. The end result, at least when running automatically / unsupervised, is that the misspelled word gets duplicated. This makes automated testing much harder.


snomos · 2019-07-13T18:52:54Z

Actually, the errors themselves do not overlap, which is part of the problem. Because the punctuation error is so short in terms of character length, we extend the error context to the preceding (or following) word to make the error visible in an interactive context (= blue underline in LO etc.). In those contexts the whole string is replaced, including the preceding/following word, whereas in the command line interface the only string replaced is the actual error, yet the replacement still contains the full context as given by the CG rules (= preceding/following word). In practice this duplicates the context word in question.
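To make the duplication concrete, here is a minimal sketch of what an unsupervised consumer of the JSON might do. It assumes the offsets are code-point indices into the text and that the first suggestion of each error is applied blindly; `apply_first_suggestions` is a hypothetical helper, not part of divvun-checker.

```python
# Error tuples copied from the JSON in the first comment:
# (form, begin, end, type, message, suggestions, title)
TEXT = "Ii oktage dieđe gean lea ovddasvástadus ."
ERRS = [
    ("ovddasvástadus", 25, 39, "typo", "Ii leat sátnelisttus",
     ["ovddasfástádus", "ovddasvástádus"], "Čállinmeattáhusat"),
    (".", 40, 41, "space-before-punct-mark", 'Lea gaska "." ovddas',
     ["ovddasvástadus."], "Sátnegaskameattáhusat"),
]


def apply_first_suggestions(text, errs):
    """Hypothetical unsupervised consumer: replace text[begin:end] with the
    first suggestion of every error, working right to left so that the
    offsets of earlier errors stay valid."""
    for _form, beg, end, _type, _msg, suggestions, _title in sorted(
            errs, key=lambda e: e[1], reverse=True):
        if suggestions:
            text = text[:beg] + suggestions[0] + text[end:]
    return text


result = apply_first_suggestions(TEXT, ERRS)
print(result)
```

Because the punctuation suggestion carries the context word while its span (40 to 41) covers only the ".", the printed sentence ends up with the word twice: once from the typo replacement and once from the punctuation replacement.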

unhammer · 2019-08-06T08:09:17Z

From the json, it's obvious the indices are wrong in the second error (40–41, i.e. just one character, should be 25–41).
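As a rough illustration of where 25 and 41 come from: the expected range is the union of the span of "." and the span of the word it is linked to. The sketch below assumes code-point offsets; `stretch_span` is a hypothetical name, not a function in divvun-suggest.

```python
TEXT = "Ii oktage dieđe gean lea ovddasvástadus ."


def stretch_span(error_span, linked_span):
    """Return the smallest (begin, end) range covering both spans."""
    (b1, e1), (b2, e2) = error_span, linked_span
    return (min(b1, b2), max(e1, e2))


punct_span = (40, 41)  # the "." as reported in the JSON
word_span = (25, 39)   # the preceding word, from the typo error
beg, end = stretch_span(punct_span, word_span)
print(beg, end, repr(TEXT[beg:end]))
```

With the stretched range, the replaced string is the word, the space and the full stop together, so a suggestion that contains the context word no longer duplicates it.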

When I look at the grammar checker output, I see one oddity: There are two error tags on the same &LINK reading "ovddasvástádus" […] &LINK &space-before-punct-mark &typo – so when divvun-suggest is trying to find what reading to connect "." […] &space-before-punct-mark R:LEFT:6 to, it gets confused.

Full output from grammar checker:

$ echo "Ii oktage dieđe gean lea ovddasvástadus ." | $GTHOME/langs/sme/tools/grammarcheckers/modes/smegram8-gc.mode 
"<Ii>"
        "ii" <aux> V IV Neg Ind Sg3 <W:0.0> @+FAUXV #1->1
: 
"<oktage>"
        "okta" Pron Indef Sg Nom Foc/Neg-ge <W:0.0> @<SUBJ #2->2
        "okta" Pron Indef Sg Nom Foc/Pos-ge <W:0.0> @<SUBJ #2->2
: 
"<dieđe>"
        "diehtit" <mv> V <EX-Nom-Ani> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-birra-Any> <TH-FS-Qpron> <TH-FS-Qst> <TH-ahte> <TH-Acc-Any> TV Imprt ConNeg <W:0.0> @-FMAINV #3->3
        "diehtit" <mv> V <EX-Nom-Ani> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-birra-Any> <TH-FS-Qpron> <TH-FS-Qst> <TH-ahte> <TH-Acc-Any> TV Ind Prs ConNeg <W:0.0> @-FMAINV #3->3
: 
"<gean>"
        "gii" §TH Pron Sem/Hum Rel Sg Acc <W:0.0> @<OBJ #4->3
: 
"<lea>"
        "leat" <mv> §TH V <copula> <TH-Nom-Any> <mielde> <OR-Loc-HumGroup> <OR-eret-Plc> <dušše><TH-Inf> <árvvus> <LO-Loc-johtu><DE-Ill-Plc> <AT-Loc-Mat> <AT-Abe-Any> <AT-Nom-Any> <AT-Nom-Adj><EX-Ill-Ani> <PO-Loc-Hum> <PO-Gen-Hum> <MA-mielde-Any> <MA-Adv-Manner> <XT-Gen-Measr> <LO-maŋŋil-Time> <LO-Acc-Time> <LO-Loc-Time> <CO-Com-Ani> <ID-Nom-Any> <TH-Nom-Any><RO-Ess-Any><EX-Ill-Any> <EX-Ill-Ani><TH-Nom-Adj> <EX-Ill-Ani> <TH-Nom-Obj><RE-Ill-Ani> <LO-Loc-Any> <AktioEss> <BE-Ill-Ani><PU-Ess-Any> <RO-Ess-Any><PU-Ill-Act> <RO-Ess-Any> IV Ind Prs Sg3 <W:0.0> @FS-<ADVL #5->3
: 
"<ovddasvástadus>"
        "ovddasvástádus" Err/Orth-a-á N <BE-Ill-Any> Sem/Perc-emo Sg Nom <W:0.0> @<SUBJ &LINK &space-before-punct-mark &typo #6->6 ID:6
        "ovddasvástádus" N <BE-Ill-Any> Sem/Perc-emo Sg Nom <W:0.0> @<SUBJ &typo &SUGGEST #6->6 ID:6
: 
"<.>"
        "." CLB <W:0.0> <SpaceBeforePunctMark> &space-before-punct-mark #7->7 ID:7 R:LEFT:6
        "." CLB <W:0.0> <SpaceBeforePunctMark> "<ovddasvástadus.>" &space-before-punct-mark &SUGGESTWF #7->7 ID:7 R:LEFT:6
:\n

If I make them separate readings, so we have

"<ovddasvástadus>"
	"ovddasvástádus" Err/Orth-a-á N <BE-Ill-Any> Sem/Perc-emo Sg Nom <W:0.0> @<SUBJ &LINK &space-before-punct-mark #6->6 ID:6
	"ovddasvástádus" Err/Orth-a-á N <BE-Ill-Any> Sem/Perc-emo Sg Nom <W:0.0> @<SUBJ &typo #6->6 ID:6
	"ovddasvástádus" N <BE-Ill-Any> Sem/Perc-emo Sg Nom <W:0.0> @<SUBJ &typo &SUGGEST #6->6 ID:6

and send it all through divvun-suggest, both errors will cover the same range:

{
  "errs": [
    [
      "ovddasvástadus .",
      25,
      41,
      "typo",
      "Ii leat sátnelisttus",
      [
        "ovddasfástádus .",
        "ovddasvástádus ."
      ],
      "Čállinmeattáhusat"
    ],
    [
      "ovddasvástadus .",
      25,
      41,
      "space-before-punct-mark",
      "Lea gaska \".\" ovddas",
      [
        "ovddasvástadus."
      ],
      "Sátnegaskameattáhusat"
    ]
  ],
  "text": "Ii oktage dieđe gean lea ovddasvástadus .\n"
}


unhammer · 2019-08-06T09:29:47Z

divvun-suggest expected at most one error tag (&typo etc.) per reading. I've changed this in d43a550 so it should now handle having several.

(In this case it's fine to have several error tags on one reading, it's just about stretching the underline, but IIRC there are cases where we still need to put error tags on separate readings in CG.)
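For illustration only, this is roughly what "several error tags per reading" means when looking at a CG reading line. The sketch is not the code from d43a550. It assumes tags are whitespace-separated tokens starting with "&", and that &LINK, &SUGGEST and &SUGGESTWF are control tags rather than error types (an assumption based on the output in this thread). `error_tags` is a hypothetical helper.

```python
# Tags that steer suggestion generation rather than name an error type.
# This set is an assumption drawn from the example output in this thread.
CONTROL_TAGS = {"LINK", "SUGGEST", "SUGGESTWF"}


def error_tags(reading):
    """Collect every &-prefixed tag on one reading, minus control tags.

    Splits naively on whitespace, so a lemma or wordform containing
    spaces is not handled here.
    """
    tags = [tok[1:] for tok in reading.split() if tok.startswith("&")]
    return [t for t in tags if t not in CONTROL_TAGS]


reading = ('"ovddasvástádus" Err/Orth-a-á N <BE-Ill-Any> Sem/Perc-emo Sg Nom '
           '<W:0.0> @<SUBJ &LINK &space-before-punct-mark &typo #6->6 ID:6')
print(error_tags(reading))
```

A consumer that keeps only a single tag per reading would see either space-before-punct-mark or typo here, but not both, which matches the confusion described earlier in the thread.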

unhammer · 2019-08-06T09:32:56Z

$ echo "Ii oktage dieđe gean lea ovddasvástadus ." | $GTHOME/langs/sme/tools/grammarcheckers/modes/smegram8-gc.mode |src/divvun-suggest -g $GTHOME/langs/sme/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol -m $GTHOME/langs/sme/tools/grammarcheckers/errors.xml -l se  -j|jq .
{
  "errs": [
    [
      "ovddasvástadus .",
      25,
      41,
      "typo",
      "Ii leat sátnelisttus",
      [
        "ovddasfástádus .",
        "ovddasvástádus ."
      ],
      "Čállinmeattáhusat"
    ],
    [
      "ovddasvástadus .",
      25,
      41,
      "space-before-punct-mark",
      "Lea gaska \".\" ovddas",
      [
        "ovddasvástadus."
      ],
      "Sátnegaskameattáhusat"
    ]
  ],
  "text": "Ii oktage dieđe gean lea ovddasvástadus .\n"
}
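A small sketch of what this output means for an automated consumer, assuming code-point offsets and hypothetical helper names (`spans_overlap`, `apply_suggestion`): both errors now cover 25 to 41, so only one of them can be applied to a given span, and either replacement on its own leaves the word in place exactly once.

```python
TEXT = "Ii oktage dieđe gean lea ovddasvástadus .\n"
TYPO = ("ovddasvástadus .", 25, 41, "typo",
        ["ovddasfástádus .", "ovddasvástádus ."])
SPACE = ("ovddasvástadus .", 25, 41, "space-before-punct-mark",
         ["ovddasvástadus."])


def spans_overlap(a, b):
    """True if the half-open ranges [a_beg, a_end) and [b_beg, b_end) intersect."""
    return a[1] < b[2] and b[1] < a[2]


def apply_suggestion(text, err, choice=0):
    """Replace the error's span with the chosen suggestion."""
    _form, beg, end, _type, suggestions = err
    return text[:beg] + suggestions[choice] + text[end:]


print(spans_overlap(TYPO, SPACE))
print(repr(apply_suggestion(TEXT, TYPO, choice=1)))
print(repr(apply_suggestion(TEXT, SPACE)))
```

Note that, going by the suggestions listed in the JSON, the typo suggestions keep the space before the full stop and the punctuation suggestion keeps the misspelling, so fixing both would still take a second pass of the checker over the corrected text.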

snomos added the bug label Jul 13, 2019

snomos assigned unhammer Jul 13, 2019

snomos changed the title ~~Overlapping errors causes bad suggestions~~ Overlapping errors cause bad suggestions Jul 13, 2019

snomos added this to the 0.3.5 milestone Jul 13, 2019

unhammer added a commit that referenced this issue Aug 6, 2019

wip: Allow multiple error tags per reading

02a2a2b

cf. issue #29

unhammer added a commit that referenced this issue Aug 6, 2019

tests for #29

150b4c3

unhammer added a commit that referenced this issue Aug 6, 2019

Allow multiple &error tags per reading

d43a550

cf. #29 (comment) and tests

unhammer closed this as completed Aug 6, 2019
