Strange behavior on parse/lex errors #21

roman-spiridonov · 2017-10-27T21:27:08Z

I am new to jison (my first day of using it) so excuse my probable ignorance in advance. There are some issues after switching to your fork from classical jison which I'd like to report here and get some input.

Issues

1) No concise error messages?

-t flag generates too much extra output, while the classical jison reported nice concise error messages.
classic jison:

$ node cool.js bad.cl
Parse error on line 12:
... type identifier *)Class b inherits A {
----------------------^
Expecting 'CLASS', got 'TYPEID'
C:\code\sandbox\[Stanford] Compilers\cool-jison\cool.js:394
                    throw new Error(errStr || 'Parsing halted while starting to recover from another error.');

jison-gho: too long output (impractical other than deep debugging)

I could not find a similar mechanism for error messaging in your fork (or it does not work on my grammar for some reason). I'd like to have a kind of concise error messages that used to be in classical jison in my final compiler.

2) Parser hanging on some bad files

For some reason, the parser generated from your fork hangs on some bad inputs, while classic jison produced a parser that failed gracefully with an error message.

classic jison:

$ node cool.js null_in_code.cl.cool
Parse error on line 1:
...haracter in code *)null character is he
----------------------^
Expecting 'CLASS', got 'OBJECTID'
Lexer error at line 1:
...character is here => <-)
-----------------------^
 Skipping token:
C:\code\sandbox\[Stanford] Compilers\cool-jison\cool.js:394
                    throw new Error(errStr || 'Parsing halted while starting to recover from another error.');

jison-gho: hangs
I can reproduce this on files null_in_code.cl.cool and bad.cl, for example (see below).

3) Error recovery

I cannot get error recovery to work for my grammar. I ported it from flex/bison and the error recovery mechanism worked there. I recognize that not all features of flex/bison are (or will be) supported, but maybe I am doing something intrinsically wrong.

In other words, I get a single error and the parser exits.
Expected: the parser skips to the next error by reducing the error non-terminal.
Example from my grammar:

class_list
: class	';'		/* single class */
  { $$ = ["CLASS_LIST", {}, $1]; }
| class_list class ';'	/* several classes */
  { $$ = prependChild($1, $2);  }
| error ';'    /* error recovery: skip to next class */
;

Notice I am using the same grammar in both classic jison and your fork.
So there are some backward compatibility differences.

Steps to reproduce / grammar

You can get grammar and test files from here:
https://github.com/roman-spiridonov/sandbox/tree/master/%5BStanford%5D%20Compilers/cool-jison

npm install, then npm run jison or npm run jison-gho to build the corresponding parser cool.js.
Then execute commands from issues above.


roman-spiridonov · 2017-10-28T14:46:30Z

I fixed some shift/reduce conflicts in the grammar but still get similar issues (Question: does Jison always favor shift over reduce, as Bison does? The original Bison grammar also contained the conflicts but worked just fine).

There is one notable difference:
bad.cl no longer hangs as before, but it produces a crazy error report inside the AST (I would have assumed that no AST is built if the input has issues). It is still worth looking at why bad.cl hung with the previous version of the grammar: after the update it may seem that the parser only hangs because of the \0 character, which looks like an edge case, but that is obviously not the only reason, as bad.cl did not contain any and still hung.

I suspected this is due to the following.

4) Error recovery should not build into AST

I found a new issue related to issue 3): error recovery behaves inconsistently with Bison.
In Bison, error non-terminals are reduced, and if I do not specify an action for them they do not appear in the AST (they are just skipped / dropped).
In Jison, it looks like some default action is executed when error non-terminals are reduced, which results in errors inside the AST.
I deduce this from two behaviors (neither reproduces in bison or in classic jison).

$ node cool.js firstclasserrored.cl
C:\code\sandbox\[Stanford] Compilers\cool-jison\cool.js:7067
        throw new ExceptionClass(str, hash);
        ^
TypeError: node.splice is not a function
...

Here is the AST with error messages in them:

$ node cool.js bad.cl
{ errStr: 'Parse error on line 15: \nClass b in...\n^\nExpecting end of input, "CLASS", "class", got unexpected "TYPEID"',
  exception: null,
  text: 'Class',
  value: 'Class',
  token: '"TYPEID"',
  token_id: 4,
...

GerHobbelt · 2017-10-28T21:00:39Z

Interesting...

Addressing a few bits off the top; I'll have to investigate and spend more time on this to see what's really happening:

No concise error messages?

Indeed the generated parser & lexer produce a more extensive error report to help diagnose the problem. I've found numerous times that those 'concise error messages' give me too little info (and sometimes very misleading info too) so I spent too much time diagnosing various troubles.

When you want to pare down the error output, you may choose to provide your own parseError() function, which works behind the scenes for both user action code yyerror() calls and internal parser errors. Ditto for the lexer: it comes with its own parseError unless overridden.

In case you're interested: a lot of info is collected in the 'hash' passed to parseError(): see the internal constructParseErrorInfo() and constructLexErrorInfo() implementations which are output in the generated parser/lexer.
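As a rough illustration of the custom parseError() idea, here is a minimal sketch of a handler that condenses the verbose report to a single line. It assumes the callback receives `(str, hash)` and that `hash.errStr` holds the multi-line report, as in the AST dump quoted earlier in this thread; the helper names `conciseMessage` and `conciseParseError` are made up for this example and are not part of jison or jison-gho.

```javascript
// Hypothetical helper: reduce a multi-line error report to one line by
// keeping the first line ("Parse error on line N:") and the last line
// ("Expecting ..., got ...") and dropping the source excerpt and the
// "----^" pointer line in between.
function conciseMessage(str, hash) {
  const info = hash || {};
  const report = String(info.errStr || str || 'Parse error');
  const lines = report
    .split('\n')
    .map((s) => s.trim())
    .filter((s) => s.length > 0 && !/^[-^]+$/.test(s));
  if (lines.length === 0) return 'Parse error';
  const first = lines[0];
  const last = lines[lines.length - 1];
  return lines.length > 1 ? `${first} ${last}` : first;
}

// Hypothetical replacement handler: throws an Error carrying the short
// message, and keeps the full info object around for deep debugging.
function conciseParseError(str, hash) {
  const err = new Error(conciseMessage(str, hash));
  err.hash = hash;
  throw err;
}
```

How the handler gets attached depends on the generated parser, so check the generated code. Vanilla jison conventionally picks it up from `parser.yy.parseError`; whether jison-gho uses the same hook is not confirmed in this thread.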

Another way to improve error reporting in grammars is to code yyerror() messages in your error recovery rules; for examples of this, see the grammars for the jison lexer and parser itself:

packages/lex-parser/lex.y
packages/lex-parser/lex.l
packages/ebnf-parser/bnf.y
packages/ebnf-parser/bnf.l

Parser hanging on some bad files

Hmmmmm, that (= hanging in the generated run-time) SHOULD NOT happen. That's a bug, but what and where precisely I can't answer yet. To be investigated.

Note

What certainly WILL have an impact is that upon encountering non-LALR(1) grammars, jison-gho falls back to a 'partial LR(1)' approach in an attempt to produce an unambiguous grammar. A message indicating this is attempted should be printed when you run jison: something along these lines (copy/pasta from jison source):

                Jison.print('\n'
                    + '----------------------------------- NOTICE -------------------------------\n'
                    + 'Attempting to resolve the unresolved conflicts in partial LR mode...\n\n'
                    + 'When no conflicts are reported in the next round below, your grammar is\n'
                    + 'accepted as mixed LR/LALR and should work as expected.\n'
                    + '--------------------------------------------------------------------------\n\n');

Error recovery

Hm, interesting. That means bison (haven't used that one for ages, while my memory is not 100% accurate, so caveat emptor) doesn't inject default action code for error rules 🤔 🤔 ?

Ho, HALT! 💥

class_list
: class	';'		/* single class */
  { $$ = ["CLASS_LIST", {}, $1]; }
| class_list class ';'	/* several classes */
  { $$ = prependChild($1, $2);  }
| error ';'    /* error recovery: skip to next class */
;

Given that bit of grammar, I don't see how you can 'skip the error token' in the AST as the class_list rule does error recovery in the third alt, and then when you reduce another rule where this is one of the terms, class_list must be either UNDEFINED (due to the empty error rule action) or some default value injected by bison/jison as the absence of an action block will have the tool inject a default action $$ = $1;. At least those are the two 'intuitive' options you've got re error recovery and AST building in there, AFAICT.

Anyway, that subject is far more subtle and covers way more than a single paragraph of text can.

Anyway, this sounds like you MIGHT BE bitten by the different default action code injection logic of the three:

bison: $$ = $1 AFAIK. I know from the Olden Days it gets nasty pretty darn quickly when you want to track location info across rule reductions, e.g. have a sensible value for @a in this grammar snippet:

g: a   { /* dump value and location */ dump($a, @a); }
  ;
a: a b    /* [MARK1] */
  | b       /* [MARK2] */
  ;
b: LEXER_TOKEN    /* [MARK3] */
  ;

[MARK3]:

since there's no user action, we'd expect a sensible default action.

bison does $$ = $1, but AFAIR not @$ = @1; alongside, which is a problem as that would be, ah, counter-intuitive. At least for me. But then again, I might be wrong and it turns out bison does inject @$ = @1; (after maybe some grammar / code usage analysis?). This grammar does need the @n location info, but many grammars don't. Anyway, food for thought and further investigation into what bison does exactly.

jison-gho is different from the others as it attempts to produce an intuitive and predictable grammar action at all times. This means that jison-gho analyzes your grammar and its action code chunks and observes that this particular sample grammar uses location tracking info (@a) hence decides to always ensure every reduction (rule action block) sets its @$ location value, either in the user-coded action for the rule, or a default action injected by jison-gho. The jison-gho 'code injections' always land before the user action, IFF any exists.

Thus we have three results, depending on the tool -- WARNING: I may very well be wrong about the details for bison and vanilla jison, the key take-away is to check each generated parser engine if you're in a hurry and can't wait for the jison documentation to arrive. <@GerHobbelt grabs a vodka bottle and gloats in a dark recess, instead of getting his *** in gear and write a jison-gho book at night>

bison (estimated/assumed - MUST CHECK):

b: LEXER_TOKEN    /* [MARK3] */
  { 
     $$ = $1;    // bison injects this
     @$ = @1;    // does bison also inject this? Doesn't say so in the manual!
  }
  ;

jison (estimated/assumed - MUST CHECK):

b: LEXER_TOKEN    /* [MARK3] */
  { 
     $$ = $1;    // jison ALWAYS does this BEFORE invoking the user action code during rule reduction.
     @$ = @1;    // ditto?  I haven't closely looked at vanilla for some time and now I don't know. :-S
  }
  ;

jison-gho:

b: LEXER_TOKEN    /* [MARK3] */
  { 
     $$ = $1;    // jison-gho injects this as first statement WHEN your grammar uses values. It DOES, as you use $a in this sample grammar.
     @$ = @1;    // ditto!! BUT: jison-gho only injects this when code analysis shows you are using it SOMEWHERE in your grammar. This grammar DOES.
  }
  ;

So there is already a bit of difference between the brethren here...

Anyway, let's assume you then fix the b production to transfer the $LEXER_TOKEN value and the @LEXER_TOKEN position info...

[MARK2]:

as with the b production, default action code should transport both value and position info as the a alternative production needs/expects it. In bison AFAIR this doesn't fly unless you code the location tracking statements explicitly, thus implicitly taking out the default action injection activity.

bison (estimated/assumed - MUST CHECK):

a: b    /* [MARK2] */
  { 
     $$ = $1;    // bison injects this
     @$ = @1;    // does bison also inject this? Doesn't say so in the manual!
  }
  ;

jison (estimated/assumed - MUST CHECK):

a: b    /* [MARK2] */
  { 
     $$ = $1;    // jison ALWAYS does this BEFORE invoking the user action code during rule reduction.
     @$ = @1;    // ditto?  I haven't closely looked at vanilla for some time and now I don't know. :-S
  }
  ;

jison-gho:

a: b    /* [MARK2] */
  { 
     $$ = $1;    // jison-gho injects this as first statement WHEN your grammar uses values. It DOES, as you use $a in this sample grammar.
     @$ = @1;    // ditto!! BUT: jison-gho only injects this when code analysis shows you are using it SOMEWHERE in your grammar. This grammar DOES.
  }
  ;

[MARK1]:

The fun here is that (IMO) the brute-force 'default action' of $$ = $1; as applied by bison/jison is pretty senseless as it IMPLICITLY and NOISELESSLY discards anything but the first term of a production!
Hence wouldn't it be saner to have default action of $$ = something($1, $2) in order to combine the terms in a default action when the grammar writer user hasn't provided anything explicit?

That is exactly what jison-gho does: it thus produces a very basic kind-of-AST: an array of the term values.

bison (estimated/assumed - MUST CHECK):

a: a b    /* [MARK1] */
  { 
     $$ = $1;    // bison injects this
     @$ = @1;    // does bison also inject this? Doesn't say so in the manual!
  }
  ;

jison (estimated/assumed - MUST CHECK):

a: a b    /* [MARK1] */
  { 
     $$ = $1;    // jison ALWAYS does this BEFORE invoking the user action code during rule reduction.
     @$ = @1;    // ditto?  I haven't closely looked at vanilla for some time and now I don't know. :-S
  }
  ;

jison-gho:

a: a b    /* [MARK1] */
  { 
     $$ = [$1, $2];    // jison-gho injects this as first statement WHEN your grammar uses values. It DOES, as you use $a in this sample grammar.
     // ^^^ jison-gho also observes that this production has 2 terms and generates default action code accordingly!
     @$ = yyparser.yyMergeLocationInfo(@1, @2);    // ditto!! BUT: jison-gho only injects this when code analysis shows you are using it SOMEWHERE in your grammar. This grammar DOES.
  }
  ;

Thus a grammar with no action code in jison-gho produces a minimal AST and has location tracking when you need it anywhere in the grammar, while the other two tools have several quirks that thwart this unless you make sure to fully implement every production action code.
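To make the difference concrete, here is a small simulation in plain JavaScript (no parser generator involved) of the two default-action policies discussed above, applied to the `a: a b | b` sample grammar. The function names are invented for this sketch; `mergeLocations` is only a stand-in for a location merge and makes no claim about what `yyparser.yyMergeLocationInfo` really does. It assumes location objects with the usual `first_line` / `first_column` / `last_line` / `last_column` fields.

```javascript
// Policy 1: `$$ = $1`, no matter how many terms the production has.
function firstTermDefault(values) {
  return values[0];
}

// Policy 2: the behaviour described for jison-gho in this thread:
// empty production -> undefined, one term -> $1, several -> array of terms.
function arrayDefault(values) {
  if (values.length === 0) return undefined;
  if (values.length === 1) return values[0];
  return values.slice();
}

// Stand-in location merge: span from the start of the first term
// to the end of the last term.
function mergeLocations(locs) {
  if (locs.length === 0) return undefined;
  const first = locs[0];
  const last = locs[locs.length - 1];
  return {
    first_line: first.first_line,
    first_column: first.first_column,
    last_line: last.last_line,
    last_column: last.last_column,
  };
}

// Reduce `a: a b | b` with `b: LEXER_TOKEN` over a token list, using the
// given policy wherever the grammar has no user action.
function reduceList(tokens, defaultAction) {
  let a;
  tokens.forEach((tok, i) => {
    const b = defaultAction([tok]);                              // [MARK3]
    a = i === 0 ? defaultAction([b]) : defaultAction([a, b]);    // [MARK2] / [MARK1]
  });
  return a;
}

console.log(reduceList(['x', 'y', 'z'], firstTermDefault));
// prints: x
console.log(JSON.stringify(reduceList(['x', 'y', 'z'], arrayDefault)));
// prints: [["x","y"],"z"]
```

With the first policy everything after the first token is silently lost; with the second, the nested arrays form the minimal AST mentioned above.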

A few more caveats re default code action injection (part 1)

bison (estimated/assumed - MUST CHECK):

a: %epsilon
  { 
     $$ = $1;    // don't know if bison is stupid enough to do this: UNDEFINED BEHAVIOUR RESULTS
     @$ = @1;    // ditto.
  }
  ;

jison (estimated/assumed - MUST CHECK):

a: %epsilon
  { 
     $$ = $1;    // jison sure IS stupid enough to do this as it ALWAYS does this BEFORE invoking the user action code during rule reduction. Hence you get non-deterministic behaviour at parser run-time.
     @$ = @1;    // ditto.
  }
  ;

jison-gho:

a: %epsilon
  { 
     $$ = undefined;    // jison-gho detects that this is an empty production and acts accordingly.
     @$ = yyparser.yyMergeLocationInfo(NULL);    // ditto!! BUT: jison-gho only injects this when code analysis shows you are using it SOMEWHERE in your grammar. This grammar DOES.
  }
  ;

A few more caveats re default code action injection (part 2)

The default action injection of the three tools gets even more interesting when a user action block is provided, which is either empty or doesn't always set $$ and/or @$ before completing, e.g.

a: TOKEN
  { 
     print('hello');
     // ^^^ this ensures bison doesn't inject any default action,
     // hence your `$$` coming out of this rule's reduction is
     // UNDETERMINED!
  }
  ;

jison (vanilla) doesn't suffer from that artifact as it ALWAYS executes the default $$ = $1 action, even when it's not needed / repeated by the subsequent execution of the user action code.

jison-gho analyzes the action code chunk to check whether it is guaranteed to SET $$, i.e. whether all execution paths through the action code block set $$ before completing. The result of that analysis drives its decision to inject a default action before the user action, or not: when it's NOT SATISFIED, the default action is injected.

That's another difference in behaviour; I've noticed that quite a few folks, including myself, get bitten by the rather non-intuitive default action injection behaviour of bison/jison, hence the described behaviour for jison-gho is intentionally deviant. When RL doesn't adhere to this intent/description, then there's a bug in jison-gho.

Coming back to your issue, I think there MIGHT be some subtle interaction between the action code injection behaviour and your grammar that I/you didn't anticipate. To be investigated.

Error recovery should not build into AST

error tokens in bison / jison produce (AFAIR) a nondeterministic token value for the error token, or undefined.

jison-gho facilitates further error analysis in the 'backend', i.e. in the application which executes the parser and receives an AST or other output from the grammar actions: jison-gho produces a deterministic error token value, which is an object containing all error info which led up to this error token, INCLUDING the implicitly consumed tokens during parser error recovery: the other tools silently discard those lexed tokens, while jison-gho $error provides access to them via the error token value and a special reduction action (UNDER DEVELOPMENT; the core for this already exists in the parser kernel, but all code generator facilities have not been augmented yet to provide this feature)

As such, error recovery rule reduction behaviour is not 1:1 exchangeable with bison/jison:
if you throw away the $error value in your error action recovery code, you're fine, but otherwise the $$ value of your error recovery rule will get loaded with the full error recovery info as error token value.

To 'kill' this behaviour, you might want to write your error recovery rule like this:

a: error ';'
  { 
     console.error("see the wonder: ", $error);
     // make sure jison-gho doesn't inject a default action for value assignment:
     $$ = undefined;
     // ditto for location? Careful there!  >:-]
     @$ = undefined;
  }
  ;
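If the rest of the grammar expects list-shaped nodes, a variation on the rule above is to have the recovery action hand the error info to a side channel and return an empty node of the expected shape instead of `undefined`. The sketch below assumes the `["CLASS_LIST", {}, ...children]` node shape from the grammar at the top of this issue and the `errStr` field seen in the error dump; the function name `recoverAsEmptyList` and the `diagnostics` array are hypothetical, not anything jison-gho provides.

```javascript
// Hypothetical helper to call from an error recovery action, e.g.
//   | error ';'  { $$ = recoverAsEmptyList($error, yy.diagnostics); }
// (yy.diagnostics is an assumed array set up by the calling application).
function recoverAsEmptyList(errorInfo, diagnostics) {
  // Keep only the headline of the report for the user-facing message.
  const message = errorInfo && errorInfo.errStr
    ? String(errorInfo.errStr).split('\n')[0].trim()
    : 'Syntax error';
  diagnostics.push(message);
  // Return an empty list node so later list operations still work and
  // the error info itself never ends up inside the AST.
  return ['CLASS_LIST', {}];
}
```

The design choice here is simply to separate diagnostics from the tree: the AST stays structurally valid for later passes, while the application decides how much of the collected error info to print.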

My ~~two cents~~ two dimes so far. Hope the amount and depth of detail doesn't overload. I will address this issue further as time allows, but reckon with a wait of days or possibly even a few weeks.

I think you've hit a few tender spots at least, UNTIL I can ensure myself that the jison-gho behaviour is as designed across the board; at the same time it's very good to look into this as cross-tool compatibility is always an issue.

Thanks

👍 for reporting and doing so with the amount of detail: it helps as grammar writing is cool and easy until you hit the proverbial fan: then it gets pretty darn hairy in a flash.

roman-spiridonov · 2017-10-28T21:37:23Z

Thanks for the elaborate response!

Indeed the generated parser & lexer produce a more extensive error report to help diagnose the problem.

Having more deep error messages is useful in debug mode indeed. However, I am thinking more about production compiler and I think it should also contain error messages (just like any compiler does, like "syntax error at ... : bla-bla" or "unexpected token at..."). In reality there are two cases 1) debugging your program (concise) vs. 2) debugging the grammar (full). Just thinking that supporting both options would be nice. Some cmd option (w/o need to mess around with grammar and/or generated parser).
Thanks for providing info on how to deal with it meanwhile.

PS. Can't wait for documentation to come some day! :)

GerHobbelt · 2017-10-28T23:09:40Z

The lockup you encountered is indeed a bug. 💥
The parser kernel is adjusted and should not exhibit this faulty behaviour any more in the next release.

@roman-spiridonov

… kernel edge case found by @roman-spiridonov ( #21 ) where an error recovery rule sits just above the `$accept` rule and the entire input has just been lexed, while an error recovery fails, thus causing a lock-up in the parser kernel where the lexer keeps producing EOF tokens and the `locateNearestErrorRecoveryRule()` cycles between 'shifting the error token' and '$accept' state phases.

@roman-spiridonov

…e FAILED the examples/error-handling-and-yyerrok-part1..5 recovery test examples. This is further work done on SHA-1: a6b91fd :: fix infinite loop at run-time for particular erroneous inputs: parser kernel edge case found by @roman-spiridonov ( #21 ) where an error recovery rule sits just above the `$accept` rule and the entire input has just been lexed, while an error recovery fails, thus causing a lock-up in the parser kernel where the lexer keeps producing EOF tokens and the `locateNearestErrorRecoveryRule()` cycles between 'shifting the error token' and '$accept' state phases.

@roman-spiridonov

…grammar don't lock up or fail prematurely during error recovery: it's a hairy balance... This fixes/tweaks commit SHA-1: bd77c14 :: fix: adjust the parser kernel error recovery code: the previous change FAILED the examples/error-handling-and-yyerrok-part1..5 recovery test examples. This is further work done on SHA-1: a6b91fd :: fix infinite loop at run-time for particular erroneous inputs: parser kernel edge case found by @roman-spiridonov ( #21 ) where an error recovery rule sits just above the `$accept` rule and the entire input has just been lexed, while an error recovery fails, thus causing a lock-up in the parser kernel where the lexer keeps producing EOF tokens and the `locateNearestErrorRecoveryRule()` cycles between 'shifting the error token' and '$accept' state phases.

GerHobbelt · 2017-10-29T02:06:19Z

just pushed release 0.6.1-208; please check it out. It should at least have removed some of the problems...

The hairiest part is the bad.cl file which has several errors close together; I have edited your cool.y grammar to employ yyerrok() in the error recovery rules (see also bison documentation), but that doesn't "solve everything" either: the last few errors get bunched up together into a single fatal failure, which is interesting behaviour in itself, but that's for another day.

roman-spiridonov · 2017-10-29T09:25:35Z

Confirmed that issue 2) is resolved. The other issues are still relevant.

For issue 1), something like adding a new -v / --verbose option that turns on the current detailed stack reports would do, with behavior close to classic jison/bison when -v is not given.

For issues 3-4), as you said I cannot get the generated parser to generate all the errors for bad.cl (i.e. only first is reported), so error recovery does not work for some reason.

Also a good file to look at is firstclasserrored.cl. It's pretty simple in that it contains two errors: in 1st class and in 3rd class.
I see that both errors are caught, but then something very different from bison/classic jison is going on:

jison-gho fails because it expects my AST node to be a list, but due to $$ = $1 in error rule it is no longer a list. Note that $$ = undefined does not solve the issue (the $$ is still not a list).
OK, I can fix this with something like $$ = [ $1, {}, "error" ], but then I no longer get nice error messages (cf. issue 1)). And it's very different from bison/jison (see below).
Here is what jison prints in the same case (note: error action for class_list: error ';' is empty).

$ node cool.js firstclasserrored.cl
Parse error on line 1:
class Foo inherits asdfjkasldfjdklaf;c
-------------------^
Expecting 'TYPEID', got 'OBJECTID'
Parse error on line 3:
...class Baz {b():Int{B};};
----------------------^
Expecting '{', 'OBJECTID', '(', 'IF', 'WHILE', 'LET', 'CASE', 'NEW', 'ISVOID', '~', 'NOT', 'INT_CONST', 'BOOL_CONST', 'STR_CONST', got 'TYPEID'
;

Can't bison/classic jison just discard the code matching the error non-terminal from the AST completely? I mean just throw it away as if it never existed in the first place, once you have captured the error messages. So whenever your parser faces a syntax error, continue discarding tokens until you can reduce the 'error' (i.e. discard until you face a ';'). Yes, you may face a ';' that is not at the end of the class, causing another syntax error, but then the parser just continues error recovery until it faces yet another ';'. The cascading error messages can be prevented by default (I think bison has a default threshold for that, which is set to 3, and if you execute yyerrok; it would not silence any cascading errors).

quote from manual (http://langevin.univ-tln.fr/cours/COMPIL/tps/bison.html#Error-Recovery): "To prevent an outpouring of error messages, the parser will output no error message for another syntax error that happens shortly after the first; only after three consecutive input tokens have been successfully shifted will error messages resume. ... You can make error messages resume immediately by using the macro yyerrok in an action."
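The rule quoted from the manual can be modelled with a small counter. This is only a sketch of the described behaviour, not code taken from bison, jison or jison-gho; the class name `ErrorThrottle` and its method names (other than `yyerrok`, which mirrors the macro name) are made up for the example.

```javascript
// Model of the quoted rule: after a syntax error, further error messages
// are suppressed until three tokens have been shifted successfully,
// unless yyerrok() re-enables reporting right away.
class ErrorThrottle {
  constructor(report) {
    this.report = report;   // callback that actually prints/collects a message
    this.errStatus = 0;     // number of clean shifts still required
  }

  // Call whenever a token is shifted successfully.
  onShift() {
    if (this.errStatus > 0) this.errStatus -= 1;
  }

  // Call on every syntax error; reports only when not in recovery mode.
  onSyntaxError(message) {
    if (this.errStatus === 0) this.report(message);
    this.errStatus = 3;
  }

  // Equivalent of executing `yyerrok` in an action: resume reporting now.
  yyerrok() {
    this.errStatus = 0;
  }
}

const messages = [];
const throttle = new ErrorThrottle((m) => messages.push(m));
throttle.onSyntaxError('e1');   // reported
throttle.onShift();
throttle.onSyntaxError('e2');   // suppressed: only one token shifted since e1
throttle.onShift();
throttle.onShift();
throttle.onShift();
throttle.onSyntaxError('e3');   // reported: three clean shifts
throttle.yyerrok();
throttle.onSyntaxError('e4');   // reported: yyerrok reset the counter
console.log(messages.join(','));
// prints: e1,e3,e4
```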

GerHobbelt · 2018-06-17T12:26:15Z

To be further addressed once I've got 0.6.5 = babel-based action code parsing and validation working.

GerHobbelt · 2018-06-17T12:26:50Z

(BTW: Most of this is split off into other issues: see refs above.)

GerHobbelt added bug enhancement question labels Oct 29, 2017

GerHobbelt mentioned this issue Dec 26, 2017

part of #21: no concise error messages #34

Closed

GerHobbelt added this to the Glorious Future milestone Jun 17, 2018
