Strange behavior on parse/lex errors #21
I fixed some shift/reduce conflicts in the grammar but still get similar issues. (Question: does Jison always favor shift over reduce, as Bison does? The original Bison grammar also contained the conflicts but worked just fine.) There is one notable difference: I suspected this is due to the following.
4) Error recovery should not build into AST
I found a new issue related to Issue 3): error recovery behaves inconsistently with Bison.
$ node cool.js firstclasserrored.cl
C:\code\sandbox\[Stanford] Compilers\cool-jison\cool.js:7067
throw new ExceptionClass(str, hash);
^
TypeError: node.splice is not a function
...
Here is the AST with error messages in it:
$ node cool.js bad.cl
{ errStr: 'Parse error on line 15: \nClass b in...\n^\nExpecting end of input, "CLASS", "class", got unexpected "TYPEID"',
exception: null,
text: 'Class',
value: 'Class',
token: '"TYPEID"',
token_id: 4,
...
Interesting... Addressing a few bits off the top; I'll have to investigate and spend more time on this to see what's really happening:
Indeed the generated parser & lexer produce a more extensive error report to help diagnose the problem. I've found numerous times that those 'concise error messages' give me too little info (and sometimes very misleading info too), so I spent too much time diagnosing various troubles.
When you want to pare down the error output, you may choose to provide your own […]
In case you're interested: a lot of info is collected in the 'hash' passed to […]
Another way to improve error reporting in grammars is code […]
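A minimal sketch of paring the report down to one line, assuming the error hook receives `(str, hash)` as in the `throw new ExceptionClass(str, hash)` frame quoted above, and that `hash` carries the `errStr`, `text` and `token` fields shown in the dump. The helper name `conciseError` is hypothetical, and how it would be wired into the generated parser is not shown here.

```javascript
// Hypothetical helper, not part of jison or jison-gho.
// Uses only fields visible in the error dump quoted in this thread:
// `errStr`, `text` and `token`.
function conciseError(hash) {
  // Keep only the first line of the verbose report and drop its trailing colon
  const headline = String(hash.errStr || 'Parse error')
    .split('\n')[0]
    .replace(/:\s*$/, '');
  const token = hash.token || 'unknown token';
  const text = hash.text === undefined ? '' : ' (' + JSON.stringify(hash.text) + ')';
  return headline + ': unexpected ' + token + text;
}

// Sample hash, abbreviated from the dump quoted in this thread
const sample = {
  errStr: 'Parse error on line 15: \nClass b in...\n^\nExpecting end of input, "CLASS", "class", got unexpected "TYPEID"',
  text: 'Class',
  token: '"TYPEID"'
};
console.log(conciseError(sample));
```

The full hash stays available to the caller, so the verbose report can still be printed when debugging the grammar itself.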
Hmmmmm, that (= hanging in the generated run-time) SHOULD NOT happen. That's a bug, but what and where precisely I can't answer yet. To be investigated.
Note
What certainly WILL have an impact is that upon encountering non-LALR(1) grammars, jison-gho falls back to a 'partial LR(1)' approach in an attempt to produce an unambiguous grammar. A message indicating this is attempted should be printed when you run jison: something along these lines (copy/pasta from jison source):
Jison.print('\n'
+ '----------------------------------- NOTICE -------------------------------\n'
+ 'Attempting to resolve the unresolved conflicts in partial LR mode...\n\n'
+ 'When no conflicts are reported in the next round below, your grammar is\n'
+ 'accepted as mixed LR/LALR and should work as expected.\n'
+ '--------------------------------------------------------------------------\n\n');
Hm, interesting. That means bison (haven't used that one for ages, while my memory is not 100% accurate, so caveat emptor) doesn't inject default action code for error rules 🤔 🤔 ? Ho, HALT! 💥
Given that bit of grammar, I don't see how you can 'skip the error token' in the AST as the […]
Anyway, that subject is far more subtle and covers way more than a single paragraph of text can.
Anyway, this sounds like you MIGHT BE bitten by the different default action code injection logic of the three:
[MARK3]: since there's no user action, we'd expect a sensible default action. bison does […]
jison-gho is different from the others as it attempts to produce an intuitive and predictable grammar action at all times. This means that jison-gho analyzes your grammar and its action code chunks and observes that this particular sample grammar uses location tracking info ([…]
Thus we have three results, depending on the tool -- WARNING: I may very well be wrong about the details for bison and vanilla jison; the key take-away is to check each generated parser engine if you're in a hurry and can't wait for the jison documentation to arrive.
<@GerHobbelt grabs a vodka bottle and gloats in a dark recess, instead of getting his *** in gear and write a jison-gho book at night>
bison (estimated/assumed - MUST CHECK):
jison (estimated/assumed - MUST CHECK):
jison-gho:
So there is already a bit of difference between the brethren here... Anyway, let's assume you then fix the […]
[MARK2]: as with the […]
bison (estimated/assumed - MUST CHECK):
jison (estimated/assumed - MUST CHECK):
jison-gho:
[MARK1]: The fun here is that (IMO) the brute-force 'default action' of […]
That is exactly what jison-gho does: it thus produces a very basic kind-of-AST: an array of the term values.
bison (estimated/assumed - MUST CHECK):
jison (estimated/assumed - MUST CHECK):
jison-gho:
Thus a grammar with no action code in jison-gho produces a minimal AST and has location tracking when you need it anywhere in the grammar, while the other two tools have several quirks that thwart this unless you make sure to fully implement every production action code.
A few more caveats re default code action injection (part 1)
bison (estimated/assumed - MUST CHECK):
jison (estimated/assumed - MUST CHECK):
jison-gho:
A few more caveats re default code action injection (part 2)
The default action injection of the three tools gets even more interesting when a user action block is provided, which is either empty or doesn't always set […]
jison (vanilla) doesn't suffer from that artifact as it ALWAYS executes the default […]
jison-gho analyzes the action code chunk and when it's NOT SATISFIED that the action code will actually SET […]
That's another difference in behaviour; I've noticed that quite a few folks, including myself, get bitten by the rather non-intuitive default action injection behaviour of bison/jison, hence the described behaviour for jison-gho is intentionally deviant. When RL doesn't adhere to this intent/description, then there's a bug in jison-gho.
Coming back to your issue, I think there MIGHT be some subtle interaction between the action code injection behaviour and your grammar that I/you didn't anticipate. To be investigated.
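To make the difference concrete, here is a toy model of the two default-action strategies discussed above: the yacc-style `$$ = $1`, and the 'array of the term values' result described for jison-gho. Both functions are hypothetical stand-ins written for illustration, not code taken from any of the three generators, and edge cases (empty rules, single-term rules, location tracking) may well be handled differently by the real tools.

```javascript
// Toy model only: `terms` stands for the semantic values $1..$n of one production.

// yacc/bison-style default: $$ = $1 (everything after the first term is dropped)
function firstTermDefault(terms) {
  return terms[0];
}

// 'array of the term values' default, as described for jison-gho in this thread
function arrayDefault(terms) {
  return terms.slice();
}

// Values for a production such as: expr '+' expr
const terms = [1, '+', 2];
console.log(firstTermDefault(terms)); // 1
console.log(arrayDefault(terms));     // [ 1, '+', 2 ]
```

With the first strategy an action-less rule silently loses `'+'` and `2`; with the second, the caller always gets something AST-like back, which is the predictability argument made above.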
error tokens in bison / jison produce (AFAIR) a nondeterministic token value for the error token, or […]
jison-gho facilitates further error analysis in the 'backend', i.e. in the application which executes the parser and receives an AST or other output from the grammar actions: jison-gho produces a deterministic error token value, which is an object containing all error info which led up to this […]
As such, error recovery rule reduction behaviour is not 1:1 exchangeable with bison/jison: […]
To 'kill' this behaviour, you might want to write your error recovery rule like this:
My […]
I think you've hit a few tender spots at least, UNTIL I can ensure myself that the jison-gho behaviour is as designed across the board; at the same time it's very good to look into this as cross-tool compatibility is always an issue.
Thanks 👍 for reporting and doing so with this amount of detail: it helps, as grammar writing is cool and easy until you hit the proverbial fan: then it gets pretty darn hairy in a flash.
Thanks for the elaborate response!
Having more detailed error messages is indeed useful in debug mode. However, I am thinking more about a production compiler, and I think it should also contain error messages (just like any compiler does, like "syntax error at ... : bla-bla" or "unexpected token at..."). In reality there are two cases: 1) debugging your program (concise) vs. 2) debugging the grammar (full). Just thinking that supporting both options would be nice, via some cmd option (w/o the need to mess around with the grammar and/or the generated parser). PS. Can't wait for documentation to come some day! :)
The lockup you encountered is indeed a bug. 💥
… kernel edge case found by @roman-spiridonov ( #21 ) where an error recovery rule sits just above the `$accept` rule and the entire input has just been lexed, while an error recovery fails, thus causing a lock-up in the parser kernel where the lexer keeps producing EOF tokens and the `locateNearestErrorRecoveryRule()` cycles between 'shifting the error token' and '$accept' state phases.
…e FAILED the examples/error-handling-and-yyerrok-part1..5 recovery test examples. This is further work done on SHA-1: a6b91fd :: fix infinite loop at run-time for particular erroneous inputs: parser kernel edge case found by @roman-spiridonov ( #21 ) where an error recovery rule sits just above the `$accept` rule and the entire input has just been lexed, while an error recovery fails, thus causing a lock-up in the parser kernel where the lexer keeps producing EOF tokens and the `locateNearestErrorRecoveryRule()` cycles between 'shifting the error token' and '$accept' state phases.
…grammar don't lock up or fail prematurely during error recovery: it's a hairy balance... This fixes/tweaks commit SHA-1: bd77c14 :: fix: adjust the parser kernel error recovery code: the previous change FAILED the examples/error-handling-and-yyerrok-part1..5 recovery test examples. This is further work done on SHA-1: a6b91fd :: fix infinite loop at run-time for particular erroneous inputs: parser kernel edge case found by @roman-spiridonov ( #21 ) where an error recovery rule sits just above the `$accept` rule and the entire input has just been lexed, while an error recovery fails, thus causing a lock-up in the parser kernel where the lexer keeps producing EOF tokens and the `locateNearestErrorRecoveryRule()` cycles between 'shifting the error token' and '$accept' state phases.
just pushed release 0.6.1-208; please check it out. It should at least have removed some of the problems... The hairiest part is the |
Confirmed that issue 2) is resolved. The other issues are still open.
For issue 1) something like adding a new […]
For issues 3-4), as you said […]
I cannot get the generated parser to generate all the errors for […]
Also a good file to look at is […]
$ node cool.js firstclasserrored.cl
Parse error on line 1:
class Foo inherits asdfjkasldfjdklaf;c
-------------------^
Expecting 'TYPEID', got 'OBJECTID'
Parse error on line 3:
...class Baz {b():Int{B};};
----------------------^
Expecting '{', 'OBJECTID', '(', 'IF', 'WHILE', 'LET', 'CASE', 'NEW', 'ISVOID', '~', 'NOT', 'INT_CONST', 'BOOL_CONST', 'STR_CONST', got 'TYPEID'
;
Can't bison/classic jison just discard the code matching the error non-terminal from the AST completely? I mean just throw it away as if it had never existed in the first place, once the error messages have been captured. So whenever the parser faces a syntax error, it continues discarding tokens until it can reduce the 'error' (i.e. it discards until it faces a ';'). Yes, you may face a ';' that is not at the end of the class, causing another syntax error, but then the parser just continues error recovery until it faces yet another ';'. The cascading error messages can be prevented by default (I think bison has a default option for that like […] which is set to 3, and if you execute yyerrok; it would not silence any cascading errors).
Quote from the manual (http://langevin.univ-tln.fr/cours/COMPIL/tps/bison.html#Error-Recovery): "To prevent an outpouring of error messages, the parser will output no error message for another syntax error that happens shortly after the first; only after three consecutive input tokens have been successfully shifted will error messages resume. ... You can make error messages resume immediately by using the macro yyerrok in an action."
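The rule quoted from the bison manual can be modelled with a small counter. This is a sketch of the documented behaviour only, assuming a threshold of three shifted tokens as in the quote; `ErrorThrottle` and its method names are made up for illustration and do not correspond to any bison or jison API.

```javascript
// Illustrative model of "no new error message until three tokens were shifted".
class ErrorThrottle {
  constructor(quietShifts = 3) {
    this.quietShifts = quietShifts;
    this.remaining = 0;   // shifts still needed before messages resume
    this.messages = [];
  }

  // Call once for every successfully shifted input token
  shift() {
    if (this.remaining > 0) this.remaining -= 1;
  }

  // Call on every syntax error; returns true when the message was reported
  error(message) {
    const reported = this.remaining === 0;
    if (reported) this.messages.push(message);
    this.remaining = this.quietShifts; // every error restarts the quiet period
    return reported;
  }

  // Equivalent of the yyerrok macro: resume messages immediately
  yyerrok() {
    this.remaining = 0;
  }
}

const throttle = new ErrorThrottle();
throttle.error('first');   // reported
throttle.shift();
throttle.error('second');  // suppressed: only one token shifted since 'first'
throttle.yyerrok();
throttle.error('third');   // reported: yyerrok resumed messages immediately
console.log(throttle.messages); // [ 'first', 'third' ]
```

In this model the cascade after a bad class header stays quiet until the parser has consumed three good tokens, or until a recovery action calls `yyerrok()`.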
To be further addressed once I've got 0.6.5 = babel-based action code parsing and validation working.
(BTW: Most of this is split off into other issues: see refs above.)
I am new to jison (my first day of using it) so excuse my probable ignorance in advance. There are some issues after switching to your fork from classical jison which I'd like to report here and get some input.
Issues
1) No concise error messages?
The `-t` flag generates too much extra output, while classic jison reported nice, concise error messages.
classic jison:
jison-gho: too long output (impractical other than for deep debugging)
I could not find a similar mechanism for error messaging in your fork (or it does not work on my grammar for some reason). I'd like my final compiler to have the kind of concise error messages that classic jison produced.
2) Parser hanging on some bad files
For some reason, the parser generated from your fork hangs on some bad inputs, while classic jison produced a parser that failed gracefully with an error message.
classic jison:
jison-gho: hangs
I can reproduce this on files `null_in_code.cl.cool` and `bad.cl`, for example (see below).
3) Error recovery
I cannot get error recovery to work for my grammar. I ported it from flex/bison, and the error recovery mechanism worked there. I recognize that not all features of flex/bison are (or will be) supported, but maybe I am doing something intrinsically wrong.
In other words, I am getting a single error and the parser exits.
Expected: the parser skips to the next error by reducing the `error` non-terminal.
Example from my grammar:
Notice I am using the same grammar in both classic jison and your fork.
So there are some backward compatibility differences.
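For reference, the recovery behaviour expected under 3) amounts to panic-mode recovery: on a syntax error, discard tokens up to a synchronizing token, then carry on parsing. The sketch below shows just that skipping step on a plain token array; `skipToSync` and `chunksBySync` are hypothetical helpers and `';'` as the synchronizing token is an assumption, neither is taken from the grammar or from jison.

```javascript
// Hypothetical helper: returns the index of the first token after the next
// synchronizing token, or tokens.length when none is left.
function skipToSync(tokens, start, syncToken = ';') {
  let i = start;
  while (i < tokens.length && tokens[i] !== syncToken) i += 1;
  return Math.min(i + 1, tokens.length);
}

// Split a token stream into sync-terminated chunks, so that one bad chunk
// does not stop the remaining ones from being looked at.
function chunksBySync(tokens, syncToken = ';') {
  const chunks = [];
  let pos = 0;
  while (pos < tokens.length) {
    const next = skipToSync(tokens, pos, syncToken);
    chunks.push(tokens.slice(pos, next));
    pos = next;
  }
  return chunks;
}

const tokens = ['class', 'Foo', '{', '}', ';', 'class', 'b', ';', 'class', 'Baz'];
console.log(chunksBySync(tokens).length); // 3
```

A real parser does this on its state stack rather than on a flat array, but the observable effect is the same: each malformed chunk yields one error and parsing resumes after the synchronizing token.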
Steps to reproduce / grammar
You can get grammar and test files from here:
https://github.com/roman-spiridonov/sandbox/tree/master/%5BStanford%5D%20Compilers/cool-jison
Run `npm install`, then `npm run jison` or `npm run jison-gho` to build the corresponding parser `cool.js`.
Then execute the commands from the issues above.