Problem when detecting multiple token #35
Comments
Given your example inputs, we're looking at a language which has "significant whitespace" as part of the language definition (such as Python), which your lexer and/or parser does not seem to take into account. Going by the error message, I don't see anything in the token list there that hints you're also looking for an 'End Of Statement' token, i.e. a statement terminator. Then, having a look at the precise grammar, it turns out there's at least one ambiguity in there: you accept an EMPTY statement while you also have
Next, and this is probably the culprit, is hinted at by the error message you report, when compared to the grammar: there's no PRINTLN token recognized by this grammar's SENTENCE rule at all?
Okay, there are a few more bits slightly wrong there:
which, after fixing, moves on to the next bit: the above-mentioned ambiguity and a few more:
which translates to your grammar being neither LALR(1) nor LR(1) (the latter is the mode for the compile retry phase mentioned in the error report). Removing the ambiguous handling of the empty statement list as mentioned above removes at least one of the listed conflicts:
==>
Closer inspection of the grammar reveals the
as legal complete statements, while
as alt1 can parse all these variants:
while alt2 can parse this:
The LALR/LR compiler in jison finds this duplication and reports the conflict as a consequence. Removing alt2 from that rule removes the second conflict and produces a parser:
==>
Running the test code which I added (combined with that jison
==>
which shows the grammar needs a little cleanup in the token/lexer department to work. As I found it odd that you had these lexer rules:
I changed them to the JavaScript language rules to disambiguate the assignment and equality comparison operators (JavaScript:
while making sure I use the token name matching the one used in the grammar. Now if you correct the brackets in the Hence after this fix:
we need to remove the
one might be inclined to write:
but this will introduce another ambiguity as there's already
to match the first
to match both those statements. If you only wish to accept
then the PRINT rule should read:
The above is contained in the adjusted gist: https://gist.github.com/GerHobbelt/082eaff73fbb442abeb8a2ed7b033d97/revisions
Command line to compile:
when you want jison to produce a JS file with a parser which can be invoked from the (NodeJS) command line like this:
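As a side illustration of the lexer point made earlier in this answer (separate tokens for assignment and equality comparison), here is a minimal standalone sketch in plain JavaScript. It does not use jison and is not the lexer from the gist; the token names `EQ`, `ASSIGN`, `NUMBER` and `ID` and the `tokenize` helper are hypothetical, chosen only for this example. It assumes a lexer that takes the first rule that matches, so the `==` rule has to come before the `=` rule.

```javascript
// Hypothetical rule table: [pattern, token name]. A null name means "skip".
// These names are illustrative only; they are not taken from the gist.
const RULES = [
  [/^\s+/, null],                      // whitespace is skipped
  [/^==/, 'EQ'],                       // equality comparison: tried before '='
  [/^=/, 'ASSIGN'],                    // assignment
  [/^[0-9]+/, 'NUMBER'],
  [/^[A-Za-z_][A-Za-z0-9_]*/, 'ID'],
];

// First-match tokenizer: at each position, the first rule whose pattern
// matches the start of the remaining input wins.
function tokenize(input) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [pattern, name] of RULES) {
      const m = pattern.exec(rest);
      if (m) {
        if (name !== null) {
          tokens.push({ type: name, text: m[0] });
        }
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error('Unexpected character: ' + rest[0]);
    }
  }
  return tokens;
}

// Usage: list the token types for an assignment and for a comparison.
console.log(tokenize('a = 1').map((t) => t.type).join(' '));
console.log(tokenize('a == 1').map((t) => t.type).join(' '));
```

If the two operator rules were swapped in this sketch, `a == 1` would lex as two `ASSIGN` tokens in a row, which is the kind of confusion that distinct, correctly ordered rules avoid. Whatever names the lexer returns also have to match the token names used in the grammar.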
Hello, your response has been very good; you have really helped me understand several elements of Jison. I will apply your recommendations to a University project called rp
Thank you for taking the time to analyze and answer my question. I'll be going over all the details you just showed me.
Closing this issue then. Re-open if there are questions about this material; otherwise it's better to open another issue to keep them more or less 'atomic'. Cheers.
I am writing a basic transpiler that translates certain syntax using an already defined grammar. I still have some problems defining the detection of multiple tokens, that is, using * to specify the repetition of a token (I think * is used to define the repetition of a token), for example (see SENTENCE*):
The grammar works with the input:
but it does not work with the input:
The error thrown is:
Could you tell me what I'm doing wrong?
See the complete code in the Gist