WIP: Lax parser #8

Open
wants to merge 2 commits into
base: antlr
Conversation

kasbah
Member

@kasbah kasbah commented Jan 23, 2018

No description provided.

@kasbah kasbah changed the title Lax parser WIP: Lax parser Jan 23, 2018
@kasbah kasbah force-pushed the lax-parser branch 2 times, most recently from fb67c09 to edfa743 Compare January 23, 2018 20:00
@kasbah
Member Author

kasbah commented Jan 23, 2018

So I found out I can create an UNKNOWN lexer rule that should match everything that hasn't been previously defined.

ignored: UNKNOWN* EOF;
 
UNKNOWN: .+?;

And that kind of works:

$ node bin/electro-grammar.js "akjdkadj 10 asdjkdj  ohm xaksjdkjd"
line 1:0 mismatched input 'a' expecting NUMBER
line 1:9 extraneous input '10' expecting {<EOF>, UNKNOWN}
line 1:21 extraneous input 'ohm' expecting {<EOF>, UNKNOWN}
{ component: { resistance: 10, type: 'resistor' },
  ignored: 'akjdkadj  asdjkdj   xaksjdkjd' }

I simplified the grammar to just work on resistors for the time being, trying to figure out why it ignores the k in 10k.

$ node bin/electro-grammar.js "10k xaksjdkjd"
line 1:2 mismatched input 'k' expecting {OHM, RPREFIX}
line 1:0 extraneous input '10' expecting {<EOF>, UNKNOWN}
{ component: { resistance: 10, type: 'resistor' },
  ignored: 'k xaksjdkjd' }

@kasbah
Member Author

kasbah commented Jan 23, 2018

We probably should take advantage of mode(M):

mode(M)
After matching this token, switch the lexer to mode M. The next
time the lexer tries to match a token, it will look only at rules in mode M.
M can be a mode name from the same grammar or an integer literal. See
grammar Strings earlier.
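
To make the quoted description concrete, here is a minimal sketch of what mode switching means for a lexer. It is not ANTLR and not this project's grammar: the rule names, the mode names and the `tokenizeWithModes` helper are all hypothetical, and for brevity it takes the first matching rule in the current mode rather than the longest match.

```javascript
// Sketch of lexer modes: each mode has its own rule list, and a rule may
// name the mode to switch to after it matches (similar in spirit to mode(M)).
// All rule and mode names below are hypothetical.
const modes = {
  DEFAULT: [
    { name: "LQUOTE", pattern: /"/, mode: "STR" },
    { name: "WORD", pattern: /[a-z]+/ },
    { name: "WS", pattern: /\s+/ },
  ],
  STR: [
    { name: "RQUOTE", pattern: /"/, mode: "DEFAULT" },
    { name: "TEXT", pattern: /[^"]+/ },
  ],
};

function tokenizeWithModes(input, modeTable, startMode = "DEFAULT") {
  const tokens = [];
  let mode = startMode;
  let pos = 0;
  while (pos < input.length) {
    let matched = null;
    // Only the rules that belong to the current mode are considered.
    for (const rule of modeTable[mode]) {
      // The sticky flag anchors the match at exactly `pos`.
      const re = new RegExp(rule.pattern.source, "y");
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m && m[0].length > 0) {
        matched = { rule, text: m[0] };
        break; // first matching rule wins in this simplified sketch
      }
    }
    if (matched === null) {
      throw new Error(`no rule in mode ${mode} matches at offset ${pos}`);
    }
    tokens.push({ type: matched.rule.name, text: matched.text });
    pos += matched.text.length;
    // Switch modes only after the token has been matched.
    if (matched.rule.mode) mode = matched.rule.mode;
  }
  return tokens;
}

const modeTokens = tokenizeWithModes('ab "c d" e', modes);
console.log(modeTokens.map((t) => t.type).join(" "));
// prints "WORD WS LQUOTE TEXT RQUOTE WS WORD"
```

Inside the quotes the whole `c d` comes out as a single TEXT token, because the WORD and WS rules are simply not visible while the lexer is in the STR mode.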

@dvc94ch
Collaborator

dvc94ch commented Jan 23, 2018

That looks like a lexer ambiguity. Try this on the Java backend: echo "1k abc" | grun ElectroGrammar resistor -diagnostics -tree -tokens to find out which token k is lexed as (since it's not RPREFIX).

@kasbah
Member Author

kasbah commented Jan 23, 2018

Hmm, it seems to match everything to UNKNOWN so it looks like I was mistaken about how this would work. I wonder how come it kinda half works at all.

@kasbah
Member Author

kasbah commented Jan 24, 2018

Stack Overflow answers (1 and 2) seem to suggest that the last lexer rule will have the lowest priority. But that's not what I am seeing.

EDIT: Seems the imports re-order things or otherwise mess up the priority. :/ Tracking here: antlr/antlr4#2209
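
For reference, the priority rule in question can be sketched outside ANTLR. The lexer takes the longest match at the current position, and only when two rules match the same length does the rule defined first win. The sketch below is a toy tokenizer, not the ANTLR runtime. The token patterns are assumptions, since the real NUMBER, RPREFIX and OHM definitions are not shown in this thread, and `makeLexer` is a hypothetical helper. UNKNOWN is modelled as a single-character match, on the assumption that a non-greedy `.+?` at the end of a lexer rule consumes one character.

```javascript
// Toy lexer: longest match wins, ties go to the rule listed first.
// rules: array of { name, pattern }, in definition order.
function makeLexer(rules) {
  return function tokenize(input) {
    const tokens = [];
    let pos = 0;
    while (pos < input.length) {
      let best = null;
      for (const { name, pattern } of rules) {
        // The sticky flag anchors the match at exactly `pos`.
        const re = new RegExp(pattern.source, "y");
        re.lastIndex = pos;
        const m = re.exec(input);
        // Only a strictly longer match replaces the current best,
        // so on a tie the earlier rule is kept.
        if (m && m[0].length > 0 && (best === null || m[0].length > best.text.length)) {
          best = { type: name, text: m[0] };
        }
      }
      if (best === null) throw new Error(`no rule matches at offset ${pos}`);
      tokens.push(best);
      pos += best.text.length;
    }
    return tokens;
  };
}

// Hypothetical token definitions, for illustration only.
const NUMBER = { name: "NUMBER", pattern: /[0-9]+/ };
const RPREFIX = { name: "RPREFIX", pattern: /[kM]/ };
const OHM = { name: "OHM", pattern: /ohm/ };
const UNKNOWN = { name: "UNKNOWN", pattern: /[\s\S]/ }; // one character

const types = (tokens) => tokens.map((t) => t.type).join(" ");

// Catch-all defined last: it only picks up what nothing else matches.
const catchAllLast = makeLexer([NUMBER, RPREFIX, OHM, UNKNOWN]);
console.log(types(catchAllLast("10k x")));
// prints "NUMBER RPREFIX UNKNOWN UNKNOWN"

// Catch-all defined first: it wins every single-character tie.
const catchAllFirst = makeLexer([UNKNOWN, NUMBER, RPREFIX, OHM]);
console.log(types(catchAllFirst("10k x")));
// prints "NUMBER UNKNOWN UNKNOWN UNKNOWN"
```

In the second ordering `10` is still a NUMBER because it is longer than one character, while `k` falls to UNKNOWN. That resembles the `10k` output earlier in the thread, though this is only a model of the rule and not a confirmed diagnosis of the import issue.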

@dvc94ch dvc94ch closed this Jan 28, 2018
@kasbah kasbah reopened this Jan 28, 2018