Generated file missing tokens #44

Open
kaby76 opened this issue Feb 20, 2022 · 1 comment

kaby76 commented Feb 20, 2022

I am trying to test grammars-v4/verilog/verilog/ using Grammarinator, but some of the generated output fails to parse. When I compare the output from Trees.print() with the generated file, the two don't match: the token sequences differ, with tokens present in one but missing from the other.

Here is the code that I am executing:

git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4
git checkout ffecfeee601ffc75edbc52845c1509753d6dd4a1
cd verilog/verilog
# Already cloned and built grammarinator from sources.
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator  --sys-path . -d 15 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer --no-mutate --no-recombine
# Already built a standardized Antlr4 parser driver for the grammar.
for  i in tests/test_*; do echo $i; ./Generated/bin/Debug/net5.0/Test.exe -file $i; status=$?; if [[ $status != 0 ]]; then break; fi; done

This loops through the various generated tests, parsing each, and stops the loop on a test file that does not parse.

I assumed that Grammarinator would construct a valid CST ("Unparser" tree) and output that. While most tests parse, some do not, and the failures only appear when -d 15 is specified. I've included --no-mutate and --no-recombine so that the tree is output as is, unmodified.

To understand WHY the parse fails, I need to look at the CST constructed prior to serializing the token stream into a generated test. To do that, I modified generate.py, adding this code after this line:

    print("Index = ")
    print(index)
    tree.print()

I then reran the grammarinator-generate command, saved the human-readable trees, and reran the parser.

Selecting a test that fails, I've noticed that the tree.print() output matches neither the generated text nor the tokens reported by the standardized Antlr parser.

For example,

  • Output from tree.print():

    ...
    VERTICAL_BAR
    DOLLAR_RANDOM
    COMMA
    COMMA
    SIMPLE_IDENTIFIER
    ...

  • Tokens recognized by parser:

    ...
    VERTICAL_BAR
    DOLLAR_RANDOM
    COMMA
    SIMPLE_IDENTIFIER
    ...

(Note, only one COMMA.)

  • Relevant sequence in generated file:

    | $random , J

(Note, only one COMMA.)

I have noticed similar token differences at other times. It seems that Grammarinator indicates some tokens in the CST that are not being output.
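
To find such differences without reading both listings by eye, the two token-name sequences can be diffed. The following is only a sketch using Python's standard difflib; the helper name diff_token_names is hypothetical, and the two lists are the excerpts shown above, typed in by hand rather than read from any Grammarinator or Antlr output.

```python
import difflib


def diff_token_names(tree_tokens, parser_tokens):
    """Return (tag, tree_slice, parser_slice) for every region that differs."""
    matcher = difflib.SequenceMatcher(a=tree_tokens, b=parser_tokens, autojunk=False)
    return [
        (tag, tree_tokens[i1:i2], parser_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only insertions, deletions, and replacements
    ]


# Token names from tree.print() (excerpt from the example above).
tree_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "COMMA", "SIMPLE_IDENTIFIER"]
# Token names reported by the parser for the generated file (same excerpt).
parser_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "SIMPLE_IDENTIFIER"]

differences = diff_token_names(tree_tokens, parser_tokens)
for tag, in_tree, in_parser in differences:
    print(tag, "tree:", in_tree, "parser:", in_parser)
```

A "delete" entry means the tokens appear in the tree but not in the parsed file; an "insert" entry means the opposite.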

Incidentally, I tried to just save the tree using --keep-trees but there is no tool to print out the trees after reading. I tried something like this, but it did not work.

from pydoc import importfile
module = importfile('/full/path/to/trees.py')
module.Trees.print(module.Trees.load("/full/path/to/test_xxx.grt"))

akosthekiss (Collaborator) commented

@kaby76 Some quick comments:

  • I haven't seen anything like missing tokens from the test case before. With simple_space_serializer, the only way not to output a token is when it is empty (i.e., node.src is falsy, see https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/runtime/serializer.py#L21). But a COMMA in your example is ',', which is truthy. So, this should not happen (even if it obviously does in your case).
  • If you want to tweak things here and there, you might want to tweak the serializer instead of generate.py. Either modify simple_space_serializer (linked above), or write your own serializer and specify it from the command line using the -s or --serializer switch (https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/generate.py#L244). It should be simply a function that gets the root node of the tree and should return a string form of it. You could easily add any debug code there.
  • Alternatively/additionally, you might also tweak Tree.print to give more details. E.g., at https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/runtime/tree.py#L69, perhaps something like print('%s%s (%s)' % (' ' * indent, node.name, getattr(node, 'src', ''))). (I haven't tested this. It may or may not be useful for you.)
  • If you want to load saved trees, you can write from grammarinator.runtime import Tree in your script (or directly in the interactive interpreter), assuming that you have grammarinator already installed in your (virtual) environment. Then, Tree.load("path/to/test_xxx.grt") will work.
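
As a rough illustration of the serializer suggestion above, here is a minimal sketch of a debugging serializer: a function that takes the root node and returns a string, and that also reports every leaf whose src is falsy. The Node class is a hypothetical stand-in so the snippet runs on its own; it is not Grammarinator's real node class. The name debug_space_serializer and the log parameter are likewise assumptions, and the real simple_space_serializer may differ in detail (for example in how it handles spacing).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not Grammarinator's actual class."""
    name: str
    src: Optional[str] = None  # token text, expected on leaf (token) nodes
    children: List["Node"] = field(default_factory=list)


def debug_space_serializer(root, log=print):
    """Join leaf texts with spaces and report every leaf that gets skipped."""
    parts = []

    def walk(node):
        children = getattr(node, "children", None)
        if children:
            for child in children:
                walk(child)
            return
        # Leaf node: emit its text if truthy, otherwise report that it was dropped.
        src = getattr(node, "src", None)
        if src:
            parts.append(src)
        else:
            log(f"skipped leaf {node.name!r}: src={src!r}")

    walk(root)
    return " ".join(parts)


# Illustration only: the empty COMMA is constructed by hand to show the report,
# it is not a diagnosis of the failure described in this issue.
tree = Node("expr", children=[
    Node("VERTICAL_BAR", "|"),
    Node("DOLLAR_RANDOM", "$random"),
    Node("COMMA", ","),
    Node("COMMA", ""),
    Node("SIMPLE_IDENTIFIER", "J"),
])
skipped = []
text = debug_space_serializer(tree, log=skipped.append)
print(text)
print(skipped)
```

If a function like this lived in an importable module, it could presumably be passed to grammarinator-generate through the --serializer switch in the same dotted-name form as grammarinator.runtime.simple_space_serializer.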
