Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: discard irrelevant tokens. #8

Open
fehmud opened this issue Nov 30, 2018 · 2 comments
Open

Feature request: discard irrelevant tokens. #8

fehmud opened this issue Nov 30, 2018 · 2 comments

Comments

@fehmud
Copy link

fehmud commented Nov 30, 2018

Hello,

to my understanding, currently Grammatica requires declaration for all tokens, even when they are irrelevant to the analyzer. For instance, in this production:

IfStatement = "IF" Predicate "THEN" ThenClause "ELSE" ElseClause "END" ;

even if the analyzer is interested only to Predicate, ThenClause and ElseClause, the grammar must declare "IF", "THEN", "ELSE", "END" as tokens, too. To discard them, the corresponding "Exit" method must be overridden to return null, but doing so clutters the code of the analyzer in addition to the grammar. Grammatica could instead discard undefined literal strings in %productions%. If you deem this to be error-prone - because a user could forget to define a token - then a grammar parameter could be added to enable this enhancement.

Thanks for your attention.

@cederberg
Copy link
Owner

cederberg commented Dec 5, 2018

Thanks for the suggestions. Omitting the declaration of string tokens such as these should be possible, but is currently required due to risk of typos, etc. And also makes auto-generated Analyzer method names uglier. Should perhaps be an option.

Automatic removal of static string tokens isn't doable in the general case, as some productions will require the static tokens in order to distinguish between alternatives. Like in this case:

Term := Factor "*" Term
     |  Factor "/" Term
     |  Factor ;

You don't have to add methods returning null in your Analyzer, though. Instead, your analyzer method for the production could just ignore these tokens and work only on attached node values instead:

    protected Node exitIfStatement(Production node) throws ParseException {
        Object predValue = getValue(getChildAt(node, 1), 0);
        Object ifValue = getValue(getChildAt(node, 3), 0);
        Object elseValue = getValue(getChildAt(node, 5), 0);
        node.addValue(buildIfValueHere(predValue, ifValue, elseValue));
        return node;
    }

or even better (as no values are attached to nodes by default):

    protected Node exitIfStatement(Production node) throws ParseException {
        ArrayList values = getChildValues(node);
        node.addValue(buildIfValueHere(values.get(0), values.get(1), values.get(2)));
        return node;
    }

Unless, of course, it is important to build a parse tree containing only the "important" parts for using in some other way.

@fehmud
Copy link
Author

fehmud commented Dec 5, 2018

Thank you for taking the time to assess my proposal.

Omitting the declaration of string tokens such as these should be possible, but is currently required due to risk of typos, etc.

Then a new %discard% flag for tokens could be an alternative.

And also makes auto-generated Analyzer method names uglier.

Why? It seems to me that such keywords don't need to generate anything. They should be treated almost like whitespace, the only difference being that they influence parsing. When I see an If production in the tree, I know that the first token was a IF, the third a THEN, etc. There is no need for the tree to reaffirm that, is there? As far as I can remember, I have never seen a representation of an AST containing such redundant information. Ah! I didn't mention that I was annoyed by such perceived "clutter" while I was inspecting nodes in the debugger.

As a side note, what is ugly are auto-generated constant names in C#, where naming guidelines recommend PascalCase instead of ALL_CAPS (^_^)

Automatic removal of static string tokens isn't doable in the general case, as some productions will require the static tokens in order to distinguish between alternatives.

Agreed. By mentioning "irrelevant tokens", I meant that only tokens that don't influence processing should be left undeclared. Sorry for not having been clear about this.

In any case, it seems that we agree that this should be an option in the grammar.

However...

    ArrayList values = getChildValues(node);

.. this! I didn't realize that such a shortcut existed.

Thanks for your attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants