Trouble parsing/lexing 'errors' #49

Open
mingodad opened this issue Jul 18, 2023 · 3 comments

Comments

@mingodad
Contributor

While converting this grammar, https://github.com/youtube/cobalt/blob/main/cobalt/css_parser/grammar.y, I found that lalr has trouble parsing/lexing the identifier 'errors'.

error_bug {

%whitespace "[ \t\r\n]*";
%whitespace "//[^\n\r]*";
//%whitespace "/\*[^*]+\*/";
%whitespace "/\*:C_MultilineComment:";

errors :
	error
	| errors error
	;

}

Output:

lalr (10:0): ERROR: undefined symbol 's'
Error compiling grammar. Error count = 1
@mingodad
Contributor Author

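This seems to fix the problem. The grammar parser was apparently matching the keyword error as a prefix of errors, which left the stray 's' reported above; the check below accepts error only as a whole word: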

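bool GrammarParser::match_error()
{
    const char *saved_position = position_;
    bool result = match( "error" );
    // Require a whole-word match: reject "error" when an identifier
    // character follows it, as in "errors". isalnum() already covers digits;
    // the unsigned char cast avoids undefined behaviour for negative chars.
    if(result && position_ != end_ && (isalnum(static_cast<unsigned char>(*position_)) || *position_ == '_'))
    {
        position_ = saved_position;
        return false;
    }

    return result;
}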
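
For anyone who wants to try the whole-word check outside of lalr, here is a minimal standalone sketch. It is not lalr code: match_whole_word and is_identifier_char are hypothetical helpers that work on a std::string instead of the parser's position_/end_ pointers, and the set of identifier characters (alphanumerics plus '_') is an assumption carried over from the fix above.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper, not part of lalr: true if c can continue an identifier.
static bool is_identifier_char(char c)
{
    // Cast first: passing a negative char to isalnum() is undefined behaviour.
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper, not part of lalr. Returns the number of characters
// consumed when `keyword` appears at `pos` in `text` as a whole word,
// or 0 when it does not.
static std::size_t match_whole_word(const std::string& text, std::size_t pos, const char* keyword)
{
    const std::size_t length = std::strlen(keyword);
    if (pos > text.size() || text.compare(pos, length, keyword) != 0)
    {
        return 0;
    }

    // A plain prefix match is not enough: reject it when the next
    // character would extend the word, e.g. "error" inside "errors".
    const std::size_t end = pos + length;
    if (end < text.size() && is_identifier_char(text[end]))
    {
        return 0;
    }
    return length;
}
```

With this sketch, match_whole_word("errors", 0, "error") returns 0, so the whole identifier is left for the normal identifier rule, while match_whole_word("error;", 0, "error") returns 5.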

@mingodad
Contributor Author

I ended up with this fix: mingodad@208eda6

@mingodad
Contributor Author

It seems that a similar problem exists in the lexer too:

lex_bug {
    %whitespace "[ \t\n\r]*";
    
    goal: function | id;
    
    function : 'function';
    id : "[a-zA-Z][a-zA-Z0-9]*";
}

Input:

function_exists

Lexer dump output:

=line:column:type:index:identifier:lexeme:value
1:1:1:5:[function]:[function]:[function]
lalr (1:9): ERROR: Lexical error on character '_' (95)
1:9:-1:-1:[]:[]:[]
1:10:1:6:[id]:[[a-zA-Z][a-zA-Z0-9]*]:[exists]
=line:column:type:index:identifier:lexeme:value
1:1:1:5:[function]:[function]:[function]
lalr (1:9): ERROR: Lexical error on character '_' (95)
1:9:-1:-1:[]:[]:[]
1:10:1:6:[id]:[[a-zA-Z][a-zA-Z0-9]*]:[exists]

Parser output:

lalr (1:9): ERROR: Lexical error on character '_' (95)
lalr (1:9): ERROR: Syntax error on '' when expecting dot_end
lalr (1:9): ERROR: Lexical error on character '_' (95)
lalr (1:9): ERROR: Syntax error on '' when expecting dot_end
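
To make the tokenization above easier to reason about, here is a rough longest-match sketch of the two rules in lex_bug. It is not lalr's lexer: Token, match_id, match_function and tokenize are hypothetical names, whitespace handling is left out, and the tie-break (the literal wins when it is at least as long as the id match) is an assumption chosen because it agrees with the dump.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record for this sketch only.
struct Token
{
    std::string type;   // "function", "id", or "error"
    std::string lexeme;
};

// Length of the longest prefix at pos matching [a-zA-Z][a-zA-Z0-9]*
// (isalpha/isalnum match exactly that set in the default "C" locale).
static std::size_t match_id(const std::string& text, std::size_t pos)
{
    if (pos >= text.size() || !std::isalpha(static_cast<unsigned char>(text[pos])))
    {
        return 0;
    }
    std::size_t end = pos + 1;
    while (end < text.size() && std::isalnum(static_cast<unsigned char>(text[end])))
    {
        ++end;
    }
    return end - pos;
}

// Length of the literal 'function' at pos, or 0 if it is not there.
static std::size_t match_function(const std::string& text, std::size_t pos)
{
    static const std::string literal = "function";
    return text.compare(pos, literal.size(), literal) == 0 ? literal.size() : 0;
}

// Longest match wins; on a tie the literal is preferred (an assumption).
static std::vector<Token> tokenize(const std::string& text)
{
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        const std::size_t id_length = match_id(text, pos);
        const std::size_t literal_length = match_function(text, pos);
        if (literal_length > 0 && literal_length >= id_length)
        {
            tokens.push_back({"function", text.substr(pos, literal_length)});
            pos += literal_length;
        }
        else if (id_length > 0)
        {
            tokens.push_back({"id", text.substr(pos, id_length)});
            pos += id_length;
        }
        else
        {
            // No rule can consume this character: report it and skip it.
            tokens.push_back({"error", text.substr(pos, 1)});
            pos += 1;
        }
    }
    return tokens;
}
```

Under these assumptions, tokenize("function_exists") yields function, an error for '_', then id "exists", which is the same split as in the dump. The sketch also suggests that the id pattern as written cannot consume '_' at all; if underscores are meant to be part of identifiers, a pattern such as "[a-zA-Z_][a-zA-Z0-9_]*" would let the longer id match win over the 'function' literal for this input.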
