[CSharp, but likely all runtimes] Misleading parse error messages. #4723

kaby76 · 2024-10-27T13:41:18Z

This is a problem posted on Twitter/X (see thread https://x.com/KenDomino/status/1849814576902099279).

The problem is that ANTLR error messages refer only to the text of tokens, not to their token types, even though token types are what the parser actually consumes.

10/27-09:21:18 ~/temp2/Generated-CSharp
$ cat X.g4
grammar X;
start: 'proc';
ID: [a-z]+;
PROC: 'proc';
WS: [ \t\n\r]+ -> skip;
10/27-09:21:21 ~/temp2/Generated-CSharp
$ cat proc.txt
proc
10/27-09:21:24 ~/temp2/Generated-CSharp
$ make
bash build.sh
  Determining projects to restore...
  Restored C:\msys64\home\Kenne\temp2\Generated-CSharp\Test.csproj (in 584 ms).
  Determining projects to restore...
  All projects are up-to-date for restore.
C:\msys64\home\Kenne\temp2\Generated-CSharp\MyParserInterpreter.cs(31,29): warning CS0169: The field 'MyParserInterpreter._grammarFileName' is never used [C:\msys64\home\Kenne\temp2\Generated-CSharp\Test.csproj]
  Test -> C:\msys64\home\Kenne\temp2\Generated-CSharp\bin\Debug\net8.0\Test.dll

Build succeeded.

C:\msys64\home\Kenne\temp2\Generated-CSharp\MyParserInterpreter.cs(31,29): warning CS0169: The field 'MyParserInterpreter._grammarFileName' is never used [C:\msys64\home\Kenne\temp2\Generated-CSharp\Test.csproj]
    1 Warning(s)
    0 Error(s)

Time Elapsed 00:00:05.49

Workload updates are available. Run `dotnet workload list` for more information.
10/27-09:21:38 ~/temp2/Generated-CSharp
$ ./bin/Debug/net8.0/Test.exe proc.txt
line 1:0 mismatched input 'proc' expecting 'proc'
CSharp 0 proc.txt fail 0.0165835
Total Time: 0.1522863
10/27-09:21:46 ~/temp2/Generated-CSharp
$

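This grammar defines two lexer rules that match the same string 'proc', but mistakenly puts them in the wrong order. Because ID comes before PROC and both match the same length of input, the lexer tokenizes proc as ID, while the parser rule start expects PROC. The resulting error message is line 1:0 mismatched input 'proc' expecting 'proc'. This message is flawed because it is circular: it says 'proc' was found where 'proc' was expected. An error message should be precise; this one is not.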
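The rule-order effect can be reproduced with a small model of the lexer's tie-break: the longest match wins, and on a tie the rule listed first wins. This is only a sketch in plain Python that does not use the ANTLR runtime. The `RULES` table and the `token_type` helper are hypothetical names made up for illustration, and the regular expressions stand in for the lexer rules of X.g4.

```python
import re

# Lexer rules in grammar order (hypothetical stand-in for X.g4).
# Token types are numbered from 1 in the order the rules appear.
RULES = [("ID", r"[a-z]+"), ("PROC", r"proc")]

def token_type(text, rules=RULES):
    """Return (name, number) of the rule that wins for `text` at position 0."""
    best = None  # (match_length, name, number)
    for number, (name, pattern) in enumerate(rules, start=1):
        m = re.match(pattern, text)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and (best is None or m.end() > best[0]):
            best = (m.end(), name, number)
    return None if best is None else (best[1], best[2])

print(token_type("proc"))                        # ('ID', 1)
print(token_type("proc", [RULES[1], RULES[0]]))  # ('PROC', 1)
```

With the rules in the order given in the grammar, proc comes out as ID. Swapping the two rules makes it come out as PROC, which is the order the grammar author intended.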

The message should report the token types found and expected, as well as the strings. For example: line 1:0 mismatched input 'proc' (token type ID=1) expecting 'proc' (token type PROC=2).
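As a rough sketch of the proposed format, the snippet below builds such a message from the token text, the symbolic token name, and the token type number. It is plain Python and does not call the ANTLR runtime. `describe` and `mismatch_message` are hypothetical helpers, and the values passed in are copied by hand from the example above; in a real error listener they would presumably come from the offending token and the recognizer's vocabulary.

```python
def describe(text, type_name, type_number):
    """Render one token as: 'text' (token type NAME=number)."""
    return f"'{text}' (token type {type_name}={type_number})"

def mismatch_message(line, column, found, expected):
    """Build a mismatched-input message that names both text and token type.

    `found` and `expected` are (text, type_name, type_number) tuples.
    """
    return (f"line {line}:{column} mismatched input {describe(*found)} "
            f"expecting {describe(*expected)}")

print(mismatch_message(1, 0, ("proc", "ID", 1), ("proc", "PROC", 2)))
# line 1:0 mismatched input 'proc' (token type ID=1) expecting 'proc' (token type PROC=2)
```

Including the symbolic name and number next to the text makes the message unambiguous even when two lexer rules match the same string.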

I've seen this problem a few times before (I can't find citations--sorry).

For testing in grammars-v4, I can write code to emit precise error messages. This will come in handy when someone creates a PR with a regression in the lexer rules. However, the default code should probably be changed.
