[Syntax Highlighting] Invalid unicode regex match #78

lildude · 2019-10-24T17:55:35Z

As with #76, our grammar compiler has found another error introduced in #72. This time it's an invalid unicode regex match:

Invalid regex in grammar: `source.hack` (in `syntaxes/hack.json`) contains a malformed regex (regex "`(?xi)([a-z_\x{7f}-\x{7fffffff}]`...": character value in \x{} or \o{} is too large (at offset 30))

... and ...

Invalid regex in grammar: `source.hack` (in `syntaxes/hack.json`) contains a malformed regex (regex "`(?i)[a-z_\x{7f}-\x{7fffffff}][a-`...": character value in \x{} or \o{} is too large (at offset 27))

The line numbers have been truncated, but they correspond to...

vscode-hack/syntaxes/hack.json

Line 910 in 62329f6

    
           "match": "(?xi)\n([a-z_\\x{7f}-\\x{7fffffff}][a-z0-9_\\x{7f}-\\x{7fffffff}]*)                 # Exception class\n((?:\\s*\\|\\s*[a-z_\\x{7f}-\\x{7fffffff}][a-z0-9_\\x{7f}-\\x{7fffffff}]*)*) # Optional additional exception classes\n\\s*\n((\\$+)[a-z_\\x{7f}-\\x{7fffffff}][a-z0-9_\\x{7f}-\\x{7fffffff}]*)           # Variable",

... and ...

vscode-hack/syntaxes/hack.json

Line 918 in 62329f6

"match": "(?i)[a-z_\\x{7f}-\\x{7fffffff}][a-z0-9_\\x{7f}-\\x{7fffffff}]*",

... respectively.

I suspect the intent here was to cover all Unicode characters from 0x7F to the end of the range; however, 0x7FFFFFFF is no longer a valid Unicode code point in UTF-8. As of 2003, the maximum is 0x10FFFF.

From https://en.wikipedia.org/wiki/UTF-8#History:

In November 2003, UTF-8 was restricted by RFC 3629 to match the constraints of the UTF-16 character encoding: explicitly prohibiting code points corresponding to the high and low surrogate characters removed more than 3% of the three-byte sequences, and ending at U+10FFFF removed more than 48% of the four-byte sequences and all five- and six-byte sequences.

PR coming up to implement this change.
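
For anyone who wants to check a grammar locally, here is a rough Python sketch of the idea. It is not the grammar compiler's actual implementation: the helper names `find_oversized_escapes` and `clamp_escapes` are made up for illustration, and it only looks at `\x{...}` escapes (not `\o{...}`). It assumes the regex has already been read out of the JSON, so each escape is preceded by a single backslash.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # upper bound on code points since RFC 3629

# Matches \x{...} escapes with a hexadecimal payload.
HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def find_oversized_escapes(pattern):
    """Return (offset, value) for each hex escape above MAX_CODE_POINT."""
    hits = []
    for m in HEX_ESCAPE.finditer(pattern):
        value = int(m.group(1), 16)
        if value > MAX_CODE_POINT:
            hits.append((m.start(), value))
    return hits


def clamp_escapes(pattern):
    """Rewrite oversized hex escapes so they end at U+10FFFF."""
    def fix(m):
        value = int(m.group(1), 16)
        # Leave valid escapes untouched; replace only the oversized ones.
        return m.group(0) if value <= MAX_CODE_POINT else r"\x{10ffff}"
    return HEX_ESCAPE.sub(fix, pattern)


if __name__ == "__main__":
    # The second regex from this issue, as it looks after JSON decoding.
    regex = r"(?i)[a-z_\x{7f}-\x{7fffffff}][a-z0-9_\x{7f}-\x{7fffffff}]*"
    for offset, value in find_oversized_escapes(regex):
        print(f"offset {offset}: {value:#x} exceeds {MAX_CODE_POINT:#x}")
    print(clamp_escapes(regex))
```

Clamping to `\x{10ffff}` keeps the character class covering everything from 0x7F to the end of the valid range, which appears to be what the original pattern intended.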


lildude mentioned this issue Oct 24, 2019

Use newer max unicode of 0x10ffff #79

Merged

fredemmott mentioned this issue Oct 24, 2019

Suggestion: integrate the linguist grammar compiler into CI #80

Open

PranayAgarwal closed this as completed in #79 Oct 24, 2019
