Textmate engine bug for \k<> backreferences #193

Open
jeff-hykin opened this issue Dec 12, 2022 · 2 comments

Comments

@jeff-hykin

Example of working as expected

Input is on the left, output is on the right.
The "end" pattern is referencing the 2nd group created in "begin" (the EOF)
[Screenshot: should_happen]

What (intentional) failure looks like (non-issue)

A bad pattern causes this kind of behavior:
(note: yellow is the theme's color for entity.shell, which is the included pattern)
[Screenshot: bad_pattern]

What is broken

\k<2> should be equivalent to \2, and in other places it does behave equivalently.

However, instead of failing normally (e.g. all-yellow) it seems to trigger undefined behavior:
(Note: \2 is not a viable workaround when group numbers are ≥10)
[Screenshot: Screen Shot 2022-12-12 at 3 05 46 PM]

Here's the code for the problematic pattern.
This is for VS Code 1.72.2, on Mac M1

{
    "begin": "(<<)\\s*+\\s*+((?<!\\w)[a-zA-Z_][a-zA-Z_0-9]*(?!\\w))(?=\\s|;|&|<|\"|')",
    "end": "\\2",
    "beginCaptures": {
        "1": {
            "name": "keyword.operator.heredoc.shell"
        },
        "2": {
            "name": "string.delimiter.shell"
        }
    },
    "endCaptures": {},
    "name": "string.unquoted.heredoc.no-indent.shell",
    "patterns": [
        {
            "match": ".+",
            "name": "entity.shell"
        }
    ]
}
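
For comparison, here is a sketch of the \k<2> variant this issue is about. It assumes the only change from the listing above is the "end" value; "beginCaptures" and "endCaptures" are left out for brevity, and this fragment has not been re-run against the engine.

```json
{
    "begin": "(<<)\\s*+\\s*+((?<!\\w)[a-zA-Z_][a-zA-Z_0-9]*(?!\\w))(?=\\s|;|&|<|\"|')",
    "end": "\\k<2>",
    "name": "string.unquoted.heredoc.no-indent.shell",
    "patterns": [
        {
            "match": ".+",
            "name": "entity.shell"
        }
    ]
}
```

With "\\2" in "end", the heredoc closes at the delimiter captured by group 2 of "begin"; the report is that "\\k<2>" in the same position does not.
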
@RedCMD

RedCMD commented Dec 13, 2022

Can confirm
seems like there are two different points being made

  1. \\k<2> does not behave the same as \\2 when backreferencing capture groups between begin/end rules.
    I would think this is a non-issue, as it would be annoying to have to count all the capture groups in begin when trying to reference one in end (through the usage of \\k<2>)
  2. An invalid group number in \\k<2> causes the textmate engine to crash.
    This is more or less consistent with all other textmate errors, e.g. invalid \\g<4> groups.
    It seems like \\2 inside end has a special property: when capture group 2 does not exist, it does not crash the engine but instead matches against nothing.

\\h<2> matches against a hexadecimal digit and the literal chars <2>
[Screenshot]

> (Note: \2 is not a viable workaround when group numbers are ≥10)

\\14 works fine for me?
[Screenshot]

@jeff-hykin
Author

jeff-hykin commented Dec 13, 2022

> \14 works fine for me?

Oh interesting, I suppose (?:\14)4 would be equivalent to \k<14>4 in that case.
So there's still a bug, but there's a reliable workaround (which is great news for me)
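
A minimal sketch of that workaround inside a rule, assuming a hypothetical "begin" pattern with at least 14 capture groups (omitted here) and an "end" that must match group 14's text followed by a literal 4:

```json
{
    "end": "(?:\\14)4"
}
```

The non-capturing group only serves to separate the backreference \14 from the digit that follows it, so the 4 cannot be read as part of a longer number.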

> causes the textmate engine to crash. this is more or less consistent with all other textmate errors

I'd argue that for both \k and \g, either the crash should show up in the debug console, or (if crashing is not an option) the engine should fall back to matching as an empty string. Having it partially highlight the document while silently crashing is what I would consider an issue.

> I would think this is a non-issue, as it would be annoying to have to count all the capture groups in begin when trying to reference one in end (through the usage of \k<2>)

Many existing syntaxes, like Ruby and Shell, would break if that reference-groups-from-the-start feature never worked. Just because a feature is hard to use manually doesn't make its being broken a non-issue.

> it would be annoying to have to count all the capture groups

I agree, which is why I never count capture groups; I made the Ruby grammar builder do the heavy lifting. Some C++ patterns have over 100 capture groups, so it would've been unrealistic for me to maintain them any other way.

[Screenshot: Screen Shot 2022-12-13 at 10 23 23 AM]
