Subroutines breaking capture tokenizing inside of referenced capture group #164

RedCMD · 2022-01-02T12:09:37Z

When a subroutine call such as \\g<1> targets a capture group, the call removes all earlier tokens from any capture group that isn't rechecked inside the subroutine.

To reproduce, create a syntax highlighting extension with this grammar:

{
	"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
	"name": "Subroutines Syntax",
	"scopeName": "source.redcmd.syntax.subroutines",
	"patterns": [
		{ "include": "#subroutines" }
	],
	"repository": {
		"subroutines": {
			"match": "((a)|(b)|(c)|(d))-\\g<1>",
			"captures": {
				"2": { "name": "strong variable.other.constant" },
				"3": { "name": "strong keyword.control" },
				"4": { "name": "strong support.type" },
				"5": { "name": "strong constant.character.escape" }
			}
		}
	}
}

The expected outcome is that it highlights all text of the form [abcd]-[abcd]:

a-a
a-b
a-c
a-d
b-a
b-b
b-c
b-d
c-a
c-b
c-c
c-d
d-a
d-b
d-c
d-d

Like so:

Instead, all tokens attached to capture groups that are not rematched (and fail) during the subroutine call get purged (capture groups 2 to 5).
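
Until this is fixed, one way to sidestep the problem seems to be avoiding the call altogether and writing the alternation out a second time. A minimal sketch of the "subroutines" rule rewritten that way follows. Groups 7 to 10 are simply the numbers the copied alternation ends up with, and giving them the same scopes as groups 2 to 5 is my assumption about the intended result:

```json
{
	"subroutines": {
		"comment": "Workaround sketch: \\g<1> written out by hand. Groups 7-10 are assumed to mirror groups 2-5.",
		"match": "((a)|(b)|(c)|(d))-((a)|(b)|(c)|(d))",
		"captures": {
			"2": { "name": "strong variable.other.constant" },
			"3": { "name": "strong keyword.control" },
			"4": { "name": "strong support.type" },
			"5": { "name": "strong constant.character.escape" },
			"7": { "name": "strong variable.other.constant" },
			"8": { "name": "strong keyword.control" },
			"9": { "name": "strong support.type" },
			"10": { "name": "strong constant.character.escape" }
		}
	}
}
```

Since no group is ever re-entered here, there is nothing for a call to purge, but the regex and its captures are now duplicated.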


RedCMD · 2022-01-13T03:59:48Z

Another way to see it is to create a highlighter like this:

"match": "(A)(B)(C)(D)(E)(F)(G)(H)(I)(J)\\g<6>?(K)(L)(M)(N)(O)(P)",
"captures": {
	"1":  { "name": "markup.underline invalid" },
	"2":  { "name": "markup.underline string.regexp" },
	"3":  { "name": "markup.underline string" },
	"4":  { "name": "markup.underline constant.character.escape" },
	"5":  { "name": "markup.underline support.function" },
	"6":  { "name": "markup.underline constant.numeric" },
	"7":  { "name": "markup.underline comment" },
	"8":  { "name": "markup.underline support.type" },
	"9":  { "name": "markup.underline variable" },
	"10": { "name": "markup.underline variable.other.constant" },
	"11": { "name": "markup.underline keyword" },
	"12": { "name": "markup.underline punctuation.definition.list.begin.markdown" },
	"13": { "name": "markup.underline header" },
	"14": { "name": "markup.underline constant.regexp" },
	"15": { "name": "markup.underline keyword.control" },
	"16": { "name": "markup.underline punctuation.definition.tag" }
}

and a test file with: ABCDEFGHIJKLMNOP
It should then colour the letters like so:

This does not trigger the subroutine call \\g<6> (which is optional), so it works fine.

But if you insert an F between J and K, the call is made and breaks all tokenization of (F)(G)(H)(I)(J), that is, everything between (F) (group 6) and the caller \\g<6>.

This is extremely annoying when you have to copy and paste large amounts of the same regex over and over instead of simply calling it again.
And you can't just set the code off to the side and never have it run: the subroutine call will still manage to break itself.
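
For reference, the copy-and-paste version of the highlighter above would look roughly like this. It is only a sketch: the optional (F) stands in for \\g<6>?, which makes it group 11 and pushes (K) to (P) from groups 11-16 to groups 12-17, so every capture key after the copy has to be renumbered by hand. Giving the copy the same scope as group 6 is my assumption.

```json
{
	"comment": "Sketch only: (F)? replaces \\g<6>?, so the copy is group 11 and K-P shift to groups 12-17.",
	"match": "(A)(B)(C)(D)(E)(F)(G)(H)(I)(J)(F)?(K)(L)(M)(N)(O)(P)",
	"captures": {
		"1":  { "name": "markup.underline invalid" },
		"2":  { "name": "markup.underline string.regexp" },
		"3":  { "name": "markup.underline string" },
		"4":  { "name": "markup.underline constant.character.escape" },
		"5":  { "name": "markup.underline support.function" },
		"6":  { "name": "markup.underline constant.numeric" },
		"7":  { "name": "markup.underline comment" },
		"8":  { "name": "markup.underline support.type" },
		"9":  { "name": "markup.underline variable" },
		"10": { "name": "markup.underline variable.other.constant" },
		"11": { "name": "markup.underline constant.numeric" },
		"12": { "name": "markup.underline keyword" },
		"13": { "name": "markup.underline punctuation.definition.list.begin.markdown" },
		"14": { "name": "markup.underline header" },
		"15": { "name": "markup.underline constant.regexp" },
		"16": { "name": "markup.underline keyword.control" },
		"17": { "name": "markup.underline punctuation.definition.tag" }
	}
}
```

With a single-letter group the duplication is trivial, but with a large subexpression this is exactly the repeated regex that a working subroutine call would avoid.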


RedCMD · 2023-11-05T07:45:24Z

#208

jtbandes mentioned this issue Nov 5, 2023

Repository works only when it defined at top level of grammar file #140

Open

jtbandes added a commit to jtbandes/swift-tmlanguage that referenced this issue Nov 5, 2023

Fix backreference and subpattern highlighting in VS Code

efe7774

Workaround for microsoft/vscode-textmate#164 and similar issues

jtbandes mentioned this issue Nov 5, 2023

Update Swift grammar and upstream repository microsoft/vscode#197470

Merged

RedCMD referenced this issue in RedCMD/TmLanguage-Syntax-Highlighter Nov 19, 2023

Fix character class range bug and improve \\x{}&\\o{} code points

f48d1bd
