Subroutines breaking capture tokenizing inside of referenced capture group #164

Open
RedCMD opened this issue Jan 2, 2022 · 2 comments

RedCMD commented Jan 2, 2022

When a subroutine call such as \\g<1> targets a capture group, the call removes the tokens previously assigned to any capture groups that aren't rematched in the subroutine.

To reproduce, create a syntax highlighting extension with this grammar:

{
	"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
	"name": "Subroutines Syntax",
	"scopeName": "source.redcmd.syntax.subroutines",
	"patterns": [
		{ "include": "#subroutines" }
	],
	"repository": {
		"subroutines": {
			"match": "((a)|(b)|(c)|(d))-\\g<1>",
			"captures": {
				"2": { "name": "strong variable.other.constant" },
				"3": { "name": "strong keyword.control" },
				"4": { "name": "strong support.type" },
				"5": { "name": "strong constant.character.escape" }
			}
		}
	}
}


The expected outcome is that every line of the form [abcd]-[abcd] is highlighted in full:

a-a
a-b
a-c
a-d
b-a
b-b
b-c
b-d
c-a
c-b
c-c
c-d
d-a
d-b
d-c
d-d

Like so:
[image: expected highlighting]

Instead, the tokens of every capture group (groups 2 to 5) that is not rematched (because it fails) in the subroutine call are purged:
[image: actual highlighting]
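
For anyone hitting this in the meantime, here is a minimal sketch of a workaround for the grammar above: write the group out a second time instead of calling it with \\g<1>, and give the duplicate's capture groups (7 to 10) the same scope names as groups 2 to 5. The rule name subroutines-expanded and the "comment" text are made up for illustration. This only sidesteps the subroutine call; it does not fix the underlying behaviour.

```json
{
	"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
	"name": "Subroutines Syntax",
	"scopeName": "source.redcmd.syntax.subroutines",
	"patterns": [
		{ "include": "#subroutines-expanded" }
	],
	"repository": {
		"subroutines-expanded": {
			"comment": "Hypothetical workaround: group 1 is duplicated as group 6 instead of being called via \\g<1>",
			"match": "((a)|(b)|(c)|(d))-((a)|(b)|(c)|(d))",
			"captures": {
				"2": { "name": "strong variable.other.constant" },
				"3": { "name": "strong keyword.control" },
				"4": { "name": "strong support.type" },
				"5": { "name": "strong constant.character.escape" },
				"7": { "name": "strong variable.other.constant" },
				"8": { "name": "strong keyword.control" },
				"9": { "name": "strong support.type" },
				"10": { "name": "strong constant.character.escape" }
			}
		}
	}
}
```

The cost is exactly the duplication a subroutine call is meant to avoid: every change to the group has to be made in both copies, and the capture numbers of the second copy have to be kept in sync by hand.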


RedCMD commented Jan 13, 2022

Another way to see the problem is to create a rule like this:

"match": "(A)(B)(C)(D)(E)(F)(G)(H)(I)(J)\\g<6>?(K)(L)(M)(N)(O)(P)",
"captures": {
	"1":  { "name": "markup.underline invalid" },
	"2":  { "name": "markup.underline string.regexp" },
	"3":  { "name": "markup.underline string" },
	"4":  { "name": "markup.underline constant.character.escape" },
	"5":  { "name": "markup.underline support.function" },
	"6":  { "name": "markup.underline constant.numeric" },
	"7":  { "name": "markup.underline comment" },
	"8":  { "name": "markup.underline support.type" },
	"9":  { "name": "markup.underline variable" },
	"10": { "name": "markup.underline variable.other.constant" },
	"11": { "name": "markup.underline keyword" },
	"12": { "name": "markup.underline punctuation.definition.list.begin.markdown" },
	"13": { "name": "markup.underline header" },
	"14": { "name": "markup.underline constant.regexp" },
	"15": { "name": "markup.underline keyword.control" },
	"16": { "name": "markup.underline punctuation.definition.tag" }
}

and a test file containing: ABCDEFGHIJKLMNOP
Each letter should then be coloured like so:
[image: every letter highlighted]
With this input the optional subroutine call \\g<6>? is not taken, so tokenization works fine.

But if you insert an F between J and K (ABCDEFGHIJFKLMNOP), the call is made and breaks the tokenization of every group between (F) (group 6) and the caller \\g<6>, that is (F)(G)(H)(I)(J):
[image: broken highlighting]

This is extremely annoying, because you have to copy and paste large amounts of the same regex over and over again instead of simply calling it again.
And you can't just set the shared pattern off to the side where it never runs on its own:
the subroutine call will still manage to break itself.
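
For the second example, the copy-and-paste approach mentioned above would look roughly like the sketch below. The optional call \\g<6>? is replaced by a literal second (F)?, which becomes group 11 and shifts (K) to (P) up to groups 12 to 17. Reusing the scope of group 6 for group 11 is an assumption about the intended highlighting, and the "comment" key is only there for explanation.

```json
{
	"comment": "Hypothetical workaround: \\g<6>? replaced by a duplicated (F)? as group 11; groups for K to P are renumbered 12 to 17",
	"match": "(A)(B)(C)(D)(E)(F)(G)(H)(I)(J)(F)?(K)(L)(M)(N)(O)(P)",
	"captures": {
		"1":  { "name": "markup.underline invalid" },
		"2":  { "name": "markup.underline string.regexp" },
		"3":  { "name": "markup.underline string" },
		"4":  { "name": "markup.underline constant.character.escape" },
		"5":  { "name": "markup.underline support.function" },
		"6":  { "name": "markup.underline constant.numeric" },
		"7":  { "name": "markup.underline comment" },
		"8":  { "name": "markup.underline support.type" },
		"9":  { "name": "markup.underline variable" },
		"10": { "name": "markup.underline variable.other.constant" },
		"11": { "name": "markup.underline constant.numeric" },
		"12": { "name": "markup.underline keyword" },
		"13": { "name": "markup.underline punctuation.definition.list.begin.markdown" },
		"14": { "name": "markup.underline header" },
		"15": { "name": "markup.underline constant.regexp" },
		"16": { "name": "markup.underline keyword.control" },
		"17": { "name": "markup.underline punctuation.definition.tag" }
	}
}
```

With a single-letter group the duplication is trivial, but the same renumbering is needed whenever a large shared pattern is inlined, which is where the lack of working subroutine calls hurts most.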


RedCMD commented Nov 5, 2023

#208

RedCMD referenced this issue in RedCMD/TmLanguage-Syntax-Highlighter Nov 19, 2023