There should be a way to embed languages generically, without having to account for every comment, string, or other construct of the embedded language that happens to break the unrelated container syntax, and without having to work around that by adding a lot of unrelated "hack" rules.
Several languages allow specifying arbitrary embedded languages (Markdown is an example), and having to account for every language-pair combination is bad when this could be solved generically. The syntax of an embedded language should be determined on a second "pass", without breaking the syntax of whatever delimits it in the parent language, while still letting the parent's escape sequences (e.g. in strings) take precedence over the embedded language.
I think this would be the ideal implementation for the best embedded language support.
Allow a `subPatterns` field (and an optional `replacementPatterns` field with it) that uses this second-pass logic. They would be mutually exclusive with `patterns`. This is how it could work when `subPatterns` is present:

1. The `begin`..`end`|`while` rule is matched first, without considering any sub-patterns or replacement patterns. Say the text content between the delimiters is stored in an `innerText` variable.
2. Then apply the replacement patterns, if they exist. They are basically the same as `patterns`, except they use `match` and a `replaceWith` field to specify a substitution within `innerText`. Place the result into a `subCode` variable, which is what the sub-patterns will later operate on. For example, if a `&lt;` to `<` substitution occurs, the sub-patterns operate on the new text. This lets you replace the parent language's escaping syntax before the sub-pattern that includes the embedded language runs.
   - The `replaceWith` field can have back-references to its `match` groups. Those can be the literal group text, or the Unicode character from the hex or decimal number in the group (for generic Unicode escape sequences).
3. Then apply `subPatterns` to just the `subCode` text atomically, on an inner/sub pass.
4. For any regions of `innerText` that had replacements, apply the replacement scope name on top of whatever scopes come from the sub-patterns. This way you can intermix the escaping syntax of both languages.
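The replacement pass described above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing TextMate or VS Code API: `ReplacementPattern`, `ReplacedRegion`, `expandReplaceWith`, and `applyReplacements` are hypothetical names, and JavaScript regular expressions stand in for the grammar's Oniguruma patterns (so `\h` is spelled `[0-9a-fA-F]`). It builds `subCode` from `innerText` and records which regions were replaced, so the replacement scope name can later be layered on top of the sub-pattern scopes.

```typescript
// Hypothetical shape of one entry in the proposed "replacementPatterns" field.
interface ReplacementPattern {
  match: RegExp;       // JS regex standing in for the grammar's pattern
  replaceWith: string; // may contain $N, $hN (hex code point), $dN (decimal code point)
  name: string;        // scope to layer over the sub-pattern scopes
}

// Maps a replaced span of innerText to the span it produced in subCode.
interface ReplacedRegion {
  innerStart: number;
  innerEnd: number;
  subStart: number;
  subEnd: number;
  name: string;
}

// Expands $N / $hN / $dN in a replaceWith template using the match groups.
// Assumes the group text is a valid code point when $hN or $dN is used.
function expandReplaceWith(template: string, groups: readonly (string | undefined)[]): string {
  return template.replace(/\$(h|d)?(\d+)/g, (_all, kind: string | undefined, index: string) => {
    const text = groups[Number(index)] ?? "";
    if (kind === "h") return String.fromCodePoint(parseInt(text, 16));
    if (kind === "d") return String.fromCodePoint(parseInt(text, 10));
    return text;
  });
}

// Scans innerText left to right; the first pattern matching at the current
// position wins, otherwise the character is copied through unchanged.
function applyReplacements(
  innerText: string,
  patterns: readonly ReplacementPattern[],
): { subCode: string; regions: ReplacedRegion[] } {
  // Sticky copies so each pattern is only tried at the current position.
  const sticky = patterns.map((p) => ({ re: new RegExp(p.match.source, "y"), p }));
  const regions: ReplacedRegion[] = [];
  let subCode = "";
  let pos = 0;
  while (pos < innerText.length) {
    let matched = false;
    for (const { re, p } of sticky) {
      re.lastIndex = pos;
      const m = re.exec(innerText);
      if (m && m[0].length > 0) {
        const replacement = expandReplaceWith(p.replaceWith, m);
        regions.push({
          innerStart: pos,
          innerEnd: pos + m[0].length,
          subStart: subCode.length,
          subEnd: subCode.length + replacement.length,
          name: p.name,
        });
        subCode += replacement;
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      subCode += innerText[pos];
      pos += 1;
    }
  }
  return { subCode, regions };
}

// Usage with escapes in the style of the proposal's my-lang.
const escapeScope = "constant.character.escape.my-lang";
const demoPatterns: ReplacementPattern[] = [
  { match: /\\([`\\])/, replaceWith: "$1", name: escapeScope },
  { match: /\\u([0-9a-fA-F]{4})/, replaceWith: "$h1", name: escapeScope },
  { match: /\\c\[(\d+)\]/, replaceWith: "$d1", name: escapeScope },
];
const demo = applyReplacements("a\\`b\\u0041\\c[66]", demoPatterns);
console.log(demo.subCode); // a`bAB
```

The `regions` array is what would make step 4 possible: a tokenizer could run the sub-patterns over `subCode`, then use each region's `subStart`/`subEnd` to find the tokens that came from an escape and map them back to `innerStart`/`innerEnd` in the original text.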
Additionally, allow parent back-references in the `include` names, so you can reference any arbitrary language id.
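As a rough sketch of what resolving such a back-reference might look like, assuming the parent rule's `begin` match is available when the `include` is evaluated (`resolveBackReferences` is a hypothetical helper, and a JavaScript regex stands in for the grammar's `begin` pattern):

```typescript
// Substitutes $N in an include (or scope) name with capture group N of the
// parent rule's begin match. Unmatched groups resolve to an empty string.
function resolveBackReferences(
  template: string,
  captures: readonly (string | undefined)[],
): string {
  return template.replace(/\$(\d+)/g, (_all, index: string) => captures[Number(index)] ?? "");
}

// Group 1 of the begin pattern is the language id, as in the proposal's example.
const beginPattern = /([\w-]+)`/;
const beginMatch = beginPattern.exec('json`{"a": 1}`');
const includeName = beginMatch ? resolveBackReferences("source.$1", beginMatch) : undefined;
console.log(includeName); // source.json
```

A real implementation would presumably also need a fallback for language ids that have no registered grammar, which this sketch does not attempt.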
A theoretical example:
```jsonc
{
  "name": "string.quoted.embedded-code.$1.my-lang",
  "begin": "([\\w-]+)`", // group 1 is the language id
  "beginCaptures": {
    "1": { "name": "entity.other.language.my-lang" }
  },
  "end": "`",
  "contentName": "meta.embedded.block.$1 source.$1",
  "replacementPatterns": [
    // $1 would replace with the char in group 1 below literally
    {
      "match": "\\\\([`\\\\])",
      "replaceWith": "$1",
      "name": "constant.character.escape.my-lang"
    },
    // $h1 could replace with the unicode char from the hex number matched by group 1
    {
      "match": "\\\\u(\\h{4})",
      "replaceWith": "$h1",
      "name": "constant.character.escape.my-lang"
    },
    // $d1 same as above, but for decimal numbers
    {
      "match": "\\\\c\\[(\\d+)\\]",
      "replaceWith": "$d1",
      "name": "constant.character.escape.my-lang"
    }
  ],
  "subPatterns": [
    // "include" could allow back-references from the parent begin/match pattern
    // to support arbitrary languages
    { "include": "source.$1" }
  ]
}
```
This would let you include any arbitrary embedded language without having to know anything about its syntax, and you could even have escaping in the parent language be recognized and everything would just work.
Example code for this theoretical my-lang:
(all escapes are from my-lang, except backslash is escaped twice, for both languages)
These would not break the syntax in my-lang, as the inner code is isolated: