Syntax Development Tips/Advice #757

wbond · 2016-12-23T16:37:22Z

If you've spent some time writing syntaxes, take a moment here to share any revelations you've had, or tips on things to test or look for.

wbond · 2016-12-23T16:40:11Z

Check for Scope Doubling

The characters (, ), {, }, [, ] are very easy to double scopes on via meta_scope. I try to add tests for ^ punctuation and then also ^ - punctuation punctuation to ensure they aren't there.
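To picture what the `- punctuation punctuation` assertion guards against, it can help to count how many scopes on a token's stack fall under the `punctuation` prefix. The helper below is a hypothetical Python illustration, not part of Sublime Text's test runner, and the scope stacks are made up for the example.

```python
def count_with_prefix(scope_stack: str, prefix: str) -> int:
    """Count scopes in a space-separated stack that equal `prefix` or start with `prefix.`."""
    return sum(
        1
        for scope in scope_stack.split()
        if scope == prefix or scope.startswith(prefix + ".")
    )


def is_doubled(scope_stack: str, prefix: str = "punctuation") -> bool:
    """True when more than one scope on the stack falls under `prefix`."""
    return count_with_prefix(scope_stack, prefix) > 1


# Hypothetical stacks for an opening brace. The second one picked up the
# punctuation scope twice, e.g. once from `scope` and once from `meta_scope`.
good = "source.js meta.block.js punctuation.section.block.begin.js"
bad = "source.js punctuation.section.block.begin.js punctuation.section.block.begin.js"
```

A syntax test with `^ punctuation` passes on both stacks; only the additional `^ - punctuation punctuation` assertion would flag the second one.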

djspiewak · 2016-12-23T18:54:27Z

Stateful Chaining

Don't Over-Use

sublime-syntax makes it very easy to have complex interlocking chains of stateful contexts which transition from one to the other via set. This is sometimes necessary to achieve the desired scoping, but it's also very easy to get lost in the mire. Always convince yourself that you need this feature before you use it.

Push Your First State

While it is absolutely possible to have a match in main which sets into a chain of stateful contexts, and subsequently sets back into main at the end, it is not recommended. main should be a stateless "baseline" context that is always the last element on the stack. Instead, have your match in main use push to get into your first state, then pop out of the last state. For example, imagine we wanted to match the sequence abc with each character scoped differently, and only when they follow each other. For illustration purposes, we will also match numerics in main:

contexts:
  main:
    - match: a
      scope: first
      push: expect-b
    - match: \d+
      scope: constant.numeric

  expect-b:
    - match: b
      scope: second
      set: expect-c

  expect-c:
    - match: c
      scope: third
      pop: true

Notice how a pushes expect-b. We don't set the first context, only the second one. Once we find the terminator, we pop out.

Lookahead Push for Meta-Scoping

Sometimes you need to apply a meta-scope to an entire stateful chunk. When this is the case, you almost certainly want your push rule to be a non-consuming lookahead, rather than a consuming scoped match. We can modify the above:

contexts:
  main:
    - match: (?=a)
      push: expect-a
    - match: \d+
      scope: constant.numeric

  expect-a:
    - meta_scope: meta.abc
    - match: a
      scope: first
      set: expect-b

  expect-b:
    - meta_scope: meta.abc
    - match: b
      scope: second
      set: expect-c

  expect-c:
    - meta_scope: meta.abc
    - match: c
      scope: third
      pop: true

Bail Outs

Always remember that you're writing a parser for a set of partially valid syntax fragments. The normal mode of operation is that someone is actively typing new text. For this reason, you need to make sure that any and all stateful contexts you use have aggressive "bail-outs" for when something goes wrong. As a rule of thumb, if there's a case where a compiler's parser would have produced an error, your syntax mode should handle that case by popping back to main.

Consider the example from above. Imagine the user is typing into the following buffer:

42
ab
12

Even if the user is actively typing c following b, it would be a terrible experience for the scoping on 12 to shift back and forth as they type in the middle. For this reason, you should always end your mid-state scopes with a lookahead match on (?=\S) which pops out of the state chain. Like so:

contexts:
  main:
    - match: (?=a)
      push: expect-a
    - match: \d+
      scope: constant.numeric

  bail-out:
    - match: (?=\S)
      pop: true

  expect-a:
    - meta_scope: meta.abc
    - match: a
      scope: first
      set: expect-b
    - include: bail-out

  expect-b:
    - meta_scope: meta.abc
    - match: b
      scope: second
      set: expect-c
    - include: bail-out

  expect-c:
    - meta_scope: meta.abc
    - match: c
      scope: third
      pop: true
    - include: bail-out

Now, when the user starts with the following buffer:

42

12

They can place the cursor on the second line and type a and the scoping on 12 will remain unchanged. Getting this wrong is one of the easiest ways to create a terrible experience for users of your mode without even realizing it yourself.
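The effect of the bail-out can be sketched with a toy context stack in Python. This is a deliberately simplified model and not how Sublime Text is implemented: it ignores meta scopes and line-by-line processing, the function names are invented for the illustration, and Python's `re` stands in for the real regex engine.

```python
import re

# Each rule is (pattern, scope, action); action is None, "pop",
# or a ("push" | "set", context_name) pair.
BAIL_OUT = (r"(?=\S)", None, "pop")


def make_contexts(with_bail_out):
    """Build the abc example, optionally including the bail-out rule."""
    tail = [BAIL_OUT] if with_bail_out else []
    return {
        "main": [(r"(?=a)", None, ("push", "expect-a")),
                 (r"\d+", "constant.numeric", None)],
        "expect-a": [(r"a", "first", ("set", "expect-b"))] + tail,
        "expect-b": [(r"b", "second", ("set", "expect-c"))] + tail,
        "expect-c": [(r"c", "third", "pop")] + tail,
    }


def tokenize(text, contexts):
    """Return (matched_text, scope) pairs using a simplified rule loop."""
    stack, pos, tokens = ["main"], 0, []
    while pos < len(text):
        for pattern, scope, action in contexts[stack[-1]]:
            m = re.compile(pattern).match(text, pos)
            if m is None:
                continue
            if scope and m.group():
                tokens.append((m.group(), scope))
            pos = m.end()
            if action == "pop":
                stack.pop()
            elif action is not None:
                kind, target = action
                if kind == "set":
                    stack.pop()
                stack.append(target)
            break
        else:
            pos += 1  # nothing matched at this position: skip one character
    return tokens


partial = "42\nab\n12"  # the user has not typed `c` yet
with_bail = tokenize(partial, make_contexts(True))
without_bail = tokenize(partial, make_contexts(False))
```

In this model, `12` keeps its `constant.numeric` scope when the bail-out is present, and loses it without the bail-out because the stack is still stuck in `expect-c`.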

Test Partially-Valid Buffers

Don't just test that correctly-written constructs were scoped appropriately. Test that partial fragments also scope in a reasonable way. Test that unrelated constructs which come lexically after these partially-valid constructs are also scoped correctly.

Organize

There are no bonus points (or bonus performance) for brevity. Organize your sublime-syntax file the way you would organize any serious bit of code. Use spaces, newline breaks ((?x) in your regex patterns can be invaluable!) and comments to your advantage.

Thom1729 · 2017-03-16T21:17:07Z

Use the Stack

Long chains of set contexts can be difficult to follow and impossible to understand at a glance. When you have a construction where you expect a list of elements in sequence, put them all onto the stack at once. The stack will unwind as the elements are recognized. Example:

contexts:
  else-pop:
    - match: (?=\S)
      pop: true

  function-body:
    - match: \{
      scope: punctuation.section.braces.begin.js
      set:
        - meta-function-body
        - expect-closing-brace
        - statements
        - directives

  meta-function-body:
    - meta_scope: meta.function.body.js
    - include: else-pop

  expect-closing-brace:
    - match: \}
      scope: punctuation.section.braces.end.js
      pop: true
    - include: else-pop

  statements:
    - match: (?=\})
      pop: true

    - ...

  directives:
    - match: "'use (?:strict|asm)';"
      scope: keyword.other.directive.js
    - include: else-pop

As an added benefit, most of these contexts can be reused:

  statements:
    ...
    - match: \{
      scope: punctuation.section.braces.begin.js
      push:
        - meta-block
        - expect-closing-brace
        - statements
    ...

  meta-block:
    - meta_scope: meta.block.js
    - include: else-pop

And they can be easily composed:

  expression:
    ...
    - match: \bfunction\b
      scope: meta.function storage.type.function.js
      set:
        - function-body
        - function-parameters # Implementations omitted
        - function-name
    ...

As a bonus, states stacked this way are implicitly optional. If one is omitted, the highlighter will move on to the next without interruption. For instance, in the last example, the construction will be parsed correctly whether or not the author supplies a function name.
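The order in which such a list unwinds can be sketched with a plain Python list standing in for the context stack. This only illustrates the ordering (the last context listed ends up on top and gets its turn first); `push_all` is a name invented for the sketch, and `main` stands in for whatever context was underneath.

```python
def push_all(stack, contexts):
    """Push contexts in list order, so the last one listed ends up on top."""
    stack.extend(contexts)
    return stack


stack = push_all(["main"], [
    "meta-function-body",
    "expect-closing-brace",
    "statements",
    "directives",
])

# Popping repeatedly shows the order in which the contexts get their turn.
visit_order = []
while len(stack) > 1:
    visit_order.append(stack.pop())
```

Here `visit_order` starts with `directives` and ends with `meta-function-body`, which is why the meta context is listed first: it stays on the stack the longest.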

(At first, I was concerned about the efficiency of all of the stack manipulation and context switching, but in practice it seems to be just as fast as the traditional method. The JS+Babel+React+Flow syntax I've developed this architecture for runs about 15% faster than the stock JS syntax.)

Thom1729 · 2017-03-17T21:31:43Z

Preprocessing with YAML Macros

Syntax definitions can have a lot of repetitive elements to them. Sometimes, these elements are simple enough that you can simply include a utility context:

contexts:
  else-pop:
    - match: (?=\S)
      pop: true

  order-expression:
    - match: (?i)\b(?:ASC|DESC|NULLS|FIRST|LAST)\b
      scope: keyword.other.sql
    - include: else-pop

But this example contains another common SQL idiom: keywords are always case-insensitive and surrounded by word breaks, so the syntax will contain many repetitions of the (?i)\b(?:...)\b pattern. A typo could easily slip into one such repetition. So why write it yourself each time? Instead, use YAML tags and a preprocessor:

  order-expression:
    - match: !word ASC|DESC|NULLS|FIRST|LAST
      scope: keyword.other.sql
    - include: else-pop

The macro:

# macros.py

def word(match):
    return r'(?i)\b(?:%s)\b' % match
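A quick way to sanity-check what the macro expands to is to run the generated pattern through Python's `re` module. This is only a rough check, since `re` is a stand-in here for the regex engine Sublime Text actually uses, and the sample SQL string is made up.

```python
import re


def word(match):
    # Same body as the macro above: case-insensitive, word-bounded alternation.
    return r'(?i)\b(?:%s)\b' % match


pattern = re.compile(word('ASC|DESC|NULLS|FIRST|LAST'))

# `lastname` must not match: the word boundary after `last` is missing.
found = pattern.findall('order by name desc nulls last, lastname ASC')
```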

The engine:

# build.py

import yaml, sys
from os import path

filename = sys.argv[1]

output_path, extension = path.splitext(path.basename(filename))

if extension != '.source': raise ValueError("Not a .source file!")

input_file = open(filename, 'r')
output_file = open(output_path, 'w')

import macros

PREAMBLE = '%YAML 1.2\n---\n'

def apply_transform(loader, node, transform):
    try:
        if isinstance(node, yaml.ScalarNode):
            return transform(loader.construct_scalar(node))
        elif isinstance(node, yaml.SequenceNode):
            return transform(*loader.construct_sequence(node))
        elif isinstance(node, yaml.MappingNode):
            return transform(**loader.construct_mapping(node))
    except TypeError as e:
        print('Failed to transform node: {}\n{}'.format(str(e), node))

def get_constructor(transform):
    return lambda loader, node: apply_transform(loader, node, transform)

for name, transform in macros.__dict__.items():
    if callable(transform):
        yaml.add_constructor(
            '!'+name.lstrip('_'),
            get_constructor(transform)
        )

# Newer PyYAML versions require an explicit Loader argument.
syntax = yaml.load(input_file, Loader=yaml.Loader)

output_file.write(PREAMBLE)
yaml.dump(syntax, output_file)

Another example:

def meta(name):
    return [
        { 'meta_scope': 'meta.%s.sql' % name, },
        { 'match': '', 'pop': True },
    ]

block:
  - match: !word BEGIN
    scope: keyword.control.sql
    push:
      - !meta block
      - statements

We can go further. In Oracle SQL, any identifier may be wrapped in optional double quotes:

create table mytable ... ;
create table "mytable" ... ;

Implementing both versions is possible:

  expect-table-name:
    - match: \b{{ident}}\b
      scope: entity.name.table.sql
      pop: true
    - match: (")([^"]+)(")
      captures:
        1: punctuation.definition.string.begin.sql
        2: entity.name.table.sql
        3: punctuation.definition.string.end.sql
      pop: true
    - include: else-pop

But then we have to do this for every single type of identifier -- procedure names, aliases, variables, etc. So we write a macro for it:

def expect_identifier(scope):
    return [
        { 'match': r'\b{{ident}}\b', 'scope': scope, 'pop': True },
        {
            'match': r'(")([^"]+)(")',
            'scope': 'string.quoted.double.sql',
            'captures': {
                '1': 'punctuation.definition.string.begin.sql',
                '2': scope,
                '3': 'punctuation.definition.string.end.sql',
            },
            'pop': True,
        },

        { 'match': r'(?=\S)', 'pop': True },
    ]

Define all of these contexts the same way:

  expect-table-name: !expect_identifier entity.name.table.sql
  expect-alias: !expect_identifier entity.name.alias.sql

Or just use the macro inline:

declarations:
  ...
  - match: !word TYPE
    scope: storage.type.sql
    push:
      - !meta declaration.type
      - type-definition-value
      - !expect_keyword IS
      - !expect_identifier entity.name.type.sql
  ...

Using macros, you can define very complicated constructs in a compact fashion that is easy to understand and reasonably robust against invalid input:

- match: !word FOR
  scope: keyword.control.sql
  push:
    - !meta control.for
    - !expect_keyword LOOP
    - !expect_keyword END
    - statements
    - !expect_keyword LOOP
    - expression
    - !expect [ \.\., keyword.operator.other ]
    - expression
    - !expect_keyword REVERSE
    - !expect_keyword IN
    - !expect_identifier variable.other.sql
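The `expect_keyword` macro is used above but its implementation is not shown in this post. A plausible sketch, modeled on `expect_identifier`, might look like the following. Everything here is an assumption: the default scope, the exact shape of the returned rules, and the reuse of `word` are guesses, not the author's code.

```python
import re


def word(match):
    # Same helper as earlier in this post.
    return r'(?i)\b(?:%s)\b' % match


def expect_keyword(keyword, scope='keyword.other.sql'):
    """Hypothetical macro: scope the keyword if present, otherwise pop."""
    return [
        {'match': word(keyword), 'scope': scope, 'pop': True},
        # Same bail-out as else-pop, so a missing keyword is skipped.
        {'match': r'(?=\S)', 'pop': True},
    ]


rules = expect_keyword('LOOP')
```

Because the second rule pops on any non-whitespace, each stacked `!expect_keyword` context is optional, which is what makes the long FOR construct tolerant of incomplete input.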

keith-hall · 2017-09-04T07:04:48Z

Keep matches concise

Where possible, avoid match patterns like .* that match the whole line (e.g. for line comments), and instead use a meta_scope and just match the character sequences that need specific scoping or the end of the line ($\n?) that will pop the context.

Why? Because sometimes the syntax could get embedded in another syntax, and that syntax might want to use with_prototype to pop earlier than would otherwise be possible if the match pattern consumes the whole line.

For example, if you have a language that defines line comments like this:

comments:
  - match: (//).*$\n?
    scope: comment.line.double-slash.example
    captures:
      1: punctuation.definition.comment.example

and then want to include it in HTML, for example using a well-known example of PHP style markers:

embed:
  - match: '<\?'
    scope: punctuation.section.embedded.begin.example
    push: [end_embed, scope:base.scope.for.example.above]
    with_prototype:
      - match: '(?=\?>)'
        pop: true

end_embed:
  - match: '\?>'
    scope: punctuation.section.embedded.end.example
    pop: true

then you will find code like:

<?php // comment here ?><div>HTML here</div>

where the with_prototype didn't apply because it doesn't interrupt a match - the (//).* from the comment pattern will match // comment here ?><div>HTML here</div> so there is no way for the with_prototype to see the ?>, which means the whole line will be erroneously scoped as a comment.

So the proper way to declare the comments context would be:

comments:
  - match: '//'
    scope: punctuation.definition.comment.example
    push:
      - meta_scope: comment.line.double-slash.example
      - match: $\n? # Consume the newline so that completions aren't shown at the end of the line when typing comments
        pop: true
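The difference between the two comment rules can be reproduced with Python's `re` module, used here only as a stand-in for Sublime Text's regex engine. The sketch shows how much text each rule consumes in a single match and whether a later `?>` is still reachable afterwards.

```python
import re

line = "<?php // comment here ?><div>HTML here</div>"
start = line.index("//")

# Original rule: the single match swallows the rest of the line, `?>` included.
greedy = re.compile(r"(//).*$").match(line, start)

# Rewritten rule: only `//` is consumed, so later positions are still open
# for other rules, such as a with_prototype lookahead for `?>`.
concise = re.compile(r"//").match(line, start)
closing = re.compile(r"(?=\?>)").search(line, concise.end())
```

With the greedy rule there is no position left at which the lookahead could be tried; with the concise rule, `closing` is found at the `?>`.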

tajmone · 2018-04-25T22:05:17Z

Unusual Syntaxes and Their Pitfalls

Lately I've started working on a syntax file for an Interactive Fiction (text adventures) programming language which tries to hide the complexity of programming and look like natural English as much as possible. Because of its sparse use of punctuation, I've found myself facing some unexpected problems; the full story here:

In particular, I was struggling to handle closing block statements of the type END EVERY identifier. where both the identifier and the dot terminator were optional. After running in circles for many hours (due to my lack of experience with ST syntaxes), I managed to get it working, and would like to share some lessons learned here. They might not be the best solutions, but they surely represent the problems newbies are going to face.

Forceful Popping

First of all, I faced the problem of how to pop out of the stack in certain (unavoidable) situations, and I was kindly introduced to the else_POP technique mentioned earlier in this thread. Still, the chain of included contexts was not behaving as expected.

The main problem was tied to unconsumed whitespace: the else_POP and immediate_POP tricks to force your way out of a stacked context can cause a premature pop if there is still some whitespace floating around which isn't captured by the various included contexts. In the end, I had to ensure that the pattern that sets the context on the stack also eats up (and discards) any trailing whitespace.

Also, another problem was the END EVERY construct having two optional trailing keywords (identifier and dot-terminator). To prevent loose scopes floating about, I had to implement an extra check for an END EVERY statement followed by only whitespace (ie: neither ID nor terminator).

Here is the code showing how I managed to work around the problem:

  class:
    - match: (?i)\bEVERY\b
      scope: storage.type.class.alan
      set: [class_body, class_identifier]
  
  class_body:
    - meta_scope: meta.class.alan
    - include: class_head
    - include: class_tail
  class_head:
    - match: (?i)\bIsA\b
      scope: storage.modifier.extends
      push: inherited_class_identifier
    # TODO: inheritance

  class_tail:
    # ===========================
    # END EVERY => no ID & no `.`
    # ===========================
    # When END EVERY is followed by neither ID nor dot, we must capture it
    # separately to avoid stray scopes after it...
    - match: (?i)\b(END\s*EVERY)\b(?=\s*$)
      captures:
        1: keyword.control.alan
      pop: true
    # ==========================
    # END EVERY => ID and/or `.`
    # ==========================
    - match: (?i)\b(END\s*EVERY)\b\s* # <= must consume all whitespace!
      captures:
        1: keyword.control.alan
      set:
        - meta_content_scope: meta.class.alan
        - include: class_tail_identifier
        - include: terminator
        - include: force_POP

  terminator:
    - match: '\.'
      scope: punctuation.terminator.alan

It might not be the best solution, but it doesn't have to be: I'm in the early stages of creating this syntax, and sometimes you just need to get the job done and carry on drafting. Things can get frustrating when you can't pinpoint what is breaking the expected behavior.

Watchlist of Common Newbie-Mistakes

Here are the lessons I learned from tackling this problem, which might help in working out which context(s) are causing trouble:

- Always beware of any leftover whitespace from regex match patterns:
  - Try to consume trailing whitespace by capturing it with a discarded group
  - captures is better than scope because it lets you add extra discarded groups for testing whether leading/trailing whitespace is a problem
- Lookaheads are your best friends when it comes to handling optional syntax elements at the end of a meta scope
- While working on a syntax's contexts:
  - Annotate the stack level in side comments (it's so easy to lose track of how deep in the stack each context and its included statements are)
  - For reusable syntax-element contexts, consider creating both a popless version and another one that pops out of the stack (sometimes you might be including them, other times pushing/setting them)
- Lacking a context-stack debugger:
  - Add some arbitrary label to scopes so that you can track in the highlighted code which context is active (when there are variants). E.g., by using keyword.control.NONE.alan and keyword.control.IdOrDOTalan I was able to uncover the leftover whitespace problem which was causing the wrong context to be used.

I've learned these small lessons the hard way, by running in circles for hours because I wasn't mentally tracking the stack levels correctly. Also, I struggled a lot with include vs push vs set choices, trying to adapt the contexts to my own liking and to pre-existing reusables, which turned out to be a very bad approach.

When starting to deal with lots of reusable contexts and nested contexts, it can quickly become a complex task to keep a clear mental picture of what is actually going on at the parser level. Unfortunately, we can't escape the unpleasant task of having to mentally track what RegEx patterns are capturing, consuming, discarding and how the various stacked contexts loop until they pop out.

Surely, as experience in working with syntaxes starts to set in, one eventually develops the right frame of mind for laying the foundations of a syntax on the right foot. The problem is that if the whole experience gets frustrating and no solution seems possible, one might just give in and never reach that required experience (after all, experience breeds on successes as well as failures).

A Syntax CLI Simulator/Debugger Would Be Invaluable

Another lesson I've learned:

If ST had a way to expose the syntax parser's stack state, its queues, and some debug info about the text being processed (the regex matches and failures), it would be much easier to trace where our custom syntaxes fail.

Any chance that (somewhere in the future) ST might also ship with a command line tool to debug syntax definitions? A console app that takes a syntax file and a source test file as input and spits out two files: a scoped version of the source file (an XML-like doc tree) and a log file listing all the inner workings of the parser engine. This would be an invaluable tool both for learning how to build syntaxes and for fixing problems.

Learning to create custom syntaxes should be a pleasant experience, not a frustrating one. The official documentation on the topic is not exactly "exhaustive" (far from it), and most existing syntaxes are usually too large to be used as learning examples to start with.

Syntect: A Fallback Debugger

@kingkeith (who I believe might be @keith-hall here on GitHub) pointed out to me the availability of syntect, a "Rust library for syntax highlighting using Sublime Text syntax definitions" which offers debugging features via its --debug argument:

https://github.com/trishume/syntect

While there is no assurance that its syntax parser follows that of ST 100% (and small edge cases could cause differences in behavior), it seems to support ST syntax files very well, which means it can be a valuable tool for debugging a syntax's inner workings (pending a dedicated debug tool from ST3).

tajmone · 2018-04-27T13:17:41Z

I know that I ought not consume space in this thread for comments, but I can't thank @djspiewak enough for his "Stateful Chaining" advice. After reading it, I managed to correct my syntax draft to handle code fragments without breaking the user experience (before, I was working on the assumption that all code would always be well-formed), and it allowed me to handle the stack and reusable contexts better.

I really wish that I had found a link to this Issue in ST documentation on syntaxes; it would have saved me hours of attempts, and spared me stress-induced psychosomatic complications. I must thank @kingkeith for having brought it to my attention (and for kindly helping me out, along with @ThomSmith, to work my way through the impasses of my first big syntax creation).

I'd also like to add a further tip on Stack POPping tricks.

Stack Popping Tricks

I've found the tricks in this thread on how to pop out of the stack very useful, and I'd like to add another variant which I ended up needing in some contexts, and some comments to the RegEx.

Force POP

Wherever included, this context will pop immediately.

force_POP:
- match: ''
  pop: true

My understanding is that this is a RegEx that matches the empty string (consuming nothing) and therefore always succeeds, thus forcing the pop: true to act right on the spot.

Else POP

As already mentioned above by @Thom1729 (equivalent to @djspiewak's bail-out):

else_POP:
- match: (?=\S)
  pop: true

this context is great as an "else" condition for popping out of a context that could loop forever. Unlike force_POP, this will work in those contexts that need to iterate a few times before exiting.

Its RegEx pattern matches nothing followed by non-whitespace; since it's a lookahead assertion, it doesn't consume anything either. My understanding is that it's just a lookahead operation carried out at the current position of the highlighter's buffer. Therefore, if there is some whitespace ahead this doesn't pop out.

If my understanding is correct, the difference in behavior between this and force_POP is that else_POP will not pop out while there is still whitespace ahead. I couldn't find any details on how the syntax parser actually resumes the context loop after a positive match (i.e., whether it starts again from the top of the current context's pattern list, or just carries on to the next pattern in the list), but my impression is that whitespace is silently consumed by the parser (unless a custom pattern consumes it), which means that else_POP will not necessarily pop out of the context the first time it's encountered.

End-of-Line POP

I've come across some situations in which neither else_POP nor force_POP did the job, and used instead:

eol_POP:
- match: '$'
  pop: true

This is useful in situations when you need to pop your way out of the current context/Stack when the end of line is reached. Usually it works well with syntaxes where line breaks are not optional, and where a number of optional elements might follow; or just to handle incomplete code fragments.
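The three popping patterns differ only in where they are allowed to match. The comparison below uses Python's `re` as a stand-in for Sublime Text's regex engine, with a made-up sample line; it checks each pattern at a whitespace position, at the identifier, and at the end of the line. It says nothing about how the real parser advances between attempts.

```python
import re

FORCE_POP = re.compile(r"")
ELSE_POP = re.compile(r"(?=\S)")
EOL_POP = re.compile(r"$")

line = "END EVERY  cow."
gap = line.index("  ")      # position of the first of two spaces
ident = line.index("cow")   # position of the identifier
end = len(line)             # position just past the last character


def fires(pattern, pos):
    """True if the pattern matches exactly at `pos` (always zero-width here)."""
    return pattern.match(line, pos) is not None
```

In this model `FORCE_POP` fires everywhere, `ELSE_POP` fires at `ident` but not at `gap` or `end`, and `EOL_POP` fires only at `end`.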

NOTE: I'll update this post to provide more detailed/correct information regarding the RegExs and parser internals if I get new info about it. Also, I'll add more Stack popping hacks if I encounter them.

tajmone · 2018-04-27T14:12:39Z

Syntax Test Files

Syntax Test Files are a great functionality for automating syntax tests. Unfortunately, ST official documentation on the topic doesn't cover in depth Scope Selectors. While the examples are all there, I've found the following link to Textmate's documentation quite useful:

http://manual.macromates.com/en/scope_selectors

Specifically, I've learned more about the syntax for excluding or grouping scope selectors, via documented examples.

Testing Against Scope Spillings

I've learned how to use syntax files to check that scopes don't spill over to neighbouring elements and/or whitespace.

For example, in this syntax example I'm defining a class in Alan language:

EVERY cow IsA object.
 -- [some definitions]
END EVERY cow. -- a comment

Where everything from EVERY to END EVERY [ID][.] would be scoped as meta.class.

While working on it, I had to deal with scopes that were eating up the trailing whitespace. I've learned to test for those spills using scope selector subtraction:

EVERY cow IsA object.
--    ^^^                        meta.class.alan   entity.name.class.alan
--       ^^^^^^^^^^^^            meta.class.alan - entity.name.class.alan

The above test-file example snippet checks that the class name scope of cow doesn't spill over to the trailing whitespace nor the following elements. The test checks that cow is correctly scoped as both meta.class.alan and entity.name.class.alan, while what follows it (space included) should be scoped as meta.class.alan but NOT entity.name.class.alan, which is achieved via subtraction (-) of the scope.

Textmate's documentation on the topic states:

we can subtract scope selectors to get the (asymmetric) difference using the minus operator.
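The subtraction can be modeled in a few lines of Python. This is a simplified sketch written for illustration, not Sublime Text's selector implementation: it handles only a single `-`, treats a scope as matching when it equals the selector or is nested under it, and ignores the ordering rules of real descendant selectors.

```python
def has_scope(stack, selector):
    """True if any scope on the stack equals `selector` or is nested under it."""
    return any(s == selector or s.startswith(selector + ".") for s in stack.split())


def matches(expr, stack):
    """Evaluate `A - B` (A present, B absent); a bare `- B` only requires B absent."""
    wanted, _, unwanted = (" " + expr).partition(" - ")
    wanted, unwanted = wanted.strip(), unwanted.strip()
    if wanted and not all(has_scope(stack, w) for w in wanted.split()):
        return False
    return not (unwanted and has_scope(stack, unwanted))


# Assumed scope stacks for `cow` and for the text right after it.
on_name = "source.alan meta.class.alan entity.name.class.alan"
after_name = "source.alan meta.class.alan"
```

The first test line corresponds to `matches("meta.class.alan entity.name.class.alan", on_name)`, and the second to `matches("meta.class.alan - entity.name.class.alan", after_name)`; the latter would fail on `on_name`, which is exactly the spill the test is meant to catch.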

Testing Meta Scope Exiting

As a further example, I'll show how to test if the meta scope for the class has been duly exited when expected:

END EVERY cow. -- a comment
--^^^^^^^^^^^^                                 meta.class.alan
--            ^^^^^^^^^^^^^                  - meta.class.alan

Here we're checking that everything up to the terminator dot (included) is scoped as meta.class.alan. If the syntax is working correctly, everything following the terminator dot (excluded) should be scoped only as source.alan; therefore, we subtract the meta class scope from the (implicit) base scope: - meta.class.alan.

Note that the following two scope selectors are equivalent:

END EVERY cow. -- a comment
--            ^^^^^^^^^^^^^                  - meta.class.alan
--            ^^^^^^^^^^^^^      source.alan - meta.class.alan

... except that source.alan - meta.class.alan is more verbose and rather pointless — except in cases where the tested syntax might be included in other syntaxes. (see @keith-hall's comments below).

Also note that you can test with carets (^) beyond the actual contents of the code line being tested:

END EVERY cow.
--^^^^^^^^^^^^                                 meta.class.alan
--            ^                  source.alan - meta.class.alan

... in the above example the ^ is testing a non-existing character beyond the dot terminator, nevertheless the scope test is correct (you can try to change the test scopes and verify it yourself). This means that it is actually testing that the meta scope effectively ends with the . terminator.

keith-hall · 2018-04-28T05:52:34Z

Note that it's mostly pointless to check the "base" scope, unless you're embedding another syntax, so source.alan - meta.class.alan can be changed to simply - meta.class.alan. Then, one can find that if the meta scopes are tweaked slightly, the negative assertion holds less merit and is harder to identify when it doesn't really prove anything any more - it can be more useful to just check for - meta.class or even just - meta depending on the circumstances.
And yes, a ^ assertion that points to a character that is after the \n on the line being tested just asserts against the \n position, i.e. eol.

tajmone · 2018-04-28T08:01:45Z

Thanks for the clarification @keith-hall, I didn't realize you could actually subtract without declaring a base scope. I'll edit the example to clarify this, but at the same time I think it is worth leaving a full source.alan - meta.class.alan in the example for learning purposes too, just to show what is really going on. Unfortunately the documentation of scope selectors is really thin, and most existing syntaxes are uncommented. For example, I've noticed more complex scope selector cases in various color schemes or settings, using groups via parentheses, but it's not easy to work out how these groups are actually affecting scope selection.

keith-hall · 2019-10-19T11:57:42Z

Changing Regex Mode in Variables

When declaring variables, don't use (?x) to switch to multiline / extended mode; instead wrap the pattern in (?x:...). Otherwise the flag can leak into a pattern that references the variable, where one may not expect to be in extended mode, and readers have to check the variable definition to know whether it changed any options. The same applies to ignore-case mode and to any other flag that affects how the regex pattern is parsed.
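The scoping effect of (?x:...) can be seen with Python's `re` module, which supports the same group syntax and is used here only as a stand-in for Sublime Text's regex engine. The variable name and patterns are made up for the illustration.

```python
import re

# A "variable" written in extended mode, with the flag scoped to its own group.
ident = r"(?x: [a-z]+ \d* )"

# The referencing pattern is NOT in extended mode: its spaces are literal.
pattern = re.compile(ident + r" = ")

spaced = pattern.fullmatch("abc12 = ")
unspaced = pattern.fullmatch("abc12=")
```

Inside the group the whitespace is ignored, while the spaces around `=` in the outer pattern still have to match literally, so only `spaced` is a match.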

Making Variables Atomic

Related to the above, and I'm not sure if this has already been mentioned (I didn't find it with a quick search, but am on mobile and didn't try too hard), but it is also useful to wrap variable declarations in a non-capturing group, so that when referencing them with behavior like {{example_var}}{2}, it is unambiguous and doesn't just repeat the last token declared inside that variable.

i.e. example_var: (?:\w+[.:]) as opposed to example_var: \w+[.:]
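The difference shows up as soon as the variable is expanded textually. The sketch below imitates `{{name}}` substitution with a plain string replacement (the `expand` helper is invented for this illustration) and tests the results with Python's `re` module as a stand-in for the real engine.

```python
import re


def expand(pattern, variables):
    """Minimal stand-in for `{{name}}` substitution in sublime-syntax."""
    for name, value in variables.items():
        pattern = pattern.replace("{{%s}}" % name, value)
    return pattern


bare = expand(r"{{example_var}}{2}", {"example_var": r"\w+[.:]"})
grouped = expand(r"{{example_var}}{2}", {"example_var": r"(?:\w+[.:])"})
```

`bare` becomes `\w+[.:]{2}`, where the quantifier binds only to the character class, whereas `grouped` becomes `(?:\w+[.:]){2}`, which repeats the whole variable as intended.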

FichteFoll · 2019-10-19T12:03:43Z

In addition to the suggestion to use (?x:…):

Use the chomping indicator in block scalars

When using block scalars for your regular expressions, make it a habit to always use |- (or >-) with the hyphen chomping indicator to strip the trailing newline. While whitespace is ignored in extended mode, a trailing newline can be detrimental in variable definitions where you enable extended mode only for the variable itself.
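The effect of a stray trailing newline can be illustrated without a YAML parser. In the sketch below, the two strings stand in for what a YAML loader returns for a `|` block scalar (one trailing newline kept) and a `|-` block scalar (trailing newline stripped); Python's `re` is a stand-in for the real regex engine and the patterns are made up.

```python
import re

clipped = "[a-z]+\n"    # what `|` yields: one trailing newline is kept
stripped = "[a-z]+"     # what `|-` yields: trailing newline removed

# The variable is embedded in a pattern that is not in extended mode,
# so a newline inside it must match a literal newline in the text.
with_newline = re.compile(clipped + r"\d")
without_newline = re.compile(stripped + r"\d")
```

Here `without_newline` matches `abc1`, while `with_newline` only matches when a line break sits between the letters and the digit.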

tajmone · 2019-10-19T12:09:28Z

Syntax Tests: Use of & Operator in Scopes

While working on syntax tests, I discovered an undocumented feature, i.e. that it's possible to use the & operator in the tests scopes. E.g.:

// "string"
//<- punctuation.definition.comment.begin & comment.line

where without the & you'll get an unmatched-scope error, because the scopes are in inverted order (i.e. comment.line comes first).

I've found this rather useful, especially to keep the same scopes aligned with each other, which makes the test sources easier to read.

I wonder if there are other operators beside - and & which can be used in syntax tests.
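The difference between a space-separated selector and one joined with `&` can be modeled as follows. This is a simplified sketch written for illustration, not Sublime Text's selector engine, and the `.example` scope names are made up: a space-separated selector must find its parts on the stack in order, while `&` only requires each operand to match somewhere.

```python
def find(stack, selector, start=0):
    """Index of the first scope at or after `start` that falls under `selector`, else -1."""
    for i in range(start, len(stack)):
        if stack[i] == selector or stack[i].startswith(selector + "."):
            return i
    return -1


def matches_descendant(selector, stack):
    """`a b`: every part must appear on the stack in the given order."""
    pos = 0
    for part in selector.split():
        i = find(stack, part, pos)
        if i < 0:
            return False
        pos = i + 1
    return True


def matches_and(selector, stack):
    """`a & b`: every operand must match, in any order."""
    return all(matches_descendant(op.strip(), stack) for op in selector.split("&"))


# Assumed stack for the `//` token: the comment scope comes first.
stack = ["source.example", "comment.line.double-slash.example",
         "punctuation.definition.comment.begin.example"]
```

In this model, `punctuation.definition.comment.begin comment.line` fails as a descendant selector because the order is inverted, while the `&` form succeeds.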

FichteFoll · 2019-10-19T15:51:54Z

Syntax tests work with normal scope selectors and thus all its available operators.

@tajmone


rchl · 2020-10-21T13:08:58Z

Rule loop processing order within a context

As the documentation states, "When a context has multiple patterns (rules), the leftmost one will be found.". It's also important to add that the rule loop is reset when the pattern (rule) matches. So with an example context like:

line:
- match: A    # 1
  scope: meta.A
- match: B    # 2
  scope: meta.B

and a line:

AB

The parser will try rule #1, which will match, then will reset the loop and try rule #1 again. Only then will it progress to rule #2. The reason for that is that only rules that consume ZERO characters (either with an explicit empty match like '', a lookahead or for example a single $ or a ^ pattern) advance the rule loop.

So you can't think of rules as being processed unconditionally from top to bottom and assume, for example, that this would allow for matching tokens in a specific order by arranging the rules in a certain way.
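If the goal is to accept B only after an A, the order has to be encoded in the context stack rather than in the order of the rules. The following is a minimal sketch of that idea, not something taken from a real syntax; the context name expect-B and the bail-out rule are assumptions made for illustration:

```yaml
contexts:
  main:
    # B is deliberately not matched here, so a B on its own stays unscoped.
    - match: A
      scope: meta.A
      push: expect-B          # hypothetical context name

  expect-B:
    # Only reachable once an A has been consumed.
    - match: B
      scope: meta.B
      pop: true
    # Anything else: leave without consuming, so main can handle it.
    - match: (?=\S)
      pop: true
```

With this arrangement AB scopes both characters, while a B that does not directly follow an A is left alone.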

Rule handling around the beginning/end of the line

Once all characters on the line are consumed, and there aren't any more characters to match on that line, the engine will run through the loop once again, matching against a special end-of-line "character". This character can't be explicitly consumed so the only way to get past it is to let the engine go through all the rules for it to be consumed (the rules can still match the end-of-line with patterns like $ but since those are non-consuming, the rule loop will advance and eventually reach the end).

Example:

line:
- match: '$'    # 1
  scope: meta.eol
- match: 'A'    # 2
  scope: meta.A

with line:

A

Engine steps:

does not match rule #1 and advances to rule #2
matches character A with rule #2 and resets the loop
(optionally goes through the whole loop matching a newline if one exists)
matches EOL with the rule #1 and since the match didn't consume anything, advances to rule #2
does not match rule #2
the loop ends and if there is another line to match, the engine advances to it

FichteFoll · 2020-10-24T13:15:07Z

* matches `EOL` with the rule `#1` and since the match didn't consume anything, advances to rule `#2`

I don't think this is necessarily true. My understanding here was that rule 1 matches, consumes no characters, and then ST forcibly moves to the next character to prevent an infinite loop. The notable difference here is that rule 2 isn't tried. You can test that with a different non-consuming pattern like (?=.) (or an empty pattern).

Edit: This has turned out not to be true, as evidenced by the following syntax definition marking everything as invalid:

    - match: (?=.)
    - match: .
      scope: invalid.test

jrappen · 2021-01-05T10:43:12Z

It would make sense to move these tips into a ./CONTRIBUTING.md file of this repo to keep them up-to-date with the current release on the dev channel.

tajmone · 2021-01-05T11:16:12Z

It would make sense to move these tips into a ./CONTRIBUTING.md file of this repo to keep them up-to-date with the current release on the dev channel.

A good solution would be to create a project Wiki for this repository (see my proposal at #1522), and then move the tips into the Wiki. The advantage of using a Wiki is that it allows organizing the topics into multiple pages, and it can be set to be editable by anyone.

The only downside I can think of is that, although repo Wikis are repositories, only collaborators can push changes, so page editing by non-collaborators has to be done via the WebUI.

Also worth mentioning, the following repository was recently (Mar. 2020) created to address scope naming guidelines:

https://github.com/SublimeText/ScopeNamingGuidelines

But as of today the project seems stale (three commits only).

Another older (and stale) project along those lines:

https://github.com/gwenzek/sublime-syntax-manifesto

I think that using the Wiki of this project would be preferable, since this is an official ST repository, whereas third party repositories might not receive the same attention.

jrappen · 2021-01-05T11:25:46Z

Most scope naming "issues" are tagged with RFC (as is this issue), compare:

RFC

FichteFoll · 2021-01-10T13:39:19Z

But as of today the project seems stale (three commits only).

There was work in a pull request which I have just merged, if you want to check it out. It could definitely move faster, but it is not meant to take in the tips and advice from this issue.

Instead, I would rather add them to the community docs at https://github.com/sublimetext-io/docs.sublimetext.io in a more orderly fashion, or, as you suggested, as a page on the wiki here. I think that the cdocs would be more appropriate, since the tips apply to syntax definition development in general and not the default packages specifically.

deathaxe · 2021-08-18T15:18:10Z

Use plural names for non-popping contexts and singular names for popping contexts.

Plural makes clear that an included context can handle multiple tokens without popping or setting away from the current context. (It may push another context onto the stack, though.)

Singular makes clear that the current context is left as soon as a single token is matched, either by popping or by setting another context onto the stack.

By doing so, context names such as ...-pop can be avoided. This is useful as some syntaxes consist of nearly only popping contexts, so we would find -pop in nearly every line.

Example 1

A strings context can be included to handle an arbitrary number of quoted string tokens without leaving the context it is included in. The string-content (singular) context is popped as soon as the closing quotation mark or an illegal end of line is matched.

  literal-double-quoted-strings:
    - match: \"
      scope: punctuation.definition.string.begin
      push: literal-double-quoted-string-body

  literal-double-quoted-string-body:
    - meta_include_prototype: false
    - meta_scope: meta.string string.quoted.double
    - match: \"
      scope: punctuation.definition.string.end
      pop: 1
    - include: illegal-newline
    - include: literal-string-escapes

Example 2

The following context (singular) is to handle a single statement (method declaration) by matching each possible term one after another.

  member-maybe-method:
    - meta_include_prototype: false
    - match: ''
      set:
        - method-block
        - method-attribute
        - method-array-modifier
        - method-signature
        - method-modifier

deathaxe · 2021-08-18T15:24:46Z

Use primarily (only) named contexts.

Reasons:

Only named contexts can be extended or overridden by an inheriting syntax definition
ST's scope name popup (ctrl+shift+p) displays the name of the context a token is matched by. If a syntax contains only named contexts (and wasn't pushed with_prototype), it is much easier to debug highlighting issues, as the context causing them can be identified more easily.

When creating contexts, ask yourself what you'd expect from a syntax if you needed to inherit from it to create your own extended variant.

Is it easy to override certain rules?

Which contexts must exclude possible prototypes? (- meta_include_prototype: false)

Does your syntax support string interpolation? Can an inherited syntax easily implement it?
See HTML.sublime-syntax for instance.
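As a rough illustration of why named contexts matter for inheritance, here is a sketch of a base syntax and an inheriting one. Both files, their paths, and all scope and context names are hypothetical; the sketch assumes `version: 2` syntaxes with `extends` and `meta_prepend`, and shows the two files as two YAML documents in one block:

```yaml
# File 1 (hypothetical): Packages/Example/Example.sublime-syntax
name: Example
scope: source.example
version: 2
contexts:
  main:
    - include: strings

  strings:
    - match: \"
      scope: punctuation.definition.string.begin.example
      push: string-body          # named, so it shows up by name and can be overridden

  string-body:
    - meta_include_prototype: false
    - meta_scope: meta.string.example string.quoted.double.example
    - include: string-escapes
    - match: \"
      scope: punctuation.definition.string.end.example
      pop: 1

  string-escapes:
    - match: \\["\\]
      scope: constant.character.escape.example
---
# File 2 (hypothetical): Packages/Example/Example (Extended).sublime-syntax
name: Example (Extended)
scope: source.example.extended
version: 2
extends: Packages/Example/Example.sublime-syntax
contexts:
  string-escapes:
    - meta_prepend: true         # keep the inherited rules, add this one in front
    - match: \\u\h{4}
      scope: constant.character.escape.unicode.example
```

Had the string body been an anonymous context nested under `push:`, the inheriting syntax would have had no name to hook into and would need to redefine the whole strings context instead.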

deathaxe · 2021-08-18T15:26:51Z

Use variables for large lists of fixed tokens (builtin functions etc.), because

they can easily be replaced by inheriting syntaxes
the contexts themselves stay readable by avoiding large blocks of patterns.

See: CSS.sublime-syntax for reference.
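A small sketch of the idea, using made-up names throughout (the variable `builtin_functions`, the function list, the file paths and the scopes are all assumptions for illustration, and the two files are shown as two YAML documents in one block):

```yaml
# File 1 (hypothetical): Packages/Example/Example.sublime-syntax
name: Example
scope: source.example
version: 2
variables:
  # The token list lives in one place instead of inside a context.
  builtin_functions: abs|len|max|min
contexts:
  main:
    - include: builtin-functions

  builtin-functions:
    - match: \b(?:{{builtin_functions}})\b
      scope: support.function.builtin.example
---
# File 2 (hypothetical): Packages/Example/Example (Extended).sublime-syntax
name: Example (Extended)
scope: source.example.extended
version: 2
extends: Packages/Example/Example.sublime-syntax
variables:
  # Redefining the variable is enough; no context has to be copied.
  builtin_functions: abs|clamp|len|max|min
```

The inheriting syntax changes the recognized tokens without touching any rule, and the builtin-functions context stays a single readable line however long the list grows.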

- Split `JSON.sublime-syntax` into ... using inheritance: - `JSON (Basic).sublime-syntax` with `scope:source.json.basic` - `JSON.sublime-syntax` with `scope:source.json` - `JSONC.sublime-syntax` with `scope:source.json.jsonc` - `JSON5.sublime-syntax` with `scope:source.json.json5` - Although the base syntax does not define a `prototype`, we add `meta_include_prototype: false` to the base syntax, to prevent inheriting syntaxes from injecting rules. - make use of `variables` - Add many more file extensions for `JSON` and `JSONC` - Significantly extend tests to cover more parts of the syntaxes defined: - Split original test file into logical parts - Add indentation tests for: - `JSON`, `JSONC` - `mapping` (objects), `sequence` (arrays) - leave `JSON` headers in `Markdown` as json only, but split up fenced code blocks into `json` and `jsonc` to behave similarly to `GitHub Flavored Markdown` - fix tests for `meta.mapping meta.mapping.*` - make `mapping.*` contexts more modular - fix sublimehq#285 as requested by Jon - address sublimehq#757 and use tips to fix line comments for `JSONC` - address sublimehq#2430 and use sort-order as requested by deathaxe - address sublimehq#2852 and use tips to fix scopes of curly braces & square brackets in `JSON` Co-authored-by: Ashwin Shenoy <Ultra-Instinct-05@users.noreply.github.com> Co-authored-by: Jack Cherng <jfcherng@users.noreply.github.com> Co-authored-by: Janos Wortmann <jwortmann@users.noreply.github.com> Co-authored-by: Jon Skinner <jps@sublimetext.com> Co-authored-by: FichteFoll <FichteFoll@users.noreply.github.com> Co-authored-by: Keith Hall <keith-hall@users.noreply.github.com> Co-authored-by: Michael B. Lyons <michaelblyons@users.noreply.github.com> Co-authored-by: Rafał Chłodnicki <rchl@users.noreply.github.com> Co-authored-by: deathaxe <deathaxe@users.noreply.github.com>

- Using inheritance split up `JSON.sublime-syntax` into: - `JSON (Basic).sublime-syntax` with `scope:source.json.basic` - `JSON.sublime-syntax` with `scope:source.json` - `JSONC.sublime-syntax` with `scope:source.json.jsonc` - `JSON5.sublime-syntax` with `scope:source.json.json5` - Add many more file extensions for `JSON` & `JSONC`: - add doc links to exensions where applicable as a reference to be able to more quickly verify that they (still) use said syntax flavor - Add JSON5 with support for: - explicitly pos numbers, hexadecimal ints, Infinity and NaN - single quoted strings - more escape chars for strings - Only allow objects or arrays at the top level - add `meta.toc-list` scope to top level object keys to add them to the symbol list (also add tests, see below) - Make use of newer syntax features including those only available in `version: 2` syntaxes - Make use of `variables` - Highlighting speed improvements for empty objects and empty arrays - Significantly improve number highlighting - Correctly scope number signs with `constant.numeric.sign` instead of `keyword.operator.arithmetic` - Significantly extend tests to cover more parts of the syntaxes defined: - Split original test file into logical parts - Add indentation tests for: - `json`, `jsonc` & `json5` - `mapping` (objects), `sequence` (arrays) - Add symbols tests for: - scope: `meta.toc-list.json | meta.toc-list.json5` - languages: `json`, `jsonc` & `json5` - Fix tests for `meta.mapping meta.mapping.*` - Make `mapping.*` contexts more modular - Leave `JSON` headers in `Markdown` as `json` only, but split up fenced code blocks into `json`, `jsonc` & `json5` to behave similarly to `GitHub Flavored Markdown` BREAKING CHANGES: - scopes for number signs have changed from being `keyword.operator.arithmetic` to `constant.numeric.sign` - fix sublimehq#285 as requested by Jon - address sublimehq#757 using tips to fix line comments for `JSONC` - address sublimehq#2430 using sort-order as requested by deathaxe 
- address sublimehq#2852 using tips to fix scopes of curly braces & square brackets in `JSON` Co-authored-by: Ashwin Shenoy <Ultra-Instinct-05@users.noreply.github.com> Co-authored-by: Jack Cherng <jfcherng@users.noreply.github.com> Co-authored-by: Janos Wortmann <jwortmann@users.noreply.github.com> Co-authored-by: Jon Skinner <jps@sublimetext.com> Co-authored-by: FichteFoll <FichteFoll@users.noreply.github.com> Co-authored-by: Keith Hall <keith-hall@users.noreply.github.com> Co-authored-by: Michael B. Lyons <michaelblyons@users.noreply.github.com> Co-authored-by: Rafał Chłodnicki <rchl@users.noreply.github.com> Co-authored-by: deathaxe <deathaxe@users.noreply.github.com>

- Using inheritance split up `JSON.sublime-syntax` into: - `JSON (Basic).sublime-syntax` with `scope:source.json.basic` - `JSON.sublime-syntax` with `scope:source.json` - `JSONC.sublime-syntax` with `scope:source.json.jsonc` - `JSON5.sublime-syntax` with `scope:source.json.json5` - Add many more file extensions for `JSON` & `JSONC`: - add doc links to extensions where applicable as a reference to be able to more quickly verify that they (still) use said syntax flavor - JSON: - (correctly formatted) JSON code can now be prettified or minified via the context menu or the command palette - highlight leading, trailing & multiple commas as invalid - only allow exactly one structure (object, array) or value (constant, number, string) at top level (thanks to Keith) - JSONC: - highlight some files by default as `JSONC` (as decided by Jon in sublimehq#285) - highlight leading & multiple commas as invalid, trailing as valid - JSON5: - explicitly pos numbers, hexadecimal ints, Infinity and NaN - single quoted strings - more escape chars for strings - ECMA identifierName as object keys (thanks to Thomas) - scoped as plain unquoted strings - line continuation in strings (with tests thanks to Keith) - Objects: - Add `meta.toc-list` scope to top level object keys to add them to the symbol list (also add tests, see below) - Highlighting speed improvements for empty objects (thanks to FichteFoll) - Make `mapping.*` contexts more modular - Arrays: - Highlighting speed improvements for empty arrays (thanks to FichteFoll) - Numbers: - Correctly scope number signs with `constant.numeric.sign` instead of `keyword.operator.arithmetic` - Significantly improve number highlighting (thanks to deathaxe) - Syntaxes: - Make use of newer syntax features including those only available in `version: 2` syntaxes - Make use of `variables` (with optimizations provided by deathaxe and regex patterns provided by Thomas) - Tests: - Significantly extend tests to cover more parts of the syntaxes defined. 
- Split original test file into logical parts - Add indentation tests for: - `json`, `jsonc` & `json5` - `mapping` (objects), `sequence` (arrays) - Add symbols tests for: - scope: `meta.toc-list.json | meta.toc-list.json5` - languages: `json`, `jsonc` & `json5` - Fix tests for `meta.mapping meta.mapping.*` - Leave `JSON` headers in `Markdown` as `json` only, but split up fenced code blocks into `json`, `jsonc` & `json5` to behave similarly to `GitHub Flavored Markdown` BREAKING CHANGES: - JSON does not have values that can be set via an inline calculation with the help of operators, but only simple number values. Scopes for number signs have changed from being `keyword.operator.arithmetic` to `constant.numeric.sign`. Color scheme authors should add this, should it be missing. - The `JSON.sublime-syntax` now marks comments as `invalid`, third party plugin authors should target `JSONC.sublime-syntax` instead to have the same user experience as before. - fix sublimehq#285 - address sublimehq#481 to remove incompatible regex patterns according to Will - address sublimehq#757 to fix line comments for `JSONC` (thanks to Keith) - address sublimehq#2430 using sort-order (as requested by deathaxe) - address sublimehq#2852 to fix scopes of curly braces & square brackets in `JSON` (thanks to Thomas) - address sublimehq/sublime_text#3154 and add symbol tests Co-authored-by: Ashwin Shenoy <Ultra-Instinct-05@users.noreply.github.com> Co-authored-by: Jack Cherng <jfcherng@users.noreply.github.com> Co-authored-by: Janos Wortmann <jwortmann@users.noreply.github.com> Co-authored-by: Jon Skinner <jps@sublimetext.com> Co-authored-by: FichteFoll <FichteFoll@users.noreply.github.com> Co-authored-by: Keith Hall <keith-hall@users.noreply.github.com> Co-authored-by: Michael B. 
Lyons <michaelblyons@users.noreply.github.com> Co-authored-by: Rafał Chłodnicki <rchl@users.noreply.github.com> Co-authored-by: Thomas Smith <Thom1729@users.noreply.github.com> Co-authored-by: Will Bond <wbond@users.noreply.github.com> Co-authored-by: deathaxe <deathaxe@users.noreply.github.com>

@deathaxe

- Using inheritance split up `JSON.sublime-syntax` into: - `JSON.sublime-syntax` with `scope:source.json` - `JSONC.sublime-syntax` with `scope:source.json.jsonc` - `JSON5.sublime-syntax` with `scope:source.json.json5` - `JSON_dotNET.sublime-syntax` with `scope:source.json.json-dotnet` - Add many more file extensions for `JSON` & `JSONC`: - add doc links to extensions where applicable as a reference to be able to more quickly verify that they (still) use said syntax flavor - JSON: - Make use of newer syntax features including those only available in `version: 2` syntaxes - Make use of `variables` (with optimizations provided by @deathaxe and regex patterns provided by @Thom1729) - Context names now more closely match the naming scheme of other (recently re-written) default syntaxes - (correctly formatted) JSON code can now be prettified or minified via the context menu or the command palette. JSON code can optionally be auto-prettified on pre save events. - highlight leading, trailing & multiple commas as invalid - only allow exactly one structure (object, array) or value (constant, number, string) at top level (thanks to @keith-hall) - links (`meta.link.inet`) and email addresses (`meta.link.email`) are scoped the same as in Markdown (thanks to @deathaxe) - JSONC: - highlight some files by default as `JSONC` (as decided by @jskinner in sublimehq#285) - highlight leading & multiple commas as invalid, trailing as valid - scope empty block comments as such - support syntax based folding of ST4131+, compare sublimehq#3291 - JSON5: - explicitly pos numbers, hexadecimal ints, Infinity and NaN - single quoted strings - more escape chars for strings - ECMA identifierName as object keys (regexes thanks to @Thom1729) - scoped as plain unquoted strings (thanks to @Thom1729) - support string interpolation (thanks to @deathaxe) - line continuation in strings (with tests thanks to @keith-hall) - JSON.NET: - support requested by @keith-hall, built with feedback from 
@michaelblyons - Objects: - Highlighting speed improvements for empty objects (thanks to @FichteFoll) - Make `mapping.*` contexts more modular - Arrays: - Highlighting speed improvements for empty arrays (thanks to @FichteFoll) - Numbers: - Correctly scope number signs with `constant.numeric.sign` instead of `keyword.operator.arithmetic` - Significantly improve number highlighting (thanks to @deathaxe) - Completions: - completions have been added for language constants, including kind info and details (with links to docs) - `null`, `false`, `true` for JSON - `Infinity` and `NaN` for JSON5 - Settings: - a `default_extension` is now set for all JSON flavors - Symbol index: - with an object structure at the top-level, only top-level keys within now show up in the index (including tests for symbols and syntax) - Tests: - test files now test the base scope - Significantly extend tests to cover more parts of the syntaxes - Split original test file into logical parts - Add indentation tests for: - `json`, `json5` & `jsonc` - `mapping` (objects), `sequence` (arrays) - Add symbols tests for: - top-level keys of object structures (thanks to deathaxe) - languages: `json`, `json5` & `jsonc` - Fix tests for `meta.mapping meta.mapping.*` - Leave `JSON` headers in `Markdown` as `json` only, but split up fenced code blocks into `json`, `json5` & `jsonc` to behave similarly to `GitHub Flavored Markdown` BREAKING CHANGES: - JSON does not have values that can be set via an inline calculation with the help of operators, but only simple number values. Scopes for number signs have changed from being `keyword.operator.arithmetic` to `constant.numeric.sign`. Color scheme authors should add this, should it be missing. - The `JSON.sublime-syntax` now marks comments as `invalid`, third party plugin authors should instead target `JSONC.sublime-syntax` to keep the user experience as-is. - Indexed symbols (i.e. 
top-level keys in JSON object structures) are scoped as `source.json meta.mapping.key - (meta.mapping.value meta.mapping.key | meta.sequence.list meta.mapping.key)`. Color scheme authors should add special highlighting to differentiate them from other keys. - fix sublimehq#285 - address sublimehq#421 (thanks to @FichteFoll) - address sublimehq#481 to remove incompatible regex patterns according to @wbond - address sublimehq#757 to fix line comments for `JSONC` (thanks to @keith-hall) - address sublimehq#2430 using sort-order (as requested by @deathaxe) - address sublimehq#2711 with regards to `constant.language.null` vs. `constant.language.empty` (thanks to @FichteFoll) - address sublimehq#2852 to fix scopes of curly braces & square brackets in `JSON` (thanks to @Thom1729) - address sublimehq#3228 to fix `punctuation.separator` scopes, compare sublimehq#3270 - address sublimehq/sublime_text#3154 and add symbol tests Co-authored-by: Ashwin Shenoy <Ultra-Instinct-05@users.noreply.github.com> Co-authored-by: Jack Cherng <jfcherng@users.noreply.github.com> Co-authored-by: Janos Wortmann <jwortmann@users.noreply.github.com> Co-authored-by: Jon Skinner <jps@sublimetext.com> Co-authored-by: FichteFoll <FichteFoll@users.noreply.github.com> Co-authored-by: Keith Hall <keith-hall@users.noreply.github.com> Co-authored-by: Michael B. Lyons <michaelblyons@users.noreply.github.com> Co-authored-by: Rafał Chłodnicki <rchl@users.noreply.github.com> Co-authored-by: Thomas Smith <Thom1729@users.noreply.github.com> Co-authored-by: Will Bond <wbond@users.noreply.github.com> Co-authored-by: deathaxe <deathaxe@users.noreply.github.com>

wbond added the RFC label Dec 23, 2016

wbond mentioned this issue Dec 23, 2016

[Scala] Various stateful bailout bugs #756

Merged

keith-hall mentioned this issue Sep 4, 2017

[Git Files] Add Git Files package #1126

Merged

keith-hall mentioned this issue Apr 28, 2018

Change loop detection to work like Sublime Text trishume/syntect#146

Merged

tajmone mentioned this issue Aug 14, 2018

Syntax Highlighting Breaks With _ asciidoctor/sublimetext-asciidoc#14

Open

deathaxe mentioned this issue Feb 8, 2019

[D] Syntax Improvements #1850

Merged

michaelblyons mentioned this issue Feb 17, 2020

[Lisp] Highlighting seems very gone #1968

Closed

ISSOtm added a commit to ISSOtm/sublime-RGBDS that referenced this issue Aug 10, 2020

Rewrite syntax to adhere to better standards

1bfcaa0

Basically gleaned from sublimehq/Packages#757

keith-hall mentioned this issue Aug 3, 2021

Recognize JSON and XML by media type suffixes keith-hall/http-request-response-syntax#3

Merged

keith-hall mentioned this issue Aug 17, 2021

[C#] Support tuples whose commas are not on the same line as the open paren #2838

Merged

jrappen mentioned this issue Oct 24, 2021

[JSON] Rewrite syntax #3097

Draft

jrappen mentioned this issue Nov 11, 2021

[Haskell] Add Cabal Syntax #2682

Merged

jrappen mentioned this issue Dec 7, 2021

Some JSON reworks jrappen/sublime-json#8

Merged

keith-hall mentioned this issue Feb 13, 2022

[SQL] use inheritance to support different dialects #3046

Open

michaelblyons pinned this issue Apr 28, 2022

keith-hall mentioned this issue Apr 30, 2023

how do I write my own syntax? trishume/syntect#475

Closed

Comments

wbond commented Dec 23, 2016

wbond commented Dec 23, 2016
  Check for Scope Doubling

djspiewak commented Dec 23, 2016
  Stateful Chaining
  Don't Over-Use
  Push Your First State
  Lookahead Push for Meta-Scoping
  Bail Outs
  Test Partially-Valid Buffers
  Organize

Thom1729 commented Mar 16, 2017
  Use the Stack

Thom1729 commented Mar 17, 2017
  Preprocessing with YAML Macros

keith-hall commented Sep 4, 2017
  Keep matches concise

tajmone commented Apr 25, 2018
  Unusual Syntaxes and Their Pitfalls
  Forceful Popping
  Watchlist of Common Newbie-Mistakes
  A Syntax CLI Simulator/Debugger Would Be Invaluable
  Syntect: A Fallback Debugger

tajmone commented Apr 27, 2018
  Stack Popping Tricks
  Force POP
  Else POP
  End-of-Line POP

tajmone commented Apr 27, 2018
  Syntax Test Files
  Testing Against Scope Spillings
  Testing Meta Scope Exiting

keith-hall commented Apr 28, 2018

tajmone commented Apr 28, 2018

keith-hall commented Oct 19, 2019
  Changing Regex Mode in Variables
  Making Variables Atomic

FichteFoll commented Oct 19, 2019
  Use the chomping indicator in block scalars

tajmone commented Oct 19, 2019
  Syntax Tests: Use of & Operator in Scopes

FichteFoll commented Oct 19, 2019

rchl commented Oct 21, 2020
  Rule loop processing order within a context
  Rule handling around the beginning/end of the line

FichteFoll commented Oct 24, 2020

jrappen commented Jan 5, 2021

tajmone commented Jan 5, 2021

jrappen commented Jan 5, 2021

FichteFoll commented Jan 10, 2021

deathaxe commented Aug 18, 2021

deathaxe commented Aug 18, 2021

deathaxe commented Aug 18, 2021
