[grammars] provide alternative to TextMate grammars #216

tomq42 · 2015-11-19T13:33:28Z

TextMate isn't sufficient for many languages.

We have been integrating into the lower level, in the src/vs/languages directory, using Modes.IState and supports.TokenisationSupport. There needs to be a way to write an extension that can do this; at least currently, there doesn't seem to be one.
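For readers unfamiliar with the lower-level approach: a stateful tokenizer is called once per line and returns an end-of-line state that is fed into the next call, which is what lets a single token such as a block comment span several lines. The sketch below illustrates that idea under simple assumptions (C-style /* */ comments only). LineState, LineToken, and tokenizeLine are hypothetical names for illustration, not VS Code's actual Modes.IState or TokenisationSupport interfaces.

```typescript
// Hypothetical per-line state, in the spirit of (but not identical to) Modes.IState.
type LineState = { inBlockComment: boolean };

interface LineToken {
  start: number;              // column where the token begins
  type: "comment" | "text";
}

interface LineResult {
  tokens: LineToken[];
  endState: LineState;        // handed to the tokenizer for the next line
}

// Tokenizes a single line, starting from the state the previous line ended in.
function tokenizeLine(line: string, state: LineState): LineResult {
  const tokens: LineToken[] = [];
  let inComment = state.inBlockComment;
  let pos = 0;

  // Start a text token at `at`, unless a comment opens exactly there.
  const pushText = (at: number): void => {
    if (at < line.length && !line.startsWith("/*", at)) {
      tokens.push({ start: at, type: "text" });
    }
  };

  if (inComment) {
    if (line.length > 0) tokens.push({ start: 0, type: "comment" });
  } else {
    pushText(0);
  }

  while (pos < line.length) {
    if (inComment) {
      const close = line.indexOf("*/", pos);
      if (close === -1) break;       // comment continues on the next line
      pos = close + 2;               // "*/" belongs to the comment
      inComment = false;
      pushText(pos);
    } else {
      const open = line.indexOf("/*", pos);
      if (open === -1) break;        // plain text up to the end of the line
      tokens.push({ start: open, type: "comment" });
      pos = open + 2;                // skip "/*" so that "/*/" does not close itself
      inComment = true;
    }
  }
  return { tokens, endState: { inBlockComment: inComment } };
}

// Thread the end state of each line into the next one.
const lines = ["int a; /* start", "still comment", "end */ int b;"];
let state: LineState = { inBlockComment: false };
for (const line of lines) {
  const result = tokenizeLine(line, state);
  console.log(JSON.stringify(result.tokens));
  state = result.endState;
}
```

The only thing carried between lines is the small endState object, which is why an editor can re-tokenize from any line whose starting state it has cached.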

Thanks.

Tyriar · 2016-05-28T01:46:36Z

@aeschli I'm interested in the thinking behind moving away from our own tokenization in favor of tmLanguages.

aeschli · 2016-05-30T14:16:38Z

@Tyriar For performance reasons we want the tokenizers to run in the render process. As we don't want user code to run in the render process, we went for declarative tokenizers.
At first we supported Monarch as well, but before the API deadline we decided to go fully with TextMate to keep the API simple.
Also, in order for our theming support to work well, we want all tokenizers to emit TextMate tokens.
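Why emitting TextMate tokens matters for theming: color themes key their rules on TextMate scope selectors, and a selector such as string.quoted applies to any scope that extends it, e.g. string.quoted.triple.fsharp, with the more specific matching selector winning. The sketch below shows only that dot-separated prefix-matching idea under simplifying assumptions; real TextMate theme matching also considers parent scopes and exclusion selectors. The resolveColor helper and the sample colors are hypothetical.

```typescript
// Hypothetical, simplified theme: selector -> foreground color.
const themeRules: Record<string, string> = {
  "string": "#a31515",
  "string.quoted.triple": "#0000ff",
  "keyword": "#af00db",
};

// A selector matches a scope if it equals the scope or is a dot-separated prefix of it,
// so "string" matches "string.quoted" but not "stringy".
function matchesSelector(selector: string, scope: string): boolean {
  return scope === selector || scope.startsWith(selector + ".");
}

// Picks the most specific (longest) matching selector; undefined if nothing matches.
function resolveColor(scope: string, rules: Record<string, string>): string | undefined {
  let best: string | undefined;
  for (const selector of Object.keys(rules)) {
    if (matchesSelector(selector, scope) && (best === undefined || selector.length > best.length)) {
      best = selector;
    }
  }
  return best === undefined ? undefined : rules[best];
}

console.log(resolveColor("string.quoted.triple.fsharp", themeRules)); // #0000ff
console.log(resolveColor("string.unquoted", themeRules));             // #a31515
console.log(resolveColor("comment.line", themeRules));                // undefined
```

A tokenizer that emits scopes outside this naming scheme would simply match no theme rule, which is the practical reason for requiring TextMate-style scope names.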

We are aware of the limitations and problems of TextMate, and we are open to allow other types of tokenizers, but no work is planned in this area at the moment.

tomq42 · 2016-05-31T06:59:39Z

That's a shame. It's also slightly "unfair", in the sense that Microsoft can write language modes that do things other people can't....

There would be zero chance of a pull request for a new language mode being accepted into the core of vscode, so we are left unable to write a sensible language mode for vscode. We have no such problem in Eclipse: writing similar things there is no problem, and it has no issue with such code running in the render thread.

At least Monarch had a "pop state" facility, which as far as I know has no equivalent in TextMate. In Monarch you could switch to an explicit state and later "pop" back, so you could write "subroutines". That facility made it at least possible to write our language mode in Monarch, even if it was much harder work than doing it the low-level way, which is what we ended up doing.
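The push/pop idea can be illustrated with an explicit state stack: a rule may push a named state (entering a "subroutine") or pop back to whichever state was active before, and the stack left at the end of a line becomes the starting state for the next line. This is a minimal sketch of Monarch-style push/pop transitions, not Monarch's actual implementation or syntax; the rule table and its states (root, string, interp, for a string that can contain ${ ... } expressions) are hypothetical.

```typescript
// What a rule does to the state stack after it matches.
type Action = { push: string } | "pop" | "stay";

interface Rule {
  regex: RegExp;  // sticky (y) so it only matches at the current position
  token: string;
  action: Action;
}

// Hypothetical rule table: strings can contain ${ ... }, which can contain strings again.
const rules: Record<string, Rule[]> = {
  root: [
    { regex: /"/y, token: "string.quote", action: { push: "string" } },
    { regex: /[A-Za-z_]\w*/y, token: "identifier", action: "stay" },
    { regex: /\s+/y, token: "white", action: "stay" },
  ],
  string: [
    { regex: /\$\{/y, token: "delimiter.interp", action: { push: "interp" } },
    { regex: /"/y, token: "string.quote", action: "pop" },
    { regex: /[^"$]+|\$/y, token: "string", action: "stay" },
  ],
  interp: [
    { regex: /\}/y, token: "delimiter.interp", action: "pop" },
    { regex: /"/y, token: "string.quote", action: { push: "string" } },
    { regex: /[A-Za-z_]\w*/y, token: "identifier", action: "stay" },
    { regex: /\s+/y, token: "white", action: "stay" },
  ],
};

// Tokenizes one line; the returned stack is the state to resume from on the next line.
function tokenize(line: string, startStack: string[]): { tokens: [number, string][]; stack: string[] } {
  const stack = [...startStack];
  const tokens: [number, string][] = [];
  let pos = 0;
  while (pos < line.length) {
    const current = rules[stack[stack.length - 1]];
    let matched = false;
    for (const rule of current) {
      rule.regex.lastIndex = pos;
      const m = rule.regex.exec(line);
      if (m === null || m[0].length === 0) continue;
      tokens.push([pos, rule.token]);
      pos += m[0].length;
      if (rule.action === "pop") {
        if (stack.length > 1) stack.pop();   // never pop the root state
      } else if (rule.action !== "stay") {
        stack.push(rule.action.push);        // enter a "subroutine" state
      }
      matched = true;
      break;
    }
    if (!matched) {
      tokens.push([pos, "invalid"]);         // no rule applies: consume one character
      pos += 1;
    }
  }
  return { tokens, stack };
}

const line1 = tokenize('greet "hi ${ who', ["root"]);
console.log(line1.stack.join(" > ")); // root > string > interp
const line2 = tokenize('} there" done', line1.stack);
console.log(line2.stack.join(" > ")); // root
```

Because a quote inside ${ ... } pushes a second string state, its closing quote pops only that inner state, and the tokenizer returns to the interpolation it came from.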

tambry · 2017-01-04T07:48:02Z

Tokenizing languages where a single token might be split across multiple lines is nearly impossible (without very complicated workarounds) using TextMate. (see microsoft/vscode-textmate#32)
Monarch would be a huge improvement, allowing better language support for more complicated/nuanced languages. Being able to write a tokenizer using an API would be even better, though.

EvgeniyPeshkov · 2019-05-22T13:56:48Z

Hello everyone. I've developed and published a syntax highlighting extension based on Tree-sitter. It provides a universal syntax coloring engine for almost any programming language (currently, C and C++ are supported out of the box). Because it constructs the entire syntax tree, Tree-sitter efficiently overcomes all the limitations of the built-in TextMate grammars. It's very easy to add support for a new language. I'm planning to write a how-to in the next couple of days, but you can figure it out from the source code, which is very simple and straightforward. Contributions are welcome. I've been using it myself for a month, so I suppose it's ready for public use. At the very least, the extension can be useful until VSCode core provides a stronger syntax parser.

You can install it from the VS Code Marketplace,
or download the .vsix package from the GitHub page and install it manually.
Please note that the extension published on the VS Code Marketplace only works on Windows x64.
For other operating systems, please download the pre-compiled .vsix package.
This will be fixed in the near future with one of the next updates.
Alternatively, you can build the extension from source.

texastoland · 2021-09-06T22:19:14Z

> We are aware of the limitations and problems of TextMate, and we are open to allow other types of tokenizers, but no work is planned in this area at the moment.

@aeschli This was 5½ years ago 👆🏼 I understand TextMate is probably as much a thorn in your side as it is for extension authors, judging by some of the issues @mjbvz has logged. I'm happy to document its pain points, but I imagine you already have a query in your GitHub Issues Notebooks somewhere.

Here's my understanding of the current state of things:

- The semantic token provider API lays the groundwork for mapping low-level tokens to high-level scopes for theming.
- Any additional tokenizer is waiting on @alexdima's "Tokenization overhaul" #77140.
- Tree-sitter (used in Atom and proposed in "Support syntax highlighting with tree-sitter" #50140) could provide both a declarative and a higher-level imperative API to the lower-level semantic token provider API. I assume that's how the Syntax Highlighter extension has worked since EvgeniyPeshkov/syntax-highlighter@cadcfb9.
- Sublime syntax (proposed in "VSCode do not support .sublime-syntax which is more powerful than .tmLanguage" #5408) could resolve some of TextMate's limitations, but with a similar support burden: Microsoft would be beholden to emulate it rather than evolve it.
- Monaco's own Monarch looks (without my having tried it) like a conceptual hybrid between Tree-sitter and Sublime syntax.
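As background on the semantic token API mentioned in the list above: VS Code's DocumentSemanticTokensProvider returns tokens as a flat array of integers, five per token (deltaLine, deltaStartChar, length, tokenType, tokenModifiers). Positions are relative to the previous token, and tokenType/tokenModifiers index into the provider's SemanticTokensLegend. In a real extension, vscode.SemanticTokensBuilder produces this array; the standalone function below (encodeSemanticTokens and AbsoluteToken are hypothetical names) only sketches the encoding so it can run without the vscode module.

```typescript
// One token in absolute coordinates (hypothetical helper type).
interface AbsoluteToken {
  line: number;       // zero-based line
  char: number;       // zero-based start character
  length: number;
  tokenType: number;  // index into the legend's tokenTypes
  modifiers: number;  // bit set over the legend's tokenModifiers
}

// Encodes tokens into the relative, 5-integers-per-token layout.
function encodeSemanticTokens(tokens: AbsoluteToken[]): number[] {
  // Tokens must be emitted in document order for the deltas to make sense.
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.char - b.char);
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of sorted) {
    const deltaLine = t.line - prevLine;
    // The start character is relative only when the token shares a line with the previous one.
    const deltaChar = deltaLine === 0 ? t.char - prevChar : t.char;
    data.push(deltaLine, deltaChar, t.length, t.tokenType, t.modifiers);
    prevLine = t.line;
    prevChar = t.char;
  }
  return data;
}

const encoded = encodeSemanticTokens([
  { line: 5, char: 2, length: 7, tokenType: 2, modifiers: 0 },
  { line: 2, char: 5, length: 3, tokenType: 0, modifiers: 3 },
  { line: 2, char: 10, length: 4, tokenType: 1, modifiers: 0 },
]);
console.log(encoded.join(",")); // 2,5,3,0,3,0,5,4,1,0,3,2,7,2,0
```

The relative encoding keeps edits cheap: inserting a line only changes one deltaLine value rather than every following token.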

After a week of struggling with microsoft/vscode-textmate#32 and practically nonexistent documentation apart from a blog post from 2014 ... could we pretty please with a cherry on top have an update on this issue?

texastoland · 2021-09-16T16:24:18Z

Continued from microsoft/vscode-textmate#117 (comment):

> [@jeff-hykin] I spent years on a library (which I finally published just last week) to make it way less painful.

If it works for you, that's great. For me, most of your use case is solved by using YAML instead of JSON (as Sublime does, though it's frustrating that Code needs a compile step), together with metaprogramming facilities: embedding match content in scope names (using them like CSS classes to inject other grammars) or YAML 1.1 merge keys.

Here's how the YAML looks (syntax highlighting is available for the embedded regexes):

scopeName: inline.template-fsharp-highlight.reinjection
injectionSelector: "L:meta.embedded"
patterns:
  - name: string.quoted.triple.fsharp.template.fsharp.substitution
    contentName: meta.template.expression.fsharp
    begin: |
      (?x)    # Ignore whitespace
      (?<!\{) # Not after brace
      \{      # Literal brace
      (?!\{)  # Not before brace
    end: |
      (?x)
      (?<!})
      }
      (?!})
    captures:
      0: { name: keyword.symbol.fsharp }
    patterns:
      - include: source.fsharp
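To see what the begin and end patterns in the grammar above match, they can be tried in JavaScript's regex engine. This is only an approximation: TextMate grammars in VS Code are evaluated with Oniguruma, and JavaScript has no (?x) free-spacing flag, so the comments and whitespace are stripped from the patterns here. The sample input and the matchIndices helper are illustrative.

```typescript
// The grammar's begin/end patterns with the (?x) comments and whitespace removed.
const BEGIN_SOURCE = "(?<!\\{)\\{(?!\\{)"; // (?<!\{)\{(?!\{) : a single "{" not part of "{{"
const END_SOURCE = "(?<!\\})\\}(?!\\})";   // (?<!\})\}(?!\}) : a single "}" not part of "}}"

// Returns the start index of every match of `source` in `text`.
function matchIndices(source: string, text: string): number[] {
  const re = new RegExp(source, "g");
  const found: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push(m.index);
  }
  return found;
}

const sample = "x {a} {{lit}} {b}";
console.log(matchIndices(BEGIN_SOURCE, sample).join(",")); // 2,14
console.log(matchIndices(END_SOURCE, sample).join(","));   // 4,16
```

Doubled braces ({{ and }}) are skipped by the lookarounds, so only the single braces that delimit a substitution open and close the embedded region.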

I've seen at least 2 projects that rolled their own grammar generators (the original Reason syntax and your own Better Shell Syntax). There's even a more interesting compiler (currently with documentation, online REPL, and CLI but no extension yet) to transpile an entirely new syntax with a Sublime-like stacking context into TextMate.

But to me it's all infuriating. Code is progressive in so many ways but not only regressive in a core component of literally any text editor but now unresponsive about it. The https://github.com/microsoft/vscode-textmate project is on one hand daunting, and on the other janky, and it's impossible to tell whether a given problem is due to TextMate's unspecified behavior or is actually a bug.

Semantic tokens were a foundational step but not a solution. Most of the implementations connect them to their LSP. That's less performant than using Tree-sitter (#50140), leaves the burden on extension authors to provide a TextMate grammar when the LSP isn't available (like for a file outside a .NET project), and creates an inconsistent experience for end users in terms of coloring (whitespace significant languages like F# are most drastically affected) as well as responsiveness.

In conclusion, being silent about this hurts:

- performance of arguably 1 of the primary functionalities of Code.
- extension authors who waste weeks (🙋🏼‍♂️) reinventing the wheel because of an undocumented, outdated, insufficient, and slightly buggy (although well-tested) tool.
- end user experience (see previous paragraph).

@bpasero @egamma Sorry to spam you, but would a separate PR proposal for #50140 be more productive? 🙏🏼

jasonwilliams · 2022-04-02T16:20:00Z

@aeschli do you know if there’s any current exploration into something like Tree-sitter as a replacement for the TextMate grammars we have today? It can’t just be left as it is indefinitely; the problem is noticeable.

> But to me it's all infuriating. Code is progressive in so many ways but not only regressive in a core component of literally any text editor but now unresponsive about it.

> performance of arguably 1 of the primary functionalities of Code.

This is very true and it’s actually quite sad to see too.

It’s great VSCode has all of these fancy bells and whistles and more features than you can possibly need, but it seems to get the basics wrong when it comes to rendering the source code onto the screen. On TypeScript projects I see the syntax highlighting kick in a few seconds after the code shows up; this is a known issue but was given lower priority. I’d go as far as to say I’d happily waive any new feature for a few months if it meant time was spent on this.

I understand there’s also a desire to rely fully on LSP for code colouring, but this just adds extra latency, as @texastoland mentioned above; you would definitely need some level of caching or stale-while-revalidate before falling back to LSP, otherwise it’s no better than what we have today.
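One possible shape for the caching suggested above, assuming a slow token source such as a language server: return whatever tokens are cached for a document immediately, even if they belong to an older document version, request fresh tokens in the background, and accept an answer only if it is not older than what is already cached. StaleWhileRevalidateTokens and its methods are hypothetical and not part of the VS Code API.

```typescript
// Hypothetical cache entry: tokens plus the document version they were computed for.
interface CachedTokens {
  version: number;
  tokens: number[];
}

class StaleWhileRevalidateTokens {
  private readonly cache = new Map<string, CachedTokens>();
  private readonly requestFresh: (docId: string, version: number) => void;

  // requestFresh should start the slow computation and later call receive().
  constructor(requestFresh: (docId: string, version: number) => void) {
    this.requestFresh = requestFresh;
  }

  // Returns cached tokens right away (possibly from an older version) and
  // requests fresh ones whenever nothing is cached or the cached entry is older.
  get(docId: string, version: number): number[] | undefined {
    const entry = this.cache.get(docId);
    if (entry === undefined || entry.version < version) {
      this.requestFresh(docId, version);
    }
    return entry?.tokens;
  }

  // Stores an answer from the slow source, ignoring answers older than what is cached.
  receive(docId: string, version: number, tokens: number[]): void {
    const entry = this.cache.get(docId);
    if (entry === undefined || version >= entry.version) {
      this.cache.set(docId, { version, tokens });
    }
  }
}

const requested: string[] = [];
const cache = new StaleWhileRevalidateTokens((docId, version) => {
  requested.push(`${docId}@${version}`);
});
cache.get("a.ts", 1);                 // nothing cached yet: returns undefined, requests a.ts@1
cache.receive("a.ts", 1, [0, 0, 3, 0, 0]);
console.log(cache.get("a.ts", 2));    // stale tokens from version 1; also requests a.ts@2
console.log(requested.join(", "));    // a.ts@1, a.ts@2
```

This sketch does not deduplicate in-flight requests, so repeated get calls for the same stale version each trigger requestFresh; a real implementation would likely track pending versions as well.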

I understand anything around tokenisation requires a refactor, and that’s most likely why no one wants to go near it, but how long can that really last? Until competition begins to narrow?

> Any additional tokenizer is waiting on @alexdima's #77140.

His last response to that thread was almost 3 years ago, so I think it’s a dead end. It’s yet another thread where the maintainers have gone silent on the issue.

I did include Tree-sitter in my post about VSCode performance as a whole: https://jason-williams.co.uk/speeding-up-vscode-extensions-in-2022

jasonwilliams · 2022-04-18T20:32:39Z

I've had a go at integrating a different service (alongside the TextMate one) that supports Tree-sitter. So far it loads fine, but there are some issues getting it to instantiate Tree-sitter properly. I think this has to do with the security policies in place.

I think it's possible to have a Tree Sitter Service that emits tokens (similar to the TextMate service) and have higher-level services use that instead, or have them use the Tree-sitter API wrapped in a service (for queries, etc.).

If anyone is interested in helping, there's a PR here:
#147648

texastoland · 2024-02-24T00:18:11Z

> Until competition begins to narrow?

Switching to Zed today 🤦🏼‍♂️

Note: not a single reply from MS here and only 1 dismissive response in #50140.

heartacker · 2024-02-24T00:36:32Z

#161479 SAD


jrieken added the feature-request Request for new features or functionality label Nov 24, 2015

egamma modified the milestone: Backlog Dec 10, 2015

Tyriar assigned aeschli May 28, 2016

Tyriar added the languages-basic Basic language support issues label May 28, 2016

Tyriar mentioned this issue Sep 5, 2016: "lock scroll in integrated terminal" #11479 (Closed)

tambry mentioned this issue Jan 4, 2017: "TextMate grammar regexes able to only match a single line" microsoft/vscode-textmate#32 (Closed)

tambry mentioned this issue Jan 7, 2017: "CMake language server" microsoft/vscode-cmake-tools#93 (Closed)

mjbvz added the api label Feb 1, 2017

aeschli mentioned this issue Nov 23, 2017: "[typescript] use the language service for tokenization" #11580 (Closed)

alexr00 mentioned this issue Feb 8, 2019: "C++ custom class name is not highlighted" #68099 (Closed)

aeschli changed the title from "Need an official way of writing grammars for languages too complex for TextMate" to "[grammars] provide alternative to TextMate grammars" Oct 9, 2019

tomoyukim mentioned this issue May 2, 2020: "Syntax Highlighting" tomoyukim/vscode-mermaid-editor#10 (Closed)

mhagmajer mentioned this issue Jun 20, 2020: "Create an AskScript Playground from scratch" CatchTheTornado/askql#182 (Closed)

texastoland mentioned this issue Sep 15, 2021: "Behavior of $base and $self in injection grammars" microsoft/vscode-textmate#117 (Open)

kieferrm mentioned this issue Jan 4, 2022: "Iteration Plan for January 2022" #139607 (Closed, 91 tasks)

Yash-Singh1 mentioned this issue Apr 14, 2022: "Export for VS Code Usage" Yash-Singh1/monaco-mermaid#15 (Open)

jasonwilliams mentioned this issue Apr 18, 2022: "[Draft] - Tree Sitter Service (POC)" #147648 (Closed)

Trebor-Huang mentioned this issue Jun 29, 2022: "Syntax Highlight" Trebor-Huang/vscode-btex#1 (Closed)

alexr00 mentioned this issue Aug 9, 2022: "VSCode do not support .sublime-syntax which is more powerful than .tmLanguage" #5408 (Closed)

aghArdeshir mentioned this issue Jan 23, 2024: "TextMate is not being updated anymre" #203212 (Closed)

texastoland mentioned this issue Feb 28, 2024: "Support syntax highlighting with tree-sitter" #50140 (Open)
