[grammars] provide alternative to TextMate grammars #216

Open
tomq42 opened this issue Nov 19, 2015 · 11 comments
Labels
api feature-request Request for new features or functionality languages-basic Basic language support issues

Comments

@tomq42

tomq42 commented Nov 19, 2015

TextMate isn't sufficient for many languages.

We have been integrating at the lower level, in the src/vs/languages directory, using Modes.IState and supports.TokenisationSupport. There needs to be a way of writing an extension that can do this, which, at least currently, there doesn't seem to be.

Thanks.

@jrieken jrieken added the feature-request Request for new features or functionality label Nov 24, 2015
@egamma egamma modified the milestone: Backlog Dec 10, 2015
@Tyriar Tyriar added the languages-basic Basic language support issues label May 28, 2016
@Tyriar
Member

Tyriar commented May 28, 2016

@aeschli I'm interested in the thinking behind moving away from our own tokenization in favor of tmLanguages.

@aeschli
Contributor

aeschli commented May 30, 2016

@Tyriar For performance reasons we want the tokenizers to run in the render process. As we don't want user code to run in the render process we went for declarative tokenizers.
At first we had Monarch support as well, but before the API deadline we decided to go fully with TextMate to keep the API simple.
Also, in order for our theming support to work well, we want all tokenizers to emit TextMate tokens.

We are aware of the limitations and problems of TextMate, and we are open to allow other types of tokenizers, but no work is planned in this area at the moment.

@tomq42
Author

tomq42 commented May 31, 2016

That's a shame. It's also slightly "unfair", in the sense that Microsoft can write language modes that do things other people can't.

There would be zero chance of a pull request to the core of vscode being accepted for a new language mode, so we are left unable to write a sensible language mode for vscode. We have no such problem in Eclipse: writing similar things there is straightforward, and they have no issue with things running in the render thread.

At least in Monarch there was a "pop state" facility, which as far as I know has no equivalent in TextMate. In Monarch you could shift to an explicit state and then "pop", so you could write "subroutines". That facility made it at least possible to write our language mode in Monarch, even if it was much harder work than doing it the low-level way, which is what we ended up doing.
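
To make the "pop state" point concrete, here is a minimal sketch of a state-stack tokenizer in TypeScript. It is not Monarch itself: the `Rule` type, `tokenizeLine` and the toy grammar are hypothetical, written only to show how pushing a named state and popping back lets one state (here `string`) be reused like a subroutine from several callers.

```typescript
// Hypothetical mini-tokenizer illustrating push/pop of named states.
// This is NOT the Monarch implementation, only a sketch of the idea.
type Rule = {
  regex: RegExp;   // must be sticky (flag "y") so it matches at `pos` only
  token: string;   // token type to emit
  push?: string;   // state to push after a match
  pop?: boolean;   // pop the current state after a match
};

type Grammar = Record<string, Rule[]>;

interface Token { type: string; text: string; }

const grammar: Grammar = {
  root: [
    { regex: /\(/y, token: "paren.open", push: "args" },
    { regex: /\[/y, token: "bracket.open", push: "index" },
    { regex: /[A-Za-z_]\w*/y, token: "identifier" },
    { regex: /\s+/y, token: "white" },
  ],
  // "args" and "index" both call into the shared "string" state ...
  args: [
    { regex: /"/y, token: "string", push: "string" },
    { regex: /\)/y, token: "paren.close", pop: true },
    { regex: /[^")]+/y, token: "argument" },
  ],
  index: [
    { regex: /"/y, token: "string", push: "string" },
    { regex: /\]/y, token: "bracket.close", pop: true },
    { regex: /[^"\]]+/y, token: "index" },
  ],
  // ... and "string" returns to whichever state pushed it.
  string: [
    { regex: /"/y, token: "string", pop: true },
    { regex: /[^"]+/y, token: "string" },
  ],
};

// Tokenizes one line, mutating `stack` so the caller can carry it forward.
function tokenizeLine(line: string, stack: string[]): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < line.length) {
    const state = stack[stack.length - 1];
    let matched = false;
    for (const rule of grammar[state]) {
      rule.regex.lastIndex = pos;
      const m = rule.regex.exec(line);
      if (m && m[0].length > 0) {
        tokens.push({ type: rule.token, text: m[0] });
        pos += m[0].length;
        if (rule.pop && stack.length > 1) stack.pop();
        if (rule.push) stack.push(rule.push);
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies: emit one character as "invalid" and move on.
      tokens.push({ type: "invalid", text: line[pos] });
      pos += 1;
    }
  }
  return tokens;
}
```

Calling `tokenizeLine('f("a")[ "b" ]', ["root"])` enters `string` once from `args` and once from `index`, and each time the pop returns to the state that pushed it.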

@tambry

tambry commented Jan 4, 2017

Tokenizing languages where a single token might be split across multiple lines is nearly impossible (without very complicated workarounds) using TextMate (see microsoft/vscode-textmate#32).
Monarch would be a huge improvement, allowing better language support for more complicated/nuanced languages. Though being able to write a tokenizer using an API would be even better.
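
The multi-line case is easier to see with a hand-written tokenizer that carries state from one line to the next, which is roughly the shape a tokenizer API would take. The following is a sketch only: `LineState`, `tokenizeLine`, `tokenizeDocument` and the triple-quoted string syntax are assumptions made for illustration and do not correspond to any actual VS Code API.

```typescript
// Hypothetical line-based tokenizer with explicit state between lines.
interface LineState { inString: boolean; }
interface Token { start: number; end: number; type: string; }
interface LineResult { tokens: Token[]; endState: LineState; }

const DELIM = '"""'; // assumed multi-line string delimiter

function tokenizeLine(line: string, startState: LineState): LineResult {
  const tokens: Token[] = [];
  let inString = startState.inString;
  let tokenStart = 0; // where the token currently being built began
  let pos = 0;        // where to search for the next delimiter
  while (pos < line.length) {
    const hit = line.indexOf(DELIM, pos);
    if (hit === -1) break; // rest of the line belongs to the current token
    if (inString) {
      // Closing delimiter: the string token ends after it.
      const end = hit + DELIM.length;
      tokens.push({ start: tokenStart, end, type: "string" });
      tokenStart = end;
      pos = end;
      inString = false;
    } else {
      // Opening delimiter: flush preceding source text, start a string.
      if (hit > tokenStart) {
        tokens.push({ start: tokenStart, end: hit, type: "source" });
      }
      tokenStart = hit;
      pos = hit + DELIM.length;
      inString = true;
    }
  }
  if (tokenStart < line.length) {
    tokens.push({
      start: tokenStart,
      end: line.length,
      type: inString ? "string" : "source",
    });
  }
  return { tokens, endState: { inString } };
}

// Feeds each line's end state into the next line.
function tokenizeDocument(lines: string[]): Token[][] {
  let state: LineState = { inString: false };
  return lines.map((line) => {
    const result = tokenizeLine(line, state);
    state = result.endState;
    return result.tokens;
  });
}
```

Because the end state of each line is an ordinary value, an editor can cache it per line and re-tokenize only from the first line whose start state changed.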

@EvgeniyPeshkov

Hello everyone. I've developed and published a syntax highlighting extension based on Tree-sitter. It provides a universal syntax coloring engine for almost any programming language (currently, C and C++ are supported out of the box). By constructing the entire syntax tree, Tree-sitter overcomes the limitations of the built-in TextMate grammars. It's very easy to add support for a new language. I'm planning to write a how-to in the next couple of days, but you can figure it out from the source code, which is simple and straightforward. Contributions are welcome. I've been using it myself for a month, so I suppose it's ready for public use. At the least, the extension can be useful until the VSCode core provides a stronger syntax parser.

You can install it from the VSCode Marketplace, or download the .vsix package from the GitHub page and install it manually.
Please note that the extension published in the VS Code Marketplace only works on Windows x64.
For other operating systems, please download the pre-compiled .vsix package.
This will be fixed in one of the next updates.
Alternatively, you can build the extension from source.

@aeschli aeschli changed the title from "Need an official way of writing grammars for languages too complex for TextMate" to "[grammars] provide alternative to TextMate grammars" Oct 9, 2019
@texastoland

texastoland commented Sep 6, 2021

> We are aware of the limitations and problems of TextMate, and we are open to allow other types of tokenizers, but no work is planned in this area at the moment.

@aeschli This was 5½ years ago 👆🏼 I understand TextMate is probably as much of a thorn in your side as it is for extension authors, judging by some of @mjbvz's logged issues. I'm happy to document its pain points, but I imagine you already have a query in your GitHub Issues Notebooks somewhere.

Here's my understanding of the current state of things:

After a week of struggling with microsoft/vscode-textmate#32 and practically nonexistent documentation apart from a blog post from 2014 ... could we pretty please with a cherry on top have an update on this issue?

@texastoland

texastoland commented Sep 16, 2021

Continued from microsoft/vscode-textmate#117 (comment):

> [@jeff-hykin] I spent years on a library (which I finally published just last week) to make it way less painful.

If it works for you, that's great. For me, most of your use case is solved by using YAML instead of JSON (like Sublime, though it's frustrating that there's a compile step for Code) and by the metaprogramming facilities of embedding match content in scope names (using them like CSS classes to inject other grammars) or YAML 1.1 merge keys.

How YAML looks (syntax highlighting available for embedded regexes):

scopeName: inline.template-fsharp-highlight.reinjection
injectionSelector: "L:meta.embedded"
patterns:
  - name: string.quoted.triple.fsharp.template.fsharp.substitution
    contentName: meta.template.expression.fsharp
    begin: |
      (?x)    # Ignore whitespace
      (?<!\{) # Not after brace
      \{      # Literal brace
      (?!\{)  # Not before brace
    end: |
      (?x)
      (?<!})
      }
      (?!})
    captures:
      0: { name: keyword.symbol.fsharp }
    patterns:
      - include: source.fsharp
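
As a quick check of what the `begin` pattern above is meant to match, here is the same idea written as a JavaScript regular expression in TypeScript. This is an approximation under stated assumptions: VS Code evaluates TextMate grammars with Oniguruma, which supports the `(?x)` extended mode used in the YAML, whereas JavaScript regular expressions do not, so the pattern is written compactly here. The helper name `findSubstitutionStarts` is hypothetical.

```typescript
// Finds every lone "{" that is neither preceded nor followed by another "{".
// Mirrors the `begin` rule: (?<!\{) \{ (?!\{)
function findSubstitutionStarts(text: string): number[] {
  const re = /(?<!\{)\{(?!\{)/g;
  const starts: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    starts.push(m.index);
  }
  return starts;
}

// Escaped "{{" is skipped; only the single brace before "value" is reported.
console.log(findSubstitutionStarts("{{literal}} {value}"));
```

The lookbehind and lookahead are what keep an escaped `{{` from opening a substitution, which is the case the grammar has to distinguish inside a template string.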

I've seen at least 2 projects that rolled their own grammar generators (the original Reason syntax and your own Better Shell Syntax). There's even a more interesting compiler (currently with documentation, online REPL, and CLI but no extension yet) to transpile an entirely new syntax with a Sublime-like stacking context into TextMate.

But to me it's all infuriating. Code is progressive in so many ways but not only regressive in a core component of literally any text editor but now unresponsive about it. The https://github.com/microsoft/vscode-textmate project is on one hand daunting and on the other janky, and it's hard to tell whether that's due to TextMate's unspecified behavior or an actual bug.

Semantic tokens were a foundational step but not a solution. Most of the implementations connect them to their LSP. That's less performant than using Tree-sitter (#50140), leaves the burden on extension authors to provide a TextMate grammar when the LSP isn't available (like for a file outside a .NET project), and creates an inconsistent experience for end users in terms of coloring (whitespace significant languages like F# are most drastically affected) as well as responsiveness.

In conclusion being silent about this hurts:

  1. performance of arguably 1 of the primary functionalities of Code.
  2. extension authors who waste weeks (🙋🏼‍♂️) reinventing the wheel because of an undocumented, outdated, insufficient, and slightly buggy (although well-tested) tool.
  3. end user experience (see previous paragraph).

@bpasero @egamma Sorry to spam you, but would a separate PR proposal for #50140 be more productive? 🙏🏼

@jasonwilliams
Contributor

jasonwilliams commented Apr 2, 2022

@aeschli do you know if there’s any current exploration into something like Tree-sitter as a replacement for the TextMate grammars we have today? It can’t just be left as it is indefinitely, as it’s noticeable.

> But to me it's all infuriating. Code is progressive in so many ways but not only regressive in a core component of literally any text editor but now unresponsive about it.

> performance of arguably 1 of the primary functionalities of Code.

This is very true and it’s actually quite sad to see too.

It’s great VSCode has all of these fancy bells and whistles and more features than you can possibly need, but it seems to get the basics wrong when it comes to rendering the source code onto the screen. On TypeScript projects I see the syntax highlighting kick in a few seconds after the code shows up; this is a known issue but was given lower priority. I’d probably go as far as to say I’d happily waive any new feature for a few months if it meant time was spent on this.

I understand there’s also a desire to fully rely on LSP for code colouring, but this just adds extra latency like @texastoland mentioned above; you would definitely need some level of caching or stale-while-revalidate before falling back to LSP otherwise it’s no better than what we have today.
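
One way to read the stale-while-revalidate suggestion is sketched below. Everything here is hypothetical: the `TokenCache` class, the `fetchTokens` callback standing in for a slow request to a language server, and the string-array token representation. It only illustrates painting with the last known tokens immediately while a slower provider recomputes them.

```typescript
// Hypothetical stale-while-revalidate cache for per-file tokens.
type Tokens = string[];
// Stand-in for a slow provider (e.g. a language server round trip).
type Fetcher = (uri: string, done: (tokens: Tokens) => void) => void;

class TokenCache {
  private readonly cache = new Map<string, Tokens>();
  private readonly pending = new Set<string>();
  private readonly fetchTokens: Fetcher;

  constructor(fetchTokens: Fetcher) {
    this.fetchTokens = fetchTokens;
  }

  // Returns cached (possibly stale) tokens at once, or undefined on a miss.
  // Starts a refresh unless one is already in flight for this file; in this
  // sketch, a caller that arrives mid-flight is not notified separately.
  get(uri: string, onFresh: (tokens: Tokens) => void): Tokens | undefined {
    const stale = this.cache.get(uri);
    if (!this.pending.has(uri)) {
      this.pending.add(uri);
      this.fetchTokens(uri, (fresh) => {
        this.pending.delete(uri);
        this.cache.set(uri, fresh);
        onFresh(fresh);
      });
    }
    return stale;
  }
}
```

A renderer would paint with whatever `get` returns (falling back to a cheap grammar-based pass on a miss) and repaint when `onFresh` fires, so the latency of the slow provider is hidden on every open after the first.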

I understand anything around tokenisation requires a refactor and that’s most likely why no one wants to go near it but how long can that last really? Until competition begins to narrow?

> Any additional tokenizer is waiting on @alexdima's #77140.

His last response to that thread was almost 3 years ago, so I think it’s a dead end. It’s yet another thread where the maintainers have gone silent on the issue.

I did include Tree-sitter in my post about VSCode performance as a whole: https://jason-williams.co.uk/speeding-up-vscode-extensions-in-2022

@jasonwilliams
Contributor

I've had a go at integrating a different service (alongside the TextMate one) which supports Tree-sitter. So far it loads up fine, but there are some issues getting it to instantiate Tree-sitter properly. I think this has to do with the security policies in place.

I think it's possible to have a Tree-sitter service which can emit tokens (similar to the TextMate service) and have higher-up services use that instead, or have them use the Tree-sitter API wrapped in a service (for queries etc.).

If anyone is interested in helping, there's a PR here:
#147648

@texastoland

texastoland commented Feb 24, 2024

> Until competition begins to narrow?

Switching to Zed today 🤦🏼‍♂️

Note: not a single reply from MS here and only 1 dismissive response in #50140 (comment)

@heartacker
Contributor

#161479 SAD

nang-dev pushed a commit to trypear/pearai-app that referenced this issue Oct 3, 2024
* Updating CI/CD

* Updating CI/CD and including SM

* Updating Electron

* Updating Actions

* More Updates

* More Updates

* Updating CI/CD commenting out anything related to Electron

* Updating CI/CD

* Linux changes

* Updating CI/CD
nang-dev pushed a commit to trypear/pearai-app that referenced this issue Oct 4, 2024