Support for semantic tokens #615

Open
leungbk opened this issue Jan 31, 2021 · 33 comments · May be fixed by #839

Comments

@leungbk
Contributor

leungbk commented Jan 31, 2021

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

Introduced in 3.16, and provided by the clangd and rust-analyzer servers. It would be nice if Eglot supported this feature.
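For readers unfamiliar with the feature: per the 3.16 specification, a semantic tokens response carries a flat `data` array of integers, five per token (`deltaLine`, `deltaStart`, `length`, `tokenType`, `tokenModifiers`), with positions encoded relative to the previous token. Below is a minimal Python sketch of decoding that array into absolute positions. The legend contents, the sample data (modeled on the specification's walkthrough), and the name `decode_semantic_tokens` are illustrative assumptions, not taken from Eglot or from any particular server.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    line: int        # absolute zero-based line
    start: int       # absolute zero-based start character
    length: int
    token_type: str  # name looked up in the server's legend


def decode_semantic_tokens(data: List[int], token_types: List[str]) -> List[Token]:
    """Turn the relative 5-integer encoding into absolute positions."""
    if len(data) % 5 != 0:
        raise ValueError("semantic token data must come in groups of five")
    tokens = []
    line = start = 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, type_index, _modifiers = data[i:i + 5]
        line += delta_line
        # deltaStart is relative to the previous token only on the same line.
        start = start + delta_start if delta_line == 0 else delta_start
        tokens.append(Token(line, start, length, token_types[type_index]))
    return tokens


# Hypothetical legend, as a server might advertise in its capabilities.
LEGEND = ["property", "type", "class"]
# Sample data: three tokens, the second on the same line as the first.
SAMPLE = [2, 5, 3, 0, 3,  0, 5, 4, 1, 0,  3, 2, 7, 2, 0]

for tok in decode_semantic_tokens(SAMPLE, LEGEND):
    print(tok)
```

A client would then map each decoded token type to a face and apply it over the corresponding buffer range.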

@joaotavora
Owner

In your opinion, what should Eglot do with this? Fontify? These things are normally handled by the major mode already. Maybe it could be added to Eglot's responsibilities, though it's not clear how just yet.

@joaotavora
Owner

Calling this "minor" because I don't (yet) understand what could be gained here.

@leungbk
Contributor Author

leungbk commented Feb 5, 2021

The semantic highlighting makes it a bit easier for users to connect the dots when reading code. This is what we see without semantic highlighting (default Emacs theme + rustic-mode, which should not meaningfully differ from rust-mode here):

2021-02-04-155049_1920x1080_scrot

Notice that the arguments ctx, cap, etc. are light-brownish inside the argument list, but colored black within the function body. When we use lsp-mode's semantic highlighting, they are colored consistently:

2021-02-04-155159_1920x1080_scrot

The emacs-tree-sitter package does something similar.

@joaotavora
Owner

Two questions:

  • Is semantic highlighting conceptually different from what can be achieved with traditional Emacs font-lock and font-lock-keywords? It'd be great to get rid of that complexity. But it successfully parses a lot of languages and is very well integrated into Emacs.

  • In the lsp-mode example, is font-lock also active, or is all of the code being highlighted with "semantic tokens"?

@leungbk
Contributor Author

leungbk commented Feb 5, 2021

Is semantic highlighting conceptually different from what can be achieved with traditional Emacs font-lock and font-lock-keywords? It'd be great to get rid of that complexity. But it successfully parses a lot of languages and is very well integrated into Emacs.

Not sure I understand your question. Semantic highlighting should be based on an AST, in contrast to traditional regexp-based font-lock.

In the lsp-mode example, is font-lock also active, or is all of the code being highlighted with "semantic tokens"?

font-lock-mode is on; lsp-mode simply advises font-lock-fontify-region-function, as you can see here.

@joaotavora
Owner

joaotavora commented Feb 5, 2021

Not sure I understand your question. Semantic highlighting should be based on an AST, in contrast to traditional regexp-based font-lock.

font-lock.el doesn't have to be based on regexps, as you yourself note in your next response. But it often is, and reasonably successfully.

font-lock-mode is on; lsp-mode simply advises font-lock-fontify-region-function, as you can see here.

Right. But are the font-lock keywords in rust-mode doing any work? Will you get the same colors if you simply remove those keywords? Or is it doing some work?

@leungbk
Contributor Author

leungbk commented Feb 5, 2021

Right. But are the font-lock keywords in rust-mode doing any work? Will you get the same colors if you simply remove those keywords? Or is it doing some work?

When redefining the major-mode to run

  (setq-local font-lock-keywords nil)
  (setq-local font-lock-defaults nil)

then with rustic-mode + the rust-analyzer language server, we still see semantic-token-based highlighting when lsp-mode is enabled and the user has requested that semantic highlighting be enabled.

@joaotavora
Owner

then with rustic-mode + the rust-analyzer language server, we still see semantic-token-based highlighting when lsp-mode is enabled and the user has requested that semantic highlighting be enabled.

And does it look exactly like the second example you posted here: #615 (comment) ?

Also, does this work with rust-mode?

@leungbk
Contributor Author

leungbk commented Feb 5, 2021

And does it look exactly like the second example you posted here: #615 (comment) ?

Also, does this work with rust-mode?

I redid with rust-mode, and with a different file (font-locking took longer on the other one). The font-lock-keywords are doing a small amount of work, presumably for keywords where either rust-analyzer or lsp-mode is mum.

Here's a file with the full fruit salad, semantic highlighting combined with rust-mode's non-nil font-lock-keywords and font-lock-defaults:

2021-02-04-171711_1920x1080_scrot

And here's the same file with semantic highlighting but with nil font-lock-defaults and font-lock-keywords:

[screenshot]

@terlar

terlar commented Feb 5, 2021

It's nice to hear that semantic highlighting has become a thing in LSP, as that brings this feature to all major editors. Previously I have used these two packages with varying degrees of success:

How well they work depends on the language: for Emacs Lisp it works great, but for many other languages it usually highlights too many things. Personally I would only like to see it for variables, to follow them through the code (I am generally using a colorless theme, so it becomes really obvious where the variables are used). These modes are slightly different in that they generally assign one color per variable.

@maan2003

maan2003 commented Feb 5, 2021

There is a lot more to semantic highlighting than the default lsp-mode highlighting. For example, I have added a face for the mutable modifier that rust-analyzer sends. There is also highlighting within doc comments.
image

@joaotavora
Owner

@maan2003 what major mode are you using? And would you generally recommend rust-analyzer over rls?

@stephe-ada-guru
Collaborator

stephe-ada-guru commented Feb 5, 2021 via email

@joaotavora
Owner

That can only be done by a parser, which is what LSP is for.

Right. But "parse" is a very broad term. I think you mean LSP is aware of the program's AST more fully than Emacs is (for most major modes, perhaps not all). Regexps are a way of parsing certain bits of the language, and syntax tables (for parenthesis matching, if nothing else) are another, covering cases where regexps will struggle. Neither is as good as having the AST, but they are still pretty good in many cases.

But excluding font-lock-keywords, it'd be nice to hook the font-lock machinery into the parse results provided by LSP or by some other incremental parser designed for editors (like tree-sitter). Do you have an idea of how that could be achieved properly, i.e. without some dirty hack or advice? If I learn of a suitable interface, hooking it onto LSP doesn't seem like an extraordinary amount of work. But I would also like to experiment hooking it onto "tree sitter" (https://tree-sitter.github.io/tree-sitter/)

@leungbk
Contributor Author

leungbk commented Feb 5, 2021

cc @ubolonton

@stephe-ada-guru
Collaborator

stephe-ada-guru commented Feb 6, 2021 via email

@maan2003

maan2003 commented Feb 9, 2021

what major mode are you using?

Rustic

And would you generally recommend rust-analyzer over rls?

Yes, rust-analyzer is a lot better.

@maan2003

maan2003 commented Feb 9, 2021

Also, LSP-based highlighting is a lot more advanced than tree-sitter. Here are a few cases from the example above.

Example 1:

matches!(
    leaf.kind()

Highlighting inside macros (matches! in this case) requires macro expansion.

Example 2:

let item_keyword = ctx.token_at_offset().find(|leaf| {

Also, find is underlined because it takes &mut self as an argument.
Working that out requires many parts of the language:

  • type inference
  • libraries and modules lookup
  • trait resolution (finding out the definition of the function)
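
The underline in Example 2 is driven by a token modifier. In the protocol, each token's modifiers arrive as a single integer bit set, where bit i corresponds to entry i of the `tokenModifiers` list in the server's legend. Here is a small Python sketch of unpacking that bit set. The legend below is hypothetical (each server advertises its own, and rust-analyzer's actual legend is not reproduced here), and `decode_modifiers` is a made-up name.

```python
from typing import List


def decode_modifiers(bits: int, legend: List[str]) -> List[str]:
    """Return the names of all modifiers whose bit is set in `bits`."""
    return [name for i, name in enumerate(legend) if bits & (1 << i)]


# Hypothetical legend, for illustration only.
MODIFIER_LEGEND = ["declaration", "documentation", "mutable"]

# Bits 0 and 2 are set, so this selects the first and third legend entries.
print(decode_modifiers(0b101, MODIFIER_LEGEND))
```

A client could then map a modifier such as the hypothetical "mutable" entry to an underlined face, layered on top of the face chosen for the token type.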

@nowislewis

Can anyone tell me how the semantic highlighting work is going now? I just switched from lsp to eglot after a one-year friendship with lsp. I really like the cleanliness and speed of eglot.

@joaotavora
Owner

Eglot doesn't do "semantic highlighting" with information taken from the LSP server. The current understanding is that it is much slower (though potentially more accurate in some languages) than Emacs's own syntax highlighting, which is called "font lock" in Emacs.

@nowislewis

Thanks for your amazing work. I want to know whether semantic highlight is difficult to achieve elegantly. If it does not affect the cleanliness and beauty of eglot, then we would be very grateful if there is an option for the user to switch. But if the implementation is troublesome and destructive, don’t let it affect the normal development of eglot.

@joaotavora
Owner

I don't know that it is particularly destructive, nor very laborious. Feel free to give it a go. I believe lsp-mode has an implementation for it (but I haven't looked). You can see a summarized description here. It seems that it uses advice, but that can probably be avoided.

@AkibAzmain

I'm using Eglot and very happy with it. But I miss semantic tokens, and I have no idea about how to implement it.

@AkibAzmain AkibAzmain linked a pull request Feb 21, 2022 that will close this issue
5 tasks
@AkibAzmain

I've implemented it. See #839.

@AkibAzmain

@nowislewis Check out #839.

@artempyanykh
Contributor

One more data point about semantic highlighting. I maintain a Markdown LSP server. In addition to regular Markdown links, it also supports [[wiki-style#links]]. These wiki-style links are not highlighted in any way by editors. Semantic tokens allow me to add extra highlighting that works uniformly across all LSP clients, and in my opinion this improves the overall experience quite a bit.

An example:

With semantic tokens
Screenshot 2022-08-21 at 20 41 31

Without semantic tokens
Screenshot 2022-08-21 at 20 42 26

@slondr

slondr commented Sep 4, 2023

@stephe-ada-guru Do I understand correctly that this feature being implemented in eglot would allow face support in ada-mode without wisi? Wisi doesn't compile on my machine, so I'm investigating my options for a functional ada-mode.

@coffeemug
Contributor

Another data point-- clangd implements a semantic highlighting extension for inactive code. So for example if you're working on a C file, in VSCode the #ifdef sections for other platforms are grayed out and you can focus on the platform you're developing on. For big files this is a godsend.

@HaraldKi

HaraldKi commented Apr 5, 2024

Regex based syntax highlighting when the server knows all the details without fumbling and guessing seems sub-optimal. I started an eglot-semtok experiment for semantic token based highlighting. But I never used Elisp before, so don't wait for this. But friendly comments about elisp botches I make are welcome (except hanging parens, read the README).

@joaotavora
Owner

Regex based syntax highlighting when the server knows all the details without fumbling and guessing seems sub-optimal

Most -ts- major modes use in-process incremental tree-sitter parsing. Tree sitter uses language definitions. I'd say this makes communicating buffer changes over stdin/out/network to an external process and then constantly asking for loads of JSON to say where things are located seem sub-optimal, except maybe for small or toy cases.

But I never used Elisp before, so don't wait for this. But friendly comments about elisp botches I make are welcome (except hanging parens, read the README).

For the reasons above I'm not "waiting for this". Maybe others are, so feel free to propose this to Emacs proper (via bug-gnu-emacs@gnu.org); maybe it's useful in some way. I haven't looked at the implementation. If you ask me, chances are your parens quirk will decrease the chances that I or others will ever want to.

Anyway, there are already implementations for this around this repo. Last I checked, the challenge was doing this with a tight, efficient connection to the fontification machinery. You'd want a connection as tight as the tree-sitter implementation has, but that engine is inherently faster since it lives in C. With LSP/Elisp it's very challenging, since you have to deal with asynchronicity (and failure) and take care not to over-request for very large buffers. It's just like with inlay hints, where I ended up using jit-lock-register, which is decent but still not 100% accurate. Good luck!

@appetrosyan

appetrosyan commented Aug 7, 2024

Regex based syntax highlighting when the server knows all the details without fumbling and guessing seems sub-optimal.

I would unfortunately have to disagree on this one. While there is no mandate to use incremental parsing for any major mode, the vast majority of the upstream major modes are exceptionally polished and not very... as you say... fumbling. They do the job well, especially for lisp modes. So this may not necessarily help with accuracy.

A bigger problem IMO, and one that @joaotavora pointed out, is that most communication with a language server happens via JSON. The inefficiency of parsing JSON, creating a delta, and then applying it results in significant performance degradation. Couple that with the fact that you cannot, even in principle, apply this fontification asynchronously.
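
To make the "creating a delta, and then applying it" step concrete: the protocol's delta request returns a list of edits, each with `start`, `deleteCount`, and optional `data`, which the client splices into the integer array it kept from the previous response. Here is a rough Python sketch of that splice. It assumes the edits do not overlap and that their offsets refer to the previous array, which is why they are applied from the highest offset down; `apply_token_edits` is a made-up name, not an Eglot or lsp-mode function.

```python
from typing import Dict, List


def apply_token_edits(previous: List[int], edits: List[Dict]) -> List[int]:
    """Splice delta edits into a copy of the previously received data array."""
    result = list(previous)
    # Apply from the end so earlier offsets stay valid (assumes no overlap).
    for edit in sorted(edits, key=lambda e: e["start"], reverse=True):
        start = edit["start"]
        end = start + edit["deleteCount"]
        result[start:end] = edit.get("data", [])
    return result


# Previous full response: three tokens, five integers each.
previous = [2, 5, 3, 0, 3,  0, 5, 4, 1, 0,  3, 2, 7, 2, 0]
# A line was inserted at the top of the file, so only the first
# token's deltaLine changes from 2 to 3.
edits = [{"start": 0, "deleteCount": 1, "data": [3]}]

print(apply_token_edits(previous, edits))
```

The splice itself is cheap. The cost being debated in this thread lies in transporting and parsing the JSON, and in re-fontifying the affected region afterwards.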

While I appreciate that semantic tokens can be useful in some situations, applying them to general fontification is a bad idea. Implementing it shouldn't be too difficult, but I don't see using it.

@HaraldKi

Is this in an order of magnitude the user cares about:

The inefficiency of parsing JSON, creating a delta, and then applying it results in significant performance degradation.

A great deal of the Internet these days relies on exactly this, sending JSON (or similar) over the net. And that's not just localhost. So yes, no doubt there is a performance degradation, but do we have numbers? Would the user notice?

@joaotavora
Owner

So yes, no doubt there is a performance degradation, but do we have numbers?

We have a fair amount of anecdotal evidence that large quantities of JSON emitted towards Emacs slow the user experience, especially because Emacs is single-threaded. This has been getting better. I cannot produce numbers. But you can implement all this yourself, do the measurements, and come to your own conclusions.

Would the user notice?

No idea.
