Proposal of the semantic highlighting protocol extension #367

kittaakos · 2018-06-27T09:33:42Z

Task: #368.

Signed-off-by: Akos Kitta kittaakos@typefox.io

msftclas · 2018-06-27T09:33:53Z

All CLA requirements met.

svenefftinge · 2018-06-27T09:41:34Z

protocol/src/protocol.semanticHighlighting.proposed.md

+	/**
+	 * An array of semantic highlighting scopes for the current token.
+	 */
+	scopes: string[];


Documentation should point out that this corresponds to TextMate scopes. https://manual.macromates.com/en/language_grammars

svenefftinge · 2018-06-27T09:42:27Z

protocol/src/protocol.semanticHighlighting.proposed.ts

@@ -0,0 +1,90 @@
+/* --------------------------------------------------------------------------------------------
+ * Copyright (c) Microsoft Corporation. All rights reserved.


I don't think this should be a Microsoft copyright.

svenefftinge · 2018-06-27T09:43:13Z

protocol/src/protocol.semanticHighlighting.proposed.md

+##### SemanticHighlighting Notification
+
+The `textDocument/semanticHighlighting` notification is pushed from the server to the client to inform the client about additional semantic highlighting information that has to be applied on the text document.
+


It should explain that incremental updates are done on a line-by-line basis.

rcjsuen

I did a quick pass over the proposal, but as it's a WIP I haven't given it a thorough look yet.

rcjsuen · 2018-06-29T12:51:41Z

client/src/semanticHighlighting.proposed.ts

+/* --------------------------------------------------------------------------------------------
+ * Copyright (c) Microsoft Corporation. All rights reserved.
+ * Licensed under the MIT License. See License.txt in the project root for license information.
+ * ------------------------------------------------------------------------------------------ */


Should this be TypeFox instead?

Thanks for the remark, @rcjsuen. I already had the same feedback from @svenefftinge. See here: #367 (comment)

I do not know why this PR is not picking up my change:
https://github.com/kittaakos/vscode-languageserver-node/blob/e773d7c2569414a8ffe9dccfed8f79e068713a82/protocol/src/protocol.semanticHighlighting.proposed.ts#L1-L4

I was looking at a different module. Sorry for the noise.

rcjsuen · 2018-06-29T13:11:38Z

protocol/src/protocol.semanticHighlighting.proposed.md

+
+_Client Capabilities_:
+
+Capability that has to be set by the language client if that can accept and process the semantic highlighting information received from the server.


`if that can` should be `if it can`.

rcjsuen · 2018-06-29T13:11:40Z

protocol/src/protocol.semanticHighlighting.proposed.md

+/**
+ * The text document client capabilities.
+ */
+workspace: {


I think you meant to name this variable `textDocument`?

rcjsuen · 2018-06-29T13:12:36Z

protocol/src/protocol.semanticHighlighting.proposed.md

+workspace: {
+
+	/**
+	 * `true` if the client has the semantic highlighting support for the text document. Otherwise, `false`. It is `false` by default.


`true` if the client has the semantic highlighting support for the text document.
...to...
`true` if the client supports semantic highlighting for text documents.

jacobdufault · 2018-06-29T16:57:01Z

cquery implements semantic highlighting as a notification the client sends to the server and supports rainbow highlighting (ie, each variable is a different shade of the same color, which makes flow analysis easier). Here is what the protocol looks like:

struct Symbol {
  // a stable ID that can be used for rainbow highlighting, i.e. this symbol will have
  // the same stableId across multiple notifications (w.r.t. each file)
  int stableId = 0;
  // what type of symbol declares this symbol? i.e. is this a free-standing variable
  // or an instance variable of a class?
  SymbolKind parentKind;
  // what type of symbol is this, i.e. variable, function, etc.
  SymbolKind kind;
  // what's the memory storage for this variable, i.e. is it a global variable?
  StorageClass storage;
  // locations in the document where the symbol is used
  Range[] ranges;
};

// Standard notification
struct CqueryPublishSemanticHighlightingNotification {
  struct Params {
    // The file this applies to
    DocumentUri uri;
    // A list of symbols in the file that should be highlighted
    Symbol[] symbols;
  };
  Params params;
};

It'd be great if the official spec supported all of the features that cquery's implementation supports. Let me know if you have any questions regarding how cquery's implementation works.

There's also an implementation for vscode at https://github.com/cquery-project/vscode-cquery/blob/937864a82e9fb44f03eafb095134d01776e73367/src/extension.ts#L736-L875.

svenefftinge · 2018-06-30T08:51:50Z

> cquery implements semantic highlighting as a notification the client sends to the server

You meant the inverse, i.e. notifications from the server to the client, right?

I think the main difference between this proposal and the protocol you have in cquery is that we rely on TextMate scopes rather than passing semantic information to the client. I don't see why your use cases shouldn't be covered. The server just needs to send the right scopes, and the client needs to support them.

Also, this proposal doesn't support multi-line tokens, as the server publishes tokens on a per-line basis so that it can update them incrementally.
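A minimal client-side sketch of the per-line model, assuming tokens carry `character`, `length`, and `scopes` fields and each notification carries one entry per affected line. `LineTokenStore` and its methods are hypothetical, and treating an empty token array as "clear this line" is an assumption, not something the proposal states. Because every entry addresses exactly one line, an update only touches the lines it names, which is also why a token cannot span lines.

```typescript
// Shapes assumed for illustration; they mirror the per-line model discussed above.
interface SemanticHighlightingToken {
  character: number; // start column on the line
  length: number;    // number of characters; never crosses into the next line
  scopes: string[];  // TextMate scopes, e.g. ["variable.other.local"]
}

interface SemanticHighlightingInformation {
  line: number;
  tokens: SemanticHighlightingToken[];
}

// Hypothetical client-side store: one token array per line.
class LineTokenStore {
  private readonly lines = new Map<number, SemanticHighlightingToken[]>();

  // Each entry replaces the tokens of exactly one line; lines that are not
  // mentioned keep their previous tokens. An empty array clears the line.
  apply(update: SemanticHighlightingInformation[]): void {
    for (const info of update) {
      if (info.tokens.length === 0) {
        this.lines.delete(info.line);
      } else {
        this.lines.set(info.line, info.tokens);
      }
    }
  }

  tokensAt(line: number): SemanticHighlightingToken[] {
    return this.lines.get(line) ?? [];
  }
}
```

A multi-line construct (say, a block comment) would have to be sent as one token per line under this model.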

jacobdufault · 2018-06-30T17:42:58Z

> You meant the inverse, i.e. notifications from the server to the client, right?

Oops, yep. The cquery binary sends a notification to the client, i.e. VS Code.

> I think the main difference between this proposal and the protocol you have in cquery is that we rely on TextMate scopes rather than passing semantic information to the client. I don't see why your use cases shouldn't be covered. The server just needs to send the right scopes, and the client needs to support them.

Does the current proposal support rainbow style semantic highlighting, where each unique variable has a similar but slightly different shade of the same color? I'm not familiar enough with TextMate scopes.

As for multi-line tokens, this seems like a rare use-case and is not one that cquery needs.

svenefftinge · 2018-07-01T07:45:59Z

> Does the current proposal support rainbow style semantic highlighting, where each unique variable has a similar but slightly different shade of the same color? I'm not familiar enough with TextMate scopes.

See https://www.sublimetext.com/docs/3/scope_naming.html for more info on TextMate scopes.

It does not directly support rainbow style, but I think it could be done by sending corresponding scopes. For instance, the LS could send `variable.other.c1` through `variable.other.c9`. If the color theme supports those scopes, you get rainbow colors; if not, the more general variable coloring applies.
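The rainbow idea can be sketched like this, assuming the server derives a bucket from a per-symbol stable ID (similar to cquery's `stableId`) and that the theme resolves scopes by plain dot-prefix matching. `rainbowScope`, `selectorMatches`, and `resolveColor` are hypothetical helpers; real TextMate scope selectors also support descendant and exclusion syntax, which is omitted here.

```typescript
// Hypothetical server-side helper: map a stable symbol ID to one of `buckets` scopes.
function rainbowScope(stableId: number, buckets = 9): string {
  const bucket = (Math.abs(Math.trunc(stableId)) % buckets) + 1;
  return `variable.other.c${bucket}`;
}

// Simplified TextMate-style matching: a selector matches a scope if it equals
// the scope or is a prefix of it that ends at a dot boundary.
function selectorMatches(selector: string, scope: string): boolean {
  return scope === selector || scope.startsWith(selector + ".");
}

// Hypothetical client-side lookup: the longest (most specific) matching selector wins.
function resolveColor(theme: Record<string, string>, scope: string): string | undefined {
  let best: string | undefined;
  for (const selector of Object.keys(theme)) {
    if (selectorMatches(selector, scope) && (best === undefined || selector.length > best.length)) {
      best = selector;
    }
  }
  return best === undefined ? undefined : theme[best];
}
```

A theme with only a `variable` rule colors all nine buckets identically, which is the graceful fallback described above; a theme that also defines `variable.other.c1` and friends gets distinct shades.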

Levertion · 2018-07-01T17:42:40Z

Why is this in Microsoft/vscode-languageserver-node, rather than Microsoft/language-server-protocol? I presume that it's just to allow the development of the reference implementation and the proposal to be linked?

svenefftinge · 2018-07-01T19:11:26Z

> Why is this in Microsoft/vscode-languageserver-node, rather than Microsoft/language-server-protocol?

The process is described here https://github.com/Microsoft/language-server-protocol/blob/master/contributing.md
So this is the mandatory reference implementation for the VSCode language client library.
Unfortunately, VSCode doesn't expose enough API to implement it. E.g. there is no way to access the TextMate scheme registry.
@dbaeumer what would be the way to go in such cases?

Levertion · 2018-07-01T20:05:47Z

Fair enough. I just didn't know why microsoft/language-server-protocol#124 was closed, but that explains it.

rcjsuen · 2018-07-02T03:26:31Z

It's unfortunate that things have to be done this way, as more people watch Microsoft/language-server-protocol than this repository, and that would allow for more feedback from the community... but I realize that this discussion is rather off-topic here.

svenefftinge · 2018-07-02T13:03:11Z

We should open a corresponding PR on language-server-protocol, with the relevant changes for the markdown file. @kittaakos WDYT?

dbaeumer · 2018-07-04T08:58:58Z

@svenefftinge @rcjsuen actually the document here https://github.com/Microsoft/language-server-protocol/blob/master/contributing.md says that

> Proposed extensions need to be announced via an issue in this repository.

where this repository is https://github.com/Microsoft/language-server-protocol

Agree that this can be made more clear by moving it to the top.

dbaeumer · 2018-07-04T09:03:32Z

Moved to the top. I am open to accepting a reference implementation for a different well used client as well.

kittaakos · 2018-07-04T09:04:46Z

Thanks for the clarification, @dbaeumer. I'll open one for the proposal in the https://github.com/Microsoft/language-server-protocol repository too.

> Agree that this can be made more clear by moving it to the top.

Just naming the repository instead of having a link in the README would help too.

svenefftinge · 2018-07-05T09:31:10Z

> I am open to accepting a reference implementation for a different well used client as well.

Ok, good to know. It would be very cool to have support for it in VSCode, though.
Is there a chance you could expose a bit more API so it could work?
Also, what do you think about the incremental updates based on lines?

alexdima · 2018-07-06T16:36:19Z

I think it is a good idea to push the semantic highlights from the server. The only thing to be aware of here is the case where there are 100 open files in the client but only 1 file is visible, so the semantic highlights would only be useful for that single file. @dbaeumer I don't know if the protocol has visibility APIs yet, such that the server could limit its pushing of semantic highlights to the currently visible files or the currently visible regions of files?

Having worked before on the problem of how to represent tokens in a memory-friendly way (see https://code.visualstudio.com/blogs/2017/02/08/syntax-highlighting-optimizations), and having dealt with a lot of problems stemming from large files, may I suggest that the protocol in this area be designed with the following goals in mind:

1. Tokens should be encoded in a memory-friendly way straight from the wire (JSON).
2. There should be a way for the server to incrementally update the semantic highlights.

I think goal 1 is somewhat straightforward to achieve. I suggest that each token not point to an array of scopes (i.e. `SemanticHighlightingToken.scopes`); rather, the entire set of possible scopes should be registered statically up front, and this field should be replaced with an index into that legend. My assumption is that the possible number and types of semantic tokens is finite. I also believe that scopes (which can be targeted by TM themes) are a good abstraction level, i.e. we should not talk about colors at this layer, as that is something for themes to decide.

So, assuming the entire possible set of scopes is registered/sent over up front, the token interface can be reduced to:

interface SemanticHighlightingInformation {
  line: number;
  tokens: SemanticHighlightingToken[];
}
interface SemanticHighlightingToken {
  character: number;
  length: number;
  scopes: number;
}
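A minimal sketch of the legend idea, assuming the server sends the legend once up front (how it is transmitted is left open here) and that each legend entry is the full scope list a token index stands for. The legend contents and `resolveScopes` are illustrative, not part of the proposal.

```typescript
interface SemanticHighlightingToken {
  character: number;
  length: number;
  scopes: number; // index into the legend
}

// Hypothetical legend: entry i is the scope list that index i stands for.
const legend: readonly string[][] = [
  ["variable.other.local"],
  ["entity.name.function"],
  ["variable.other.member", "storage.modifier.static"],
];

// Client-side lookup from a token's index to its scopes.
function resolveScopes(table: readonly string[][], token: SemanticHighlightingToken): string[] {
  const entry = table[token.scopes];
  if (entry === undefined) {
    throw new RangeError(`scope index ${token.scopes} is not in the legend`);
  }
  return entry;
}
```

Each token now costs one small integer for its scopes instead of a list of strings repeated on every line.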

The next immediate memory/size optimisation here is to drop the SemanticHighlightingToken object entirely and inline everything in SemanticHighlightingInformation. i.e.

interface SemanticHighlightingInformation {
  line: number;
  tokens: number[]; // tokens[3*i] is character, tokens[3*i+1] is length, tokens[3*i+2] is scopes
}
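Flattening and unflattening with a stride of three can be sketched as follows; `flattenTokens` and `unflattenTokens` are hypothetical helper names.

```typescript
interface SemanticHighlightingToken {
  character: number;
  length: number;
  scopes: number; // index into the legend
}

// Hypothetical helper: inline tokens as [character, length, scopes] triples.
function flattenTokens(tokens: SemanticHighlightingToken[]): number[] {
  const out: number[] = [];
  for (const t of tokens) {
    out.push(t.character, t.length, t.scopes);
  }
  return out;
}

// Inverse: read the triples back into token objects.
function unflattenTokens(data: number[]): SemanticHighlightingToken[] {
  if (data.length % 3 !== 0) {
    throw new Error("token data length must be a multiple of 3");
  }
  const out: SemanticHighlightingToken[] = [];
  for (let i = 0; i < data.length; i += 3) {
    out.push({ character: data[i], length: data[i + 1], scopes: data[i + 2] });
  }
  return out;
}
```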

Other optimisations could be pursued, such as noticing that the length is typically a small integer, and if we assume that the possible number of scopes is smaller than 2^16, we can get away with two 32-bit unsigned integers per token:

interface SemanticHighlightingInformation {
  line: number;
  tokens: number[]; // tokens[2*i] is character, tokens[2*i+1] >>> 16 is length, tokens[2*i+1] & 0x0000ffff is scopes
}

In terms of TypeScript, this would be best represented using typed arrays, i.e. Uint32Array:

interface SemanticHighlightingInformation {
  line: number;
  tokens: Uint32Array;
}
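A sketch of the two-words-per-token packing, assuming both the token length and the scope index fit in 16 bits (length in the high half, scope index in the low half). In JavaScript/TypeScript `>>` binds tighter than `&`, so unpacking is safest with `>>>` applied to the whole word and a separate mask. `packTokens` and `unpackTokens` are hypothetical names.

```typescript
interface SemanticHighlightingToken {
  character: number;
  length: number;
  scopes: number; // index into a legend of scopes
}

// Two 32-bit words per token: [character, (length << 16) | scopeIndex].
function packTokens(tokens: SemanticHighlightingToken[]): Uint32Array {
  const out = new Uint32Array(tokens.length * 2);
  tokens.forEach((t, i) => {
    if (t.length > 0xffff || t.scopes > 0xffff) {
      throw new RangeError(`token ${i}: length and scope index must fit in 16 bits`);
    }
    out[2 * i] = t.character;
    // >>> 0 keeps the value unsigned even when the length has its top bit set.
    out[2 * i + 1] = ((t.length << 16) | t.scopes) >>> 0;
  });
  return out;
}

function unpackTokens(data: Uint32Array): SemanticHighlightingToken[] {
  const out: SemanticHighlightingToken[] = [];
  for (let i = 0; 2 * i + 1 < data.length; i++) {
    const word = data[2 * i + 1];
    out.push({ character: data[2 * i], length: word >>> 16, scopes: word & 0xffff });
  }
  return out;
}
```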

The backing store of a Uint32Array is always an ArrayBuffer, so we could just store that:

interface SemanticHighlightingInformation {
  line: number;
  tokens: ArrayBuffer;
}

And, finally, the best and easiest way I'm aware of for efficiently encoding a byte array in JSON is to use base64 encoding, so at the wire level:

interface SemanticHighlightingInformation {
  line: number;
  tokens: string; // base64 encoded ArrayBuffer that should be viewed as a Uint32Array
}
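To make the wire format concrete, here is a self-contained sketch that serializes the packed words and base64-encodes them. It assumes little-endian byte order: the comment above does not fix an endianness, and a raw `Uint32Array` view uses the platform's native order, so `DataView` is used to pin it down. The base64 helpers are hand-written only to avoid depending on Node's `Buffer` or the browser's `btoa`; `encodeTokens` and `decodeTokens` are hypothetical names.

```typescript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with '=' padding.
function bytesToBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += ALPHABET[(n >> 18) & 63] + ALPHABET[(n >> 12) & 63];
    out += i + 1 < bytes.length ? ALPHABET[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? ALPHABET[n & 63] : "=";
  }
  return out;
}

function base64ToBytes(text: string): Uint8Array {
  const clean = text.replace(/=+$/, "");
  const bytes = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let acc = 0; // bit accumulator; only the low 16 bits are ever needed
  let bits = 0;
  let j = 0;
  for (let i = 0; i < clean.length; i++) {
    const v = ALPHABET.indexOf(clean[i]);
    if (v < 0) throw new Error(`invalid base64 character at ${i}`);
    acc = ((acc << 6) | v) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes[j++] = (acc >> bits) & 0xff;
    }
  }
  return bytes;
}

// Packed 32-bit words -> little-endian bytes -> base64 string.
function encodeTokens(tokens: Uint32Array): string {
  const bytes = new Uint8Array(tokens.length * 4);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < tokens.length; i++) view.setUint32(i * 4, tokens[i], true);
  return bytesToBase64(bytes);
}

// base64 string -> bytes -> 32-bit words, read back as little-endian.
function decodeTokens(text: string): Uint32Array {
  const bytes = base64ToBytes(text);
  if (bytes.length % 4 !== 0) throw new Error("payload is not a whole number of 32-bit words");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Uint32Array(bytes.length / 4);
  for (let i = 0; i < out.length; i++) out[i] = view.getUint32(i * 4, true);
  return out;
}
```

Base64 costs 4 characters per 3 bytes, so each 8-byte token becomes roughly 11 characters on the wire, independent of how large the column or scope values are.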

Now, the second point is a bit more difficult, as we would need to clearly specify what is expected of the client when edits occur. For example, we need to specify how the tokens should be backed by "live markers" such that they stick with the text, i.e. the server and the client need a shared understanding of how the semantic tokens are adjusted. Second, the notification needs to be able to replace a certain region of the file with new tokens.

For example, when opening a large file in an editor, let's suppose the server sends a notification (which must include the document version id) that defines all the semantic tokens in the entire file. Now, when an edit occurs, for example a new line is inserted at the first position in the file, both client and server should share the understanding that all the previously set tokens have moved down by one line (they stick with the text they're sitting on). So, when a new line is inserted at the first position in the file, the server does not need to send any notification to update the semantic tokens.
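A minimal sketch of the "tokens stick to the text" rule for a line break, assuming the client keeps tokens keyed by zero-based line and that a token straddling the split point is simply dropped until the server re-sends it (that policy is an assumption). `applyLineBreak` and `TokenStore` are hypothetical. With a line break at the first position of the file, every token moves down one line and no notification is needed.

```typescript
interface Token {
  character: number;
  length: number;
  scope: number;
}

// Tokens keyed by zero-based line number.
type TokenStore = Map<number, Token[]>;

// Line break typed at (line, character): lines above are untouched, lines below
// move down by one, and tokens on the split line either stay or move to the
// new line with their column rebased.
function applyLineBreak(store: TokenStore, line: number, character: number): TokenStore {
  const next: TokenStore = new Map();
  store.forEach((tokens, l) => {
    if (l < line) {
      next.set(l, tokens);
    } else if (l > line) {
      next.set(l + 1, tokens);
    } else {
      const before: Token[] = [];
      const after: Token[] = [];
      for (const t of tokens) {
        if (t.character + t.length <= character) {
          before.push(t);
        } else if (t.character >= character) {
          after.push({ ...t, character: t.character - character });
        }
        // A token straddling the split point is dropped (assumption: the server re-sends it).
      }
      if (before.length > 0) next.set(l, before);
      if (after.length > 0) next.set(l + 1, after);
    }
  });
  return next;
}
```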

Another example would be typing a character. This gets sent to the server as a text delta which mentions the position and the newly inserted character. Similarly, the server should be able to send a notification whose size is proportional to the number of semantic tokens that changed. If the user typed a character in an identifier, then a notification updating only that specific semantic token should suffice.
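For typing within a line, a client could adjust token columns locally before the server's small update arrives. This sketch assumes one particular stickiness policy: text inserted at or before a token's start pushes the token right, and text inserted strictly inside a token or at its end grows it. `applyInlineInsert` is a hypothetical helper; other policies work equally well as long as client and server agree on them.

```typescript
interface Token {
  character: number;
  length: number;
  scope: number;
}

// Adjust one line's tokens for `insertedLength` characters typed at `character`.
function applyInlineInsert(tokens: Token[], character: number, insertedLength: number): Token[] {
  return tokens.map((t) => {
    const end = t.character + t.length;
    if (character <= t.character) {
      // Insertion at or before the token start: shift the token right.
      return { ...t, character: t.character + insertedLength };
    }
    if (character <= end) {
      // Insertion inside the token or at its end: grow the token.
      return { ...t, length: t.length + insertedLength };
    }
    // Insertion after the token: unchanged.
    return t;
  });
}
```

Growing a token at its end matches typing more characters of an identifier; if the insertion actually ended the identifier (e.g. a space), the server's follow-up notification corrects that single token.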

I'm sorry I'm just describing the general mechanism here and not providing a sketch like above, but again, this is a much more complicated goal to achieve. Anyway, the requirements on the client can be reduced if there is some sort of negotiation where the client indicates that it cannot track semantic tokens, in which case the server would always send the tokens for the entire file. (You can think of this as something similar to formatting: there are some formatting providers which are very good and send N edits that target precisely the whitespace that needs to be changed... and then there are formatting providers that send a single huge edit encompassing the entire file and replacing it with new text.) In other words, this should be done in a way where a smart client and a smart server can manage to achieve the correct maintenance of a high volume of semantic tokens as efficiently as possible.

isc-bsaviano · 2020-08-06T13:45:00Z

Are there any updates for when this pull request will make it into the master branch? Semantic tokens are a key part of our IDE strategy and we're very excited to add them to our language server!

rcjsuen · 2020-08-06T19:01:18Z

> Are there any updates for when this pull request will make it into the master branch? Semantic tokens are a key part of our IDE strategy and we're very excited to add them to our language server!

@bsaviano-intersystems This proposal has been superseded by a new proposal. I suggest you look at the draft 3.16 specification and share your thoughts in microsoft/language-server-protocol#18.

isc-bsaviano · 2020-08-06T19:22:47Z

@rcjsuen Thanks for the response. I've already seen the proposed spec in 3.16. I was wondering if there's a version of the vscode-languageserver/languageclient packages that contains the proposal so I can repurpose my existing DocumentSemanticTokenProvider and test it out. Thanks again.

rcjsuen · 2020-08-06T20:28:13Z

> @rcjsuen I was wondering if there's a version of the vscode-languageserver/languageclient packages that contains the proposal so I can repurpose my existing DocumentSemanticTokenProvider and test it out. Thanks again.

@bsaviano-intersystems I know 6.1.1 and up for vscode-languageserver and 6.1.3 and up for vscode-languageclient should be good. I don't know off hand when it was first introduced though.

dbaeumer · 2020-08-07T07:16:59Z

@rcjsuen thanks for answering. The latest server version is 7.0.0-next.6 and client 7.0.0-next.8

isc-bsaviano · 2020-08-07T12:33:38Z

Thank you both for the responses. I will install the “next” packages and try it out. Looking forward to the official 3.16 release! Has a timeframe for that been established?

kjeremy · 2020-08-07T16:07:32Z

@dbaeumer npm outdated shows me that 7.0.0-next.1 is the latest vscode-languageclient

dbaeumer · 2020-08-17T10:30:32Z

@kjeremy that is incorrect. Let me check what happened.

dbaeumer · 2020-08-17T10:35:16Z

That is strange, since npm lists everything correctly.

What version do you have installed right now? Not sure if outdated works correctly with next versions.

kjeremy · 2020-08-17T11:22:13Z

@dbaeumer

npm --version
6.14.7

npm outdated

dbaeumer · 2020-08-18T12:56:42Z

@kjeremy I mean which version of the vscode-languageclient

kjeremy · 2020-08-18T13:07:54Z

@dbaeumer 7.0.0-next.1

dbaeumer · 2020-08-18T15:01:46Z

@kjeremy what do you expect to happen? To my knowledge, next versions are not considered to be semver, so npm will never install 7.0.0-next.2 if you depend on 7.0.0-next.1.

kjeremy · 2020-08-18T15:04:25Z

@dbaeumer I expect that 7.0.0-next.8 shows up as the latest version available and it does not.

dbaeumer · 2020-08-19T09:01:14Z

@kjeremy but this is then something that npm needs to deal with. IMO the releases are tagged correctly. Have you raised the issue with them?

The semantichighlighting proposal from microsoft/vscode-languageserver-node#367 was replaced with semantic tokens in the final protocol version 3.16.

This reverts 307f1d8

Task-number: QTCREATORBUG-26624
Change-Id: I635f0b4763a197edabf9edf8d9041143dcf531e3
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>

Hebe970217 · 2022-01-04T03:25:42Z

Excuse me, I see that KamasamaK mentioned "Removing Semantic Highlighting" on Oct 20, 2020. Does that mean semantic highlighting is no longer supported after that?

SPGoding · 2022-01-04T03:37:07Z

That was the title of an issue on the eclipse/lsp4j repo to deprecate their Semantic Highlighting API in favor of LSP's Semantic Tokens API, unrelated to microsoft/vscode-languageserver-node.

Hebe970217 · 2022-01-04T03:47:37Z

> That was the title of an issue on the eclipse/lsp4j repo to deprecate their Semantic Highlighting API in favor of LSP's Semantic Tokens API, unrelated to microsoft/vscode-languageserver-node.

Okay.
The LSP server's initialize result contains `"documentHighlightProvider": true`.
I tried sending a `textDocument/documentHighlight` request to the clangd backend, but I didn't get a separate response back. I'm writing a C++ client.

Hebe970217 · 2022-01-04T03:48:49Z

Maybe I should create a new issue.

kittaakos mentioned this pull request Jun 27, 2018

Semantic highlighting protocol extension #368

Closed

svenefftinge reviewed Jun 27, 2018

This was referenced Jun 27, 2018

Proposal for (Semantic) Coloring (see #18) microsoft/language-server-protocol#124

Closed

Make semantic highlighting available via LSP numirias/semshi#6

Open

Upstreaming LSP protocol extensions? jacobdufault/cquery#431

Open

kittaakos force-pushed the semantic-highlighting-proposal branch 5 times, most recently from c893bd7 to 04e0033 on June 27, 2018 12:51

rcjsuen reviewed Jun 29, 2018

Levertion mentioned this pull request Jul 1, 2018

Highlight Levertion/mcfunction-langserver#34

Closed

kittaakos force-pushed the semantic-highlighting-proposal branch 2 times, most recently from e773d7c to e881846 on July 2, 2018 09:23

kittaakos mentioned this pull request Jul 4, 2018

Semantic highlighting protocol extension microsoft/language-server-protocol#513

Closed

Eugleo mentioned this pull request Apr 29, 2020

Semantic coloring jeapostrophe/racket-langserver#3

Closed

puremourning mentioned this pull request May 12, 2020

Support semantic highlighting microsoft/language-server-protocol#18

Closed

radeksimko mentioned this pull request Aug 26, 2020

Support semantic syntax highlighting hashicorp/terraform-ls#264

Closed

ericdallo mentioned this pull request Aug 30, 2020

Add support for semanticTokens eclipse-lsp4j/lsp4j#446

Merged

kittaakos closed this Sep 11, 2020

kittaakos mentioned this pull request Oct 5, 2020

Cannot update to SADL 3.4.0 SemanticApplicationDesignLanguage/sadl#506

Closed

KamasamaK mentioned this pull request Oct 20, 2020

Removing Semantic Highlighting eclipse-lsp4j/lsp4j#457

Closed

smemsh mentioned this pull request May 9, 2021

Semantic highlighting microsoft/python-language-server#1903

Open

acao mentioned this pull request Oct 30, 2022

[lsp-server] handle syntax highlighting at the LSP level graphql/graphiql#2851

Open

jiribenes mentioned this pull request Jul 23, 2024

Improve syntax highlighting effekt-lang/effekt-vscode#18

Merged
