What Unicode unit does this project use in LSP ranges? #542

Closed
Avi-D-coder opened this issue Mar 27, 2019 · 3 comments

Comments

@Avi-D-coder

Does this project use UTF-8 code units, UTF-16 code units, code points, or grapheme clusters as the index unit for LSP ranges?

If this project uses a unit other than UTF-16, could/would you ever conform to version 3.0 of the protocol and use UTF-16?
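To make the question concrete, here is a minimal Python sketch that measures the same column in three of the units listed above. The helper name `column_in_units` is hypothetical. Grapheme clusters are left out because the standard library cannot segment them; a third-party package such as `regex` (with its `\X` pattern) would be needed for that.

```python
def column_in_units(line: str, col: int) -> dict:
    """Measure line[:col] (col is a code point index) in several units."""
    prefix = line[:col]
    return {
        # Python str indexing already counts code points.
        "codepoints": len(prefix),
        # UTF-8 code units are bytes.
        "utf8": len(prefix.encode("utf-8")),
        # UTF-16 code units are 2 bytes each; "utf-16-le" omits the BOM.
        "utf16": len(prefix.encode("utf-16-le")) // 2,
    }


# "a", an emoji (U+1F600), "é" (U+00E9), then " = 1"
line = "a\U0001F600\u00e9 = 1"
print(column_in_units(line, line.index("=")))
# {'codepoints': 4, 'utf8': 8, 'utf16': 5}
```

The same `=` sign sits at three different columns depending on the unit, which is exactly why client and server must agree on one.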

I am conducting a survey to inform the debate over what unit ranges should use in the Language Server Protocol.
The debate is occurring in issue #376.

@rwols
Member

rwols commented Mar 28, 2019

The conversion between an LSP point and a Sublime Text point is the identity function: https://github.com/tomv564/LSP/blob/master/plugin/core/protocol.py#L232-L248

Since the ST3 API works in terms of codepoints, we assume the language server talks in codepoints.

No issues have yet been raised about this, but that is probably because most code is written with codepoints living in the ASCII range. In that case, the conversion between UTF-16 code units and codepoints is the identity function, too. Obviously, this plugin strongly prefers codepoints, since they make life easier.
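If the plugin were to follow the 3.0 specification and treat the LSP `character` field as UTF-16 code units, each column would need converting. Below is a minimal sketch, assuming the line text is available as a Python `str` (which is indexed by code points); `codepoint_to_utf16` and `utf16_to_codepoint` are hypothetical names, not functions from the plugin linked above. For ASCII lines both functions return their input unchanged, which matches the observation that the mismatch has gone unnoticed.

```python
def codepoint_to_utf16(line: str, col: int) -> int:
    """Convert a code point column into a UTF-16 code unit offset."""
    # Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in line[:col])


def utf16_to_codepoint(line: str, offset: int) -> int:
    """Convert a UTF-16 code unit offset into a code point column."""
    units = 0
    for col, ch in enumerate(line):
        # An offset that lands inside a surrogate pair rounds up
        # to the next code point.
        if units >= offset:
            return col
        units += 2 if ord(ch) > 0xFFFF else 1
    # Offset at or past the end of the line: clamp to the line length.
    return len(line)


line = "s = '\U0001F600' + y"  # the "y" is at code point column 10
print(codepoint_to_utf16(line, 10))  # 11
print(utf16_to_codepoint(line, 11))  # 10
print(codepoint_to_utf16("abc", 2))  # 2 (identity for ASCII)
```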

By the way, does it even matter what the encoding is for the line number part of an LSP point? No matter the encoding, there would be the same number of newlines. So I'm assuming it all boils down to how many steps we must take into the columns.
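The intuition can be checked with a small sketch that turns an offset into an LSP-style `(line, character)` pair in two units. Only the `character` component changes; the `line` component is identical because the newline count does not depend on the encoding. The function `position_at` and its `unit` parameter are hypothetical, and the sketch assumes `\n` line endings.

```python
def position_at(text: str, offset: int, unit: str) -> tuple:
    """Return (line, character) for a code point offset into text.

    unit is "codepoint" or "utf-16"; only the character part depends on it.
    """
    before = text[:offset]
    line = before.count("\n")            # same for every encoding
    line_start = before.rfind("\n") + 1  # 0 when still on the first line
    prefix = before[line_start:]
    if unit == "codepoint":
        character = len(prefix)
    elif unit == "utf-16":
        character = len(prefix.encode("utf-16-le")) // 2
    else:
        raise ValueError(f"unknown unit: {unit}")
    return line, character


doc = "first line\nlabel = '\U0001F389' + value\n"
offset = doc.index("value")
print(position_at(doc, offset, "codepoint"))  # (1, 14)
print(position_at(doc, offset, "utf-16"))     # (1, 15)
```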

@Avi-D-coder
Author

@rwols Thanks.

> By the way, does it even matter what the encoding is for the line number part of an LSP point? No matter the encoding, there would be the same number of newlines. So I'm assuming it all boils down to how many steps we must take into the columns.

I think you're right.

@randy3k
Contributor

randy3k commented Nov 8, 2019

> No issues have yet been raised about this, but that is probably because most code is written with codepoints living in the ASCII range.

Most of the commonly used CJK characters are in the BMP (Basic Multilingual Plane), where each code point takes a single UTF-16 code unit, so they usually don't cause any issues. The issue mostly shows up when emoji, the most commonly used non-BMP code points, are included; each of those takes two UTF-16 code units (a surrogate pair).
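This can be made concrete: code point and UTF-16 columns diverge only for code points above U+FFFF. Below is a minimal sketch with the hypothetical helpers `utf16_units` and `columns_may_differ`; a client that assumes code points could use such a check to tell whether a given line needs any conversion at all.

```python
def utf16_units(ch: str) -> int:
    """Number of UTF-16 code units needed for a single code point."""
    return 2 if ord(ch) > 0xFFFF else 1


def columns_may_differ(line: str) -> bool:
    """True if code point and UTF-16 columns can disagree on this line."""
    return any(ord(ch) > 0xFFFF for ch in line)


for ch in ("a", "\u4e2d", "\U0001F600"):  # ASCII, CJK (BMP), emoji (non-BMP)
    print(f"U+{ord(ch):04X}", utf16_units(ch))
# U+0061 1
# U+4E2D 1
# U+1F600 2

print(columns_may_differ("x = '\u4e2d\u6587'"))  # False
print(columns_may_differ("x = '\U0001F600'"))    # True
```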
