Convert positions from LSP coordinates to Kakoune coordinates #98
Comments
Good catch!
Do you have any ideas how to fix it efficiently? It looks like a generic solution would require kak-lsp to track and analyze the contents of open buffers =( @mawww Do you know of anything already implemented on the Kakoune side that might help with such a conversion?
Argh, Microsoft, again? I thought we were friends... More seriously, it bothers me that they cannot let UTF-16 die in the MS world. UTF-8 won, everybody uses UTF-8 except to access the Win32 API... It's even stupider as the text documents themselves are expected to be transferred as UTF-8. Frankly I view this as a bug in the LSP spec, and ideally we should lobby them to fix that, but I doubt this will get fixed anytime soon... Kakoune uses 0-based byte coordinates for selections internally, and exposes them as 1-based byte coordinates to the external world (because user-side lines/columns are traditionally 1-based, as seen in compiler error messages for example). I would find it really ugly for kak-lsp to have to store the buffer content itself just for that case. An alternate solution (that I am not really happy with either) would be to have a way to specify UTF-16 based coordinates to Kakoune (say The best alternative remains to remind the LSP spec writers that there were 3 sane alternatives (UTF-8 byte coordinates, column coordinates or codepoint coordinates) and for some strange/historical reason they went with another one... Yeah, I am a bit annoyed at you, Microsoft 😄 Edit: Here is the discussion on the LSP side: microsoft/language-server-protocol#376
(to be fair to Microsoft, I'm guessing this particular API decision comes from VS Code being written in JavaScript, whose spec requires UTF-16 strings, not particularly the Win32/Cocoa/Java APIs)
As discussed on IRC, kak-lsp wouldn't necessarily need to cache the entire document: if you had a list of the offsets at which astral-plane characters appear, you could take each LSP coordinate and binary-search in the list to see how many astral-plane characters appear before it, and subtract that number from the offset to find the codepoint offset. As for finding astral-plane characters, some quick investigation with Python:
... suggests that any byte whose value >= 0xf0 is the initial byte of an astral-plane character. That should be pretty easy to search for, without having to transcode anything to UTF-16 and count code-points.
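A rough Python sketch of the idea above, not what kak-lsp actually does. The function names are made up for illustration, and it assumes the line is valid UTF-8 and that the list of astral-plane offsets is kept in UTF-16 code units so it can be searched directly with the LSP coordinate:

```python
from bisect import bisect_left


def astral_lead_byte_offsets(data: bytes) -> list[int]:
    """Byte offsets of bytes >= 0xF0, i.e. lead bytes of 4-byte UTF-8 sequences."""
    return [i for i, b in enumerate(data) if b >= 0xF0]


def astral_utf16_offsets(line: str) -> list[int]:
    """UTF-16 code-unit offsets at which each astral-plane character starts."""
    offsets = []
    utf16_pos = 0
    for ch in line:
        if ord(ch) > 0xFFFF:
            offsets.append(utf16_pos)
            utf16_pos += 2  # encoded as a surrogate pair
        else:
            utf16_pos += 1
    return offsets


def utf16_to_codepoint_offset(utf16_offset: int, astral_offsets: list[int]) -> int:
    """Subtract one for every astral-plane character starting before the offset."""
    return utf16_offset - bisect_left(astral_offsets, utf16_offset)
```

Only the (usually short) sorted list of offsets has to be kept per line, and each lookup is a single binary search.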
Wait, so the line:column indicator in the status-bar (which seems to count codepoints) is unrelated to the line.column syntax used in ranges and selections? That seems... misleading.
ranges and selections use
fixed by fb972fc (Use UTF-16 code unit offsets instead of code point offsets, as per LSP, 2022-09-03)
The LSP specification (version 3.0) says:
Meanwhile, Kakoune uses one-based line and character offsets, and seems to count 1 for every kind of character, including basic ASCII, Basic Multilingual Plane characters, astral plane characters like emoji, and individual combining characters.
Currently kak-lsp converts positions by adding 1 (converting from zero-based to one-based), but does not account for the difference between codepoints and UTF-16 code units.
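To make the mismatch concrete, here is a minimal Python sketch of a conversion that does account for UTF-16 code units. It is an illustration only: `lsp_to_kakoune` is a hypothetical name, it assumes the caller has the text of the target line at hand, and it assumes the Kakoune side wants a 1-based byte column (as described in the comments) rather than a codepoint count.

```python
def lsp_to_kakoune(line_text: str, lsp_line: int, lsp_character: int) -> tuple[int, int]:
    """Convert a 0-based LSP (line, UTF-16 code unit) position to a
    1-based (line, byte column) pair.

    Assumption: an offset that falls inside a surrogate pair is rounded
    up to the position just after that character.
    """
    utf16_units = 0
    byte_col = 0
    for ch in line_text:
        if utf16_units >= lsp_character:
            break
        # Astral-plane characters take two UTF-16 code units, everything else one.
        utf16_units += 2 if ord(ch) > 0xFFFF else 1
        byte_col += len(ch.encode("utf-8"))
    return lsp_line + 1, byte_col + 1
```

For a pure-ASCII line this reduces to the current "add 1" behaviour; the results only diverge once the line contains non-ASCII characters before the position.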