Counter utf-16 characters #284
Benchmark
function! s:count_utf16_code_units(str) abort
  let l:len = strchars(a:str)
  let l:i = 0
  let l:cnt = 0
  while l:i < l:len
    let l:chr = strcharpart(a:str, l:i, 1)
    " Code points at or above U+10000 need a surrogate pair (two code units).
    if char2nr(l:chr) >= 0x10000
      let l:cnt = l:cnt + 2
    else
      let l:cnt = l:cnt + 1
    endif
    let l:i = l:i + 1
  endwhile
  return l:cnt
endfunction
function! s:my_count_utf16_code_units(str) abort
  " Split into characters, then add one extra unit per character outside the BMP.
  let l:rs = split(a:str, '\zs')
  let l:len = len(l:rs)
  return l:len + len(filter(l:rs, 'char2nr(v:val) >= 0x10000'))
endfunction
function! s:benchmark()
  let s = 'a𐐀b'
  let n = 100000
  let start = reltime()
  for i in range(n)
    call strlen(s)
  endfor
  echo printf('strlen: %f', reltimefloat(reltime(start)))
  let start = reltime()
  for i in range(n)
    call s:count_utf16_code_units(s)
  endfor
  echo printf('s:count_utf16_code_units: %f', reltimefloat(reltime(start)))
  let start = reltime()
  for i in range(n)
    call s:my_count_utf16_code_units(s)
  endfor
  echo printf('s:my_count_utf16_code_units: %f', reltimefloat(reltime(start)))
endfunction
call s:benchmark()
When I encountered this I also needed to replace all occurrences of str[start:end] with strcharpart(). Was that not necessary here?
It is not necessary to replace every strlen call with this function. It should be used only for calculating offsets.
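To illustrate the offset calculation mentioned above, here is a minimal sketch, assuming 'encoding' is utf-8. The names s:utf16_len and s:byte_col_to_utf16_offset are hypothetical and are not part of vim-lsp or of this PR; the sketch only shows how a byte column could be mapped to a UTF-16 offset.

```vim
" Hypothetical helper (not from vim-lsp): count UTF-16 code units in a string.
" Assumes 'encoding' is utf-8.
function! s:utf16_len(str) abort
  let l:cnt = 0
  for l:i in range(strchars(a:str))
    " Code points at or above U+10000 take two UTF-16 code units.
    let l:cnt += strgetchar(a:str, l:i) >= 0x10000 ? 2 : 1
  endfor
  return l:cnt
endfunction

" Hypothetical helper: convert a 1-based byte column (as returned by col())
" into a 0-based UTF-16 offset by measuring the text before that column.
function! s:byte_col_to_utf16_offset(line, col) abort
  return s:utf16_len(strpart(a:line, 0, a:col - 1))
endfunction
```

For the line 'a𐐀b', the byte column of b is 6, while s:byte_col_to_utf16_offset('a𐐀b', 6) gives 3.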
function! s:count_utf16_code_units(str) abort
  let l:len = strchars(a:str)
  let l:i = 0
  let l:cnt = 0
  while l:i < l:len
    let l:chr = strcharpart(a:str, l:i, 1)
    " Code points at or above U+10000 need a surrogate pair (two code units).
    if char2nr(l:chr) >= 0x10000
      let l:cnt = l:cnt + 2
    else
      let l:cnt = l:cnt + 1
    endif
    let l:i = l:i + 1
  endwhile
  return l:cnt
endfunction
function! s:my_count_utf16_code_units(str) abort
  " Split into characters, then add one extra unit per character outside the BMP.
  let l:rs = split(a:str, '\zs')
  return len(l:rs) + len(filter(copy(l:rs), 'char2nr(v:val) >= 0x10000'))
endfunction
function! s:benchmark(name, n, case)
  let start = reltime()
  for i in range(a:n)
    call strlen(a:case)
  endfor
  echo printf('%s: strlen: %f', a:name, reltimefloat(reltime(start)))
  let start = reltime()
  for i in range(a:n)
    call s:count_utf16_code_units(a:case)
  endfor
  echo printf('%s: s:count_utf16_code_units: %f', a:name, reltimefloat(reltime(start)))
  let start = reltime()
  for i in range(a:n)
    call s:my_count_utf16_code_units(a:case)
  endfor
  echo printf('%s: s:my_count_utf16_code_units: %f', a:name, reltimefloat(reltime(start)))
endfunction
call s:benchmark('short string', 100000, 'a𐐀b')
call s:benchmark('long string', 10000, repeat('a𐐀b', 200))
There are some interesting approaches and comments being discussed at microsoft/language-server-protocol#376 (comment)
Is vim-lsp currently using UTF-8/bytes, codepoints or grapheme clusters?
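For reference, the units named in this question give different lengths for the same string. The following is a minimal sketch using Vim built-ins, assuming 'encoding' is utf-8. The name s:measure is hypothetical, and strchars() with the skipcc argument only approximates grapheme clusters by ignoring composing characters. It does not show which unit vim-lsp uses.

```vim
" Hypothetical helper (not from vim-lsp): measure one string in several units.
" Assumes 'encoding' is utf-8.
function! s:measure(str) abort
  " bytes:        UTF-8 byte length
  " codepoints:   composing characters counted separately
  " no_composing: composing characters skipped (rough grapheme approximation)
  return {'bytes': strlen(a:str), 'codepoints': strchars(a:str), 'no_composing': strchars(a:str, 1)}
endfunction
```

With 'a𐐀b' this gives 6 bytes and 3 code points, whereas the same text is 4 UTF-16 code units, so none of these built-in measures matches a UTF-16 offset directly.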
There are now web workers in ducktape vim. Perf-heavy things could go there :) bobpepin/vim#1
@mattn Any updates on this? One option I can think of is having this under a feature flag turned off by default.
Sorry for the delay, I'll fix this soon.
if g:lsp_use_utf16
  return lsp#utils#strlen(getline(a:m)[:col(a:m)])
endif
return a:m
I ran into strange Vim behavior when trying this branch: in insert mode, col(a:m) does not match lsp#utils#strlen(getline(a:m)[:col(a:m)]).
In my environment, the code below fixed the mismatch.
function! lsp#utils#col(m) abort
  if g:lsp_use_utf16
    let col = lsp#utils#strlen(getline(a:m)[:col(a:m) - 1])
    if mode() ==# 'i' && col(a:m) == col('$')
      let col = col + 1
    endif
    return col
  endif
  return col(a:m)
endfunction
(This branch is great work and very useful to me. Thanks, mattn.)
@clason I misunderstood. My environment had another patch applied. It may be what solves the UTF-8 character problems (This patch is
If someone has already found a problem with this patch, please point it out. Or if you already have another patch, please send a new PR.
@hrsh7th Could you share your "other patch"? If it works for UTF-8, that'd already be progress.
Fixed by #447
Fixes #282