Search Addon: Fix length calculation of wide unicode chars #3236

gera2ld · 2021-01-29T04:28:26Z

addons/xterm-addon-search/src/SearchAddon.ts

jerch

Thx for looking into this.

Some remarks from my side:

Could you move the changes to a separate branch? That would make testing of the PR alot easier (it is currently on master 😨).
Imho this still does not work correctly, repro:
- input ￥￥￥hello ￥￥￥hello##########café éé ééécafcaféééééécafcaféééééécaf𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞ééhello into the terminal
- search for ￥, é and hello
- result: not all occurances found, some selections are totally off

Imho this needs a more fundamental rewrite - one way to approach it would be to get rid of the regexp positioning at all and search only on buffer indices. This would eliminate the string to buffer position errors. It is abit more implementation work but should run fine perfwise as well.

Another way is to stick with initial regexp search and also to translate the beginning string offset into a buffer index. This is needed as only the very first cell of a new line is guaranteed to have the same index position in the string and the buffer (index 0). Note that this also does not apply to wrapped lines (you basically have to start the buffer index translation always from the very first char of a wrapped line, which might span several buffer rows).

gera2ld · 2021-02-22T08:48:22Z

Sorry for the late response.

I don't think it's a good idea to get rid of the regexp positioning because in that case we can hardly search by user provided regexp.
I've added a new method to translate buffer index to string offset so that the searching process only depends on string indices and works well with regexp.

But I noticed that searching in wrapped line does not work if the line contains special chars like emojis, because the .substring here cuts the string at terminal.cols, while such emojis occupies only one column but the string length is larger than 1:

xterm.js/addons/xterm-addon-search/src/SearchAddon.ts

Line 375 in 187f072

    
           lineString += line.translateToString(!lineWrapsToNext && trimRight).substring(0, terminal.cols);

I'm wondering if the .substring can be removed safely.

gera2ld · 2021-02-22T08:53:20Z

* Could you move the changes to a separate branch? That would make testing of the PR alot easier

Sorry for that, but it seems not possible to change branch name now. I can create another PR if you want.

jerch · 2021-02-22T23:22:33Z

@gera2ld Yes, removing regexp would be a big loss in flexibility, its better to find a solution that keeps working with them.

[...] I'm wondering if the .substring can be removed safely.

Hmm looking at _translateBufferLineToStringWithWrap as a whole imho it does not the right thing and needs a rewrite (and yes without substring) for several reasons:

A search term might be split over several wrapped lines.
A wrapped line might end earlier if the next char does not fit anymore (a wide char at term.col - 1 would flow to the next line because of taking 2 cells leaving one cell empty).
Any position deduction of string index <---> buffer index has to be done via the full wcwidth translation.

Esp. the last point is tricky to get done right, as any non 1:1 mapping char will create offsets - combining chars on BMP chars will create +1 offsets on the string index, while BMP wide chars will create -1 offsets (relative to buffer index). The only fixpoint in both metrics we have are real new lines at index 0 (lines that have no following isWrapped flag set).

Idea to revamp _translateBufferLineToStringWithWrap:

Whenever a new line is found, consume all follow-up wrapped lines into one single (unwrapped) string. Thats pretty much what it does atm, but without substring. This will also correctly deal with 1. above.
Special case early wrapping wide char (2. above): If the first char on the next line is a wide char and the current line is only term.cols - 1 in length, treat it as early wrapping - ignore the empty cell, otherwise count empty cell as whitespace. This leads to a faulty handling, if the empty cell was indeed meant to stay empty. We already had an issue report about that, but sadly we have currently no way to track that special edge case.
Whenever some match was found in the full (unwrapped) string, translate all string positions (start and end index) back by walking buffer positions + wcwidth continuing from the last saved match buffer position (minor optimization, if it was in that string) or from the very first char of the whole line. This "buffer replay" will ensure 3. above.

Sorry for that, but it seems not possible to change branch name now. I can create another PR if you want.

Hmm I think changing a PRs source branch is not possible. Guess creating a new one is the only option.

gera2ld · 2021-02-23T04:39:54Z

Update:

the string offsets of the first char in each buffer row as well as the string of the whole line is cached now, so the buffer column can be calculated easily from the beginning of the corresponding row, no matter whether it's early wrapped or for whatever reason the string length does no match the buffer columns.

JasinYip · 2021-10-25T05:02:25Z

Any news for this? 😢

Tyriar · 2021-10-25T10:26:03Z

@jerch can you take another look at this if you have the time? The issue seems to be fixed when testing and looks like the tests pass. Any other comments?

jerch · 2021-10-25T13:06:52Z

@Tyriar Yes, will see, if I can look at it this week. The problem I had was the PR pulling because of its branch name, but I can do the testing on a separate repo clone.

jerch · 2021-11-16T16:47:38Z

Please review someone else, as it seems I wont get down to it.

Tyriar

Thanks @gera2ld and sorry about how long this took to merge in. I made a few minor doc related changes but since the tests pass and changes seem reasonable lgtm.

JasinYip · 2021-12-23T02:49:15Z

This is the best news for me for Christmas and the New year, thank you all you guys fix this!

gera2ld force-pushed the master branch 2 times, most recently from 81cdaca to a57ee6b Compare January 29, 2021 06:05

rheh reviewed Feb 11, 2021

View reviewed changes

addons/xterm-addon-search/src/SearchAddon.ts Outdated Show resolved Hide resolved

addons/xterm-addon-search/src/SearchAddon.ts Outdated Show resolved Hide resolved

jerch requested changes Feb 11, 2021

View reviewed changes

fix length calculation of wide unicode chars

6f6378a

gera2ld force-pushed the master branch from a57ee6b to b222b66 Compare February 22, 2021 08:09

fix: translate buffer col to string offset

ae037bd

gera2ld force-pushed the master branch from b222b66 to ae037bd Compare February 22, 2021 08:23

gera2ld force-pushed the master branch from 3c58209 to e340005 Compare February 23, 2021 04:32

gera2ld force-pushed the master branch from e340005 to 84a8405 Compare February 23, 2021 04:44

fix: translate string offset to buffer index in wrapped line

db42dcf

gera2ld force-pushed the master branch from 84a8405 to db42dcf Compare March 2, 2021 03:31

Tyriar mentioned this pull request Aug 13, 2021

Selection with search and unicode #1686

Closed

Merge branch 'master' into master

6bca6bd

hanskokx mentioned this pull request Sep 21, 2021

Poor compatibility with Emoji felixse/FluentTerminal#963

Open

Tyriar closed this Oct 22, 2021

Tyriar reopened this Oct 22, 2021

Merge branch 'master' into master

a974d33

Merge branch 'master' into master

7abfccc

jerch self-assigned this Oct 25, 2021

jerch removed their assignment Nov 16, 2021

Merge branch 'master' into master

8d2e432

Tyriar changed the title ~~fix length calculation of wide unicode chars~~ Search Addon: Fix length calculation of wide unicode chars Dec 22, 2021

Tyriar added 2 commits December 22, 2021 09:26

Name line cache entry type

0fcd870

Add docs to line cache entry items

f685360

Tyriar approved these changes Dec 22, 2021

View reviewed changes

Tyriar added this to the 4.16.0 milestone Dec 22, 2021

Tyriar merged commit c22b3a9 into xtermjs:master Dec 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search Addon: Fix length calculation of wide unicode chars #3236

Search Addon: Fix length calculation of wide unicode chars #3236

gera2ld commented Jan 29, 2021 •

edited by Tyriar

Loading

jerch left a comment •

edited

Loading

gera2ld commented Feb 22, 2021

gera2ld commented Feb 22, 2021

jerch commented Feb 22, 2021 •

edited

Loading

gera2ld commented Feb 23, 2021

JasinYip commented Oct 25, 2021

Tyriar commented Oct 25, 2021 •

edited

Loading

jerch commented Oct 25, 2021

jerch commented Nov 16, 2021

Tyriar left a comment

JasinYip commented Dec 23, 2021

Search Addon: Fix length calculation of wide unicode chars #3236

Search Addon: Fix length calculation of wide unicode chars #3236

Conversation

gera2ld commented Jan 29, 2021 • edited by Tyriar Loading

jerch left a comment • edited Loading

Choose a reason for hiding this comment

gera2ld commented Feb 22, 2021

gera2ld commented Feb 22, 2021

jerch commented Feb 22, 2021 • edited Loading

gera2ld commented Feb 23, 2021

JasinYip commented Oct 25, 2021

Tyriar commented Oct 25, 2021 • edited Loading

jerch commented Oct 25, 2021

jerch commented Nov 16, 2021

Tyriar left a comment

Choose a reason for hiding this comment

JasinYip commented Dec 23, 2021

gera2ld commented Jan 29, 2021 •

edited by Tyriar

Loading

jerch left a comment •

edited

Loading

jerch commented Feb 22, 2021 •

edited

Loading

Tyriar commented Oct 25, 2021 •

edited

Loading