Selection with search and unicode #1686

jerch · 2018-09-13T12:59:28Z

Combining, surrogate or fullwidth chars in the line and/or the search string lead to weird selection offset problems. Steps to repro:

insert into demo: echo -en 'combining: ééé\nfullwidth: ￥￥￥\nsurrogate: 𓂀𓂀𓂀\n'
search for 'ééé', '￥￥￥' and '𓂀𓂀𓂀'

The selection is off for all 3 types, and it gets worse if the line contains any of these chars before the match. It seems the renderer and the selection manager do not agree on the character widths and lengths.

Since I had a similar problem with the linkifier, it might be fixable the same way (#1678).


Tyriar · 2018-09-17T17:22:55Z

Seems to work fine for me on mac/master, let me know if you still see it.

jerch · 2018-09-24T12:03:43Z

@Tyriar Nope, it's not gone, still the same here. Maybe it's a platform issue?

Looks like this atm:

Looks like the selector counts the accented char as 2 halfwidth chars, while the ￥ symbol gets treated as one halfwidth char.

Found this in the code:

xterm.js/src/addons/search/SearchHelper.ts

Line 210 in 9e446a9

    
           this._terminal._core.selectionManager.setSelection(result.col, result.row, result.term.length);

Imho the last argument should be the sum of the wcwidth values of the term's characters instead of the string length (not tested yet).
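To illustrate the mismatch, here is a minimal sketch comparing `String.length` with a cell-width sum for the three repro strings. `approxCharWidth` and `approxStringCellWidth` are hypothetical helpers with a deliberately tiny width table; they are a stand-in for, not a copy of, the wcwidth implementation in xterm.js.

```typescript
// Simplified stand-in for a wcwidth lookup (assumption: NOT xterm.js's table).
function approxCharWidth(codePoint: number): number {
  // Combining Diacritical Marks take no cell of their own.
  if (codePoint >= 0x0300 && codePoint <= 0x036f) return 0;
  // A few East Asian wide/fullwidth ranges (incomplete on purpose).
  if (
    (codePoint >= 0x1100 && codePoint <= 0x115f) ||
    (codePoint >= 0x2e80 && codePoint <= 0xa4cf) ||
    (codePoint >= 0xac00 && codePoint <= 0xd7a3) ||
    (codePoint >= 0xff01 && codePoint <= 0xff60) ||
    (codePoint >= 0xffe0 && codePoint <= 0xffe6)
  ) {
    return 2;
  }
  return 1;
}

// Sums cell widths per code point, so a surrogate pair is counted once.
function approxStringCellWidth(s: string): number {
  let width = 0;
  for (let i = 0; i < s.length; ) {
    const cp = s.codePointAt(i)!;
    width += approxCharWidth(cp);
    i += cp > 0xffff ? 2 : 1; // skip the low surrogate of a pair
  }
  return width;
}

const combining = 'e\u0301e\u0301e\u0301';            // ééé as e + combining acute
const fullwidth = '\uffe5\uffe5\uffe5';               // ￥￥￥
const surrogate = '\ud80c\udc80\ud80c\udc80\ud80c\udc80'; // 𓂀𓂀𓂀 (U+13080)

console.log(combining.length, approxStringCellWidth(combining)); // 6 3
console.log(fullwidth.length, approxStringCellWidth(fullwidth)); // 3 6
console.log(surrogate.length, approxStringCellWidth(surrogate)); // 6 3
```

Under these assumptions the string length overshoots for combining and surrogate chars and undershoots for fullwidth chars, which matches the selection being too long for ééé and too short for ￥￥￥.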

Tyriar · 2018-09-24T14:04:16Z

@jerch are you on Linux?

jerch · 2018-09-24T14:05:25Z

Yes Ubuntu 16 here.

Tyriar · 2018-09-24T14:54:46Z

I guess we need to have a setting for this stuff like you were suggesting before. Still not sure of the best way to query the platform for these character widths though; I doubt we can rely on all Linux distros behaving the same and macOS being a different case.

Tyriar · 2018-09-24T14:55:26Z

Same underlying issue as #1059?

jerch · 2018-09-24T14:59:28Z

Nope, this time it's not wcwidth's fault. Changing the argument I mentioned above fixes the problems (tested a few minutes ago).

jerch · 2018-09-26T20:53:23Z

Currently blocked by #1707 and #1709.

jerch · 2018-12-16T17:00:31Z

Some background on this:
The way the start and end positions of the selection are determined still does not work for all combinations of surrogates and fullwidth chars. If any of those appear in the line before the match or in the match itself, the start and end of the selection can be offset.

This can be fixed the same way I had to fix the linkifier underlining in #1769, by mapping a string index back to the buffer index:

xterm.js/src/Linkifier.ts

Line 223 in c7fa89d

    
           const bufferIndex = this._terminal.buffer.stringIndexToBufferIndex(rowIndex, stringIndex, true);

If done twice (match start and end) the selection will correctly point to the underlying cells.
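The idea of mapping a string index back to a cell can be sketched on a toy line model. Everything below is hypothetical (`Cell`, `lineToString`, `stringIndexToColumn`) and only mimics the general shape of a buffer line, where a fullwidth char is followed by an empty placeholder cell; it is not the actual `stringIndexToBufferIndex` implementation and ignores wrapped lines.

```typescript
// Toy model of one buffer line (assumption: not xterm.js's real cell type).
interface Cell {
  chars: string; // text stored in the cell; '' for the placeholder after a wide char
  width: number; // 2 for fullwidth, 1 for normal, 0 for the placeholder
}

// Builds the string that a search would run against.
function lineToString(cells: Cell[]): string {
  return cells.map(cell => cell.chars).join('');
}

// Walks the cells and returns the column that holds the given string index.
// An index equal to the string length maps to cells.length (exclusive end).
function stringIndexToColumn(cells: Cell[], stringIndex: number): number {
  let remaining = stringIndex;
  for (let col = 0; col < cells.length; col++) {
    const len = cells[col].chars.length;
    if (len === 0) continue;         // placeholder cell contributes no text
    if (remaining < len) return col; // index falls inside this cell
    remaining -= len;
  }
  return cells.length;
}

// Line content: ￥￥ab (two fullwidth chars, then two halfwidth chars).
const cells: Cell[] = [
  { chars: '\uffe5', width: 2 }, { chars: '', width: 0 },
  { chars: '\uffe5', width: 2 }, { chars: '', width: 0 },
  { chars: 'a', width: 1 },
  { chars: 'b', width: 1 },
];

const term = 'ab';
const matchIndex = lineToString(cells).indexOf(term);                  // 2
const startCol = stringIndexToColumn(cells, matchIndex);               // 4
const endCol = stringIndexToColumn(cells, matchIndex + term.length);   // 6
console.log(matchIndex, startCol, endCol - startCol);
```

In this sketch, using the string index 2 directly as a column would start the selection inside the fullwidth chars, while mapping both match start and match end gives column 4 and a length of 2 cells.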

Tyriar · 2019-10-07T17:51:59Z

This still happens. This is the problem line:

xterm.js/addons/xterm-addon-search/src/SearchAddon.ts

Line 345 in cad9477

terminal.select(result.col, result.row, result.term.length);

result.term.length for ééé is 6, so the selection is too long. The fix likely involves returning an end row and col from _findInLine instead of the actual term.

Silvyre · 2019-10-28T16:02:21Z

Hello, I would like to join my peer @miggs125 in contributing to xterm by tackling this issue.

I will first attempt to improve selection of strings that include diacritical marks.

jerch · 2019-10-29T07:12:41Z

@Silvyre Sure thing. Note that the terminal buffer already stores diacritical characters in one cell together with the main character, so the issue comes from mapping the string position back to a cell.

Silvyre · 2019-10-30T04:15:30Z

@jerch Thanks!

> Note that the terminal buffer already stores diacritical characters in one cell together with the main character

Are you referring to the JoinedCellData type? As far as I can tell, this base type is not currently used within search selection (search selection appears to handle buffer cells as objects of IBufferCell type, which is not part of the ICellData hierarchy).

Silvyre · 2019-10-30T04:54:04Z

> changing the argument I mentioned above fixes the problems (tested a few minutes ago)

Starting to get on the same page. OK, modifying _findInLine to return getStringCellWidth(term) instead of term appears to improve the selection of diacritical characters, e.g. ééé (at least on Ubuntu 18.04; getStringCellWidth() calls wcwidth(), which may perform differently on other platforms?).

I can't imagine this to be a satisfactory solution, considering that, as you mentioned, this does not work for all surrogate/fullwidth character combinations [across various platforms] (e.g. selection of ￥ is still not great, at least on Ubuntu 18.04).

Silvyre · 2019-10-30T05:49:44Z

> selection of ￥ is still not great, at least on Ubuntu 18.04

To clarify, it sometimes works, as shown in this GIF, which I created after replacing every instance of line/term/cell.length with getStringCellWidth(...) in SearchAddon.ts. I'm going to try to tweak the find functions a bit more and see if I can improve behaviour that way.

jerch · 2019-10-30T09:26:43Z

@Silvyre Yes, working with a wcwidth correction is the right way to go here. Imho it is needed once for the search term itself (in case it contains weird chars) to get the number of cells taken ("cell length"), then you'd need to correct every start offset found likewise to find the real cell offset. That cell-offset + term-cell-length % cols should give the real start and end position in the buffer.
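One possible reading of that arithmetic, as a small sketch: it assumes the modulo applies to the sum of start column and cell length, and that the end position is exclusive. `CellPosition` and `selectionEnd` are hypothetical names, not part of the search addon.

```typescript
// Hypothetical helper: end position of a selection given in cells.
interface CellPosition {
  row: number;
  col: number;
}

// startCol and cellLength are measured in cells, not in string indices.
// The returned position is exclusive (first cell after the selection).
function selectionEnd(
  startRow: number,
  startCol: number,
  cellLength: number,
  cols: number
): CellPosition {
  const endOffset = startCol + cellLength;
  return {
    row: startRow + Math.floor(endOffset / cols), // rows crossed by wrapping
    col: endOffset % cols,                        // column within the last row
  };
}

// A 6-cell match starting at column 78 of an 80-column terminal wraps
// onto the next row.
console.log(selectionEnd(2, 78, 6, 80)); // { row: 3, col: 4 }
// The same match starting at column 10 stays on its row.
console.log(selectionEnd(2, 10, 6, 80)); // { row: 2, col: 16 }
```

Whether the real selection API wants an end position or a start plus a length is a separate question; the point here is only that all quantities are in cells.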

Silvyre · 2019-10-30T13:03:18Z

@jerch Excellent, I'll work on that. Thanks again!

Silvyre · 2019-10-30T16:50:08Z

I have a general question regarding addons and dependencies: how are helper functions in src/common (e.g. getStringCellWidth, wcwidth from CharWidth.ts) imported into addons (e.g. addons/xterm-addon-search)?

jerch · 2019-10-30T16:56:08Z

@Silvyre They aren't yet; the public API gets extended on request. So you'd have to go with internal refs for now. Maybe open an issue regarding this so we can decide how and where to put it.

Silvyre · 2019-10-30T17:01:29Z

Sure thing, I'll open an issue.

Silvyre · 2019-10-31T09:49:22Z

> you'd need to correct every start offset found likewise to find the real cell offset

@jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?

Silvyre · 2019-10-31T10:03:24Z

I've also noticed that selectionEnd appears to spend most of its time undefined, while finalSelectionEnd gets defined. A related bug, maybe?

Tyriar · 2019-10-31T15:49:35Z

Maybe, it's meant to be undefined for various types of selection if I remember right though (word, line, select all).

jerch · 2019-10-31T16:02:37Z

> @jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?

Ah yep, that's a bit hidden in the codebase. The code for this is in Buffer.ts and BufferLine.ts; both contain several methods that demonstrate how to walk cells. The easiest starting point might be this:

xterm.js/src/common/buffer/Buffer.ts

Line 480 in e8153d9

    
           public stringIndexToBufferIndex(lineIndex: number, stringIndex: number, trimRight: boolean = false): BufferIndex {

Not sure if you can use this method directly; you have to take care where your string index origin is (whether col 0 of wrapped or unwrapped lines).

JasinYip · 2020-10-15T03:30:52Z

I'm using VSCode (1.50.0) on macOS Catalina (10.15.6) and this issue is still happening.

JasinYip · 2021-01-18T13:29:26Z

@Tyriar Hi, does this issue have a solution yet? I tried loading xterm-addon-unicode11, but it only fixes the display of emoji chars; searching for Chinese chars still has the issue.

Tyriar · 2021-01-19T11:29:32Z

Been a while since I looked at this code but I think we could expose the active IUnicodeVersionProvider's wcwidth to extensions via IUnicodeHandling.activeProvider.wcwidth or similar to solve this.

JasinYip · 2021-08-13T08:01:43Z

@Tyriar How could we fix or work around this in production? It seems what you wrote above refers to internal code, and I have no idea how to proceed...

Tyriar · 2021-08-13T16:36:58Z

@JasinYip there's some discussion about the fix in #3236, been a while since I looked and don't have time atm though.

jerch added the type/bug Something is misbehaving label Sep 13, 2018

jerch mentioned this issue Sep 13, 2018

Improve unicode string handling in linkifier #1678

Merged

2 tasks

Tyriar added the area/selection label Sep 13, 2018

Tyriar closed this as completed Sep 17, 2018

jerch reopened this Sep 24, 2018

jerch mentioned this issue Nov 29, 2018

search addon doesn't work well with chinese #1801

Closed

ntchjb mentioned this issue Jan 1, 2019

Fix search addons: Fix problem of skipping some results and incorrect text selection after resize window #1866

Merged

Tyriar added the good first issue and help wanted labels Oct 7, 2019

Tyriar changed the title ~~selection with search and unicode~~ Selection with search and unicode Oct 7, 2019

Silvyre added a commit to Silvyre/xterm.js that referenced this issue Oct 31, 2019

Progress toward xtermjs#1686: some desirable selection behaviour

1730760

Silvyre mentioned this issue Oct 31, 2019

Progress toward #1686: some desirable selection behaviour #2526

Closed

This was referenced Jan 29, 2021

SearchAddon shows wrong selection when bounding with wide unicode chars #3235

Closed

Search Addon: Fix length calculation of wide unicode chars #3236

Merged

Tyriar added this to the 4.16.0 milestone Dec 22, 2021

Tyriar closed this as completed in #3236 Dec 22, 2021
