Search Addon: Fix length calculation of wide unicode chars #3236
Thx for looking into this.
Some remarks from my side:
- Could you move the changes to a separate branch? That would make testing the PR a lot easier (it is currently on master 😨).
- Imho this still does not work correctly, repro:
  - input ¥¥¥hello ¥¥¥hello##########café éé ééécafcaféééééécafcaféééééécaf𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞𝄞ééhello into the terminal
  - search for ¥, é and hello
  - result: not all occurrences found, some selections are totally off
Imho this needs a more fundamental rewrite - one way to approach it would be to get rid of the regexp positioning altogether and search only on buffer indices. This would eliminate the string-to-buffer position errors. It is a bit more implementation work but should run fine perf-wise as well.
Another way is to stick with initial regexp search and also to translate the beginning string offset into a buffer index. This is needed as only the very first cell of a new line is guaranteed to have the same index position in the string and the buffer (index 0). Note that this also does not apply to wrapped lines (you basically have to start the buffer index translation always from the very first char of a wrapped line, which might span several buffer rows).
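To illustrate the second approach, here is a minimal sketch of translating a string offset into a buffer position by walking a wrapped line from its very first cell. The `Cell` shape, `lineToString` and `stringOffsetToBufferIndex` are hypothetical names made up for this illustration, not the xterm.js buffer API, and the sketch assumes that a wide char occupies one cell of width 2 followed by a placeholder cell of width 0, and that combining chars are stored in the cell of their base char.

```typescript
// Hypothetical, simplified cell model (an assumption, not the real xterm.js buffer API).
interface Cell {
  chars: string; // all UTF-16 code units stored in this cell ('' for an empty cell)
  width: number; // 1 = normal, 2 = wide char, 0 = placeholder cell after a wide char
}

// Join the cells of one wrapped line (one or more buffer rows) into the string
// that a regexp search would run on.
function lineToString(rows: Cell[][]): string {
  let s = '';
  for (let r = 0; r < rows.length; r++) {
    for (let c = 0; c < rows[r].length; c++) {
      const cell = rows[r][c];
      if (cell.width === 0) continue; // placeholder contributes nothing to the string
      s += cell.chars === '' ? ' ' : cell.chars; // empty cells become a space
    }
  }
  return s;
}

// Translate a string offset into a buffer position. The walk always starts at
// the first cell of the wrapped line, the only position where both indices agree.
function stringOffsetToBufferIndex(
  rows: Cell[][],
  offset: number
): { row: number; col: number } | undefined {
  let consumed = 0; // string code units covered by the cells visited so far
  for (let row = 0; row < rows.length; row++) {
    for (let col = 0; col < rows[row].length; col++) {
      const cell = rows[row][col];
      if (cell.width === 0) continue;
      const len = cell.chars === '' ? 1 : cell.chars.length;
      if (offset < consumed + len) return { row, col };
      consumed += len;
    }
  }
  return undefined; // offset lies beyond the end of the wrapped line
}

// A wrapped line spanning two rows of four columns each.
const rows: Cell[][] = [
  [
    { chars: '\u4E2D', width: 2 },  // wide CJK char
    { chars: '', width: 0 },        // its placeholder cell
    { chars: 'e\u0301', width: 1 }, // 'e' plus a combining acute accent
    { chars: 'a', width: 1 },
  ],
  [
    { chars: '\uD834\uDD1E', width: 1 }, // U+1D11E as a surrogate pair
    { chars: 'b', width: 1 },
    { chars: '', width: 1 },
    { chars: '', width: 1 },
  ],
];

const text = lineToString(rows);
const offset = text.indexOf('b'); // 6
console.log(stringOffsetToBufferIndex(rows, offset)); // { row: 1, col: 1 }
```

With four columns per row, naively splitting the string offset 6 into row and column would give row 1, col 2, which is one cell too far to the right. The walk gives row 1, col 1 instead.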
Sorry for the late response. I don't think it's a good idea to get rid of the regexp positioning, because in that case we can hardly search by a user-provided regexp. But I noticed that searching in a wrapped line does not work if the line contains special chars like emojis, because the
I'm wondering if the
Sorry for that, but it does not seem possible to change the branch name now. I can create another PR if you want.
@gera2ld Yes, removing regexp would be a big loss in flexibility, it's better to find a solution that keeps working with them.
Hmm looking at
Esp. the last point is tricky to get done right, as any non 1:1 mapping char will create offsets - combining chars on BMP chars will create +1 offsets on the string index, while BMP wide chars will create -1 offsets (relative to buffer index). The only fixpoint in both metrics we have are real new lines at index 0 (lines that have no following Idea to revamp
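The +1 and -1 offsets can be made concrete with a small sketch that compares a string's length in UTF-16 code units with the number of cells it would occupy. The `cellWidth` table is a deliberately tiny stand-in (an assumption for illustration only, covering just combining diacritical marks and CJK unified ideographs) and not a real wcwidth implementation; `offsetDrift` is likewise a hypothetical helper name.

```typescript
// Toy width table (assumption): real terminals use a much larger wcwidth table.
function cellWidth(codePoint: number): number {
  if (codePoint >= 0x0300 && codePoint <= 0x036f) return 0; // combining diacritical marks
  if (codePoint >= 0x4e00 && codePoint <= 0x9fff) return 2; // CJK unified ideographs (wide)
  return 1;
}

// Difference between string length (UTF-16 code units) and occupied cells.
// Positive: the string index runs ahead of the buffer index.
// Negative: the buffer index runs ahead of the string index.
function offsetDrift(text: string): number {
  let cells = 0;
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    // Merge a surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = (cp - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
        i++;
      }
    }
    cells += cellWidth(cp);
  }
  return text.length - cells;
}

console.log(offsetDrift('hello'));        // 0: plain 1:1 mapping
console.log(offsetDrift('e\u0301'));      // 1: combining char on a BMP char
console.log(offsetDrift('\u4E2D'));       // -1: BMP wide char
console.log(offsetDrift('\uD834\uDD1E')); // 1: surrogate pair in a single cell
```

Because these drifts accumulate and can partly cancel each other out, a match position is only reliable when it is counted from a point where both metrics are known to agree.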
Hmm, I think changing a PR's source branch is not possible. Guess creating a new one is the only option.
Update:
Any news on this? 😢
@jerch can you take another look at this if you have the time? The issue seems to be fixed when testing and looks like the tests pass. Any other comments?
@Tyriar Yes, I will see if I can look at it this week. The problem I had was pulling the PR because of its branch name, but I can do the testing on a separate repo clone.
Could someone else please review this, as it seems I won't get around to it.
Thanks @gera2ld and sorry about how long this took to merge in. I made a few minor doc-related changes, but since the tests pass and the changes seem reasonable, lgtm.
This is the best news for me for Christmas and the New Year, thank you all for fixing this!
Fix #1686