-
-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add native UTF8 support for InputText and remove ImWchar buffer #7925
Conversation
Thank you very much. This looks pretty good and I will try to review it shortly. I have already rebased the branch locally to look at it. It's been indeed the much needed first step toward a fuller refactor of The truth is, rokups has done this work already but as part of a larger refactor. |
ea012fe
to
b9a7042
Compare
Here are some changes I made:
// With our UTF-8 use of stb_textedit:
// - STB_TEXTEDIT_GETCHAR is nothing more than a a "GETBYTE". It's only used to compare to ascii or to copy blocks of text so we are fine.
// - One exception is the STB_TEXTEDIT_IS_SPACE feature which would expect a full char in order to handle full-width space such as 0x3000 (see ImCharIsBlankW).
// - ...but we don't use that feature.
This needs further works:
|
e91acc2
to
82ed98d
Compare
…tibyte characters. for ocornut/imgui#7925
…rl+arrow on multibyte characters. for ocornut/imgui#7925
I think I have fixed all the things except If for some reason you need to run or modify tests, the modified test is in a public branch: https://github.com/ocornut/imgui_test_engine/tree/features/input_text_7925 About // Find the shortest single replacement we can make to get the new text from the old text.
// Important: needs to be run before TextW is rewritten with the new characters because calling STB_TEXTEDIT_GETCHAR() at the end.
// FIXME: Ideally we should transition toward (1) making InsertChars()/DeleteChars() update undo-stack (2) discourage (and keep reconcile) or obsolete (and remove reconcile) accessing buffer directly. The part about needing to run before TextW is rewritten is essentially broken now, for the good reason that we were using this buffer as an "old buffer". Solving this the same way would annoyingly requires us to make a copy of the full text before running the callback. aka possibly worse case every frame since the Always callback is technically allowed to modify character... PLAN A
That seems all possible to implement. The slight annoyance is that it means multiple calls to e.g. InsertChars() would lead to multiple undo steps, so that's a little bit of a regression now. It feels nicer if that's not the case... PLAN B
PLAN C
|
First of all, I am very glad my PR is helpful even if it ended up requiring more work on your end than I expected.
I was somehow thinking, I only want to handle correctly formed byte sequences when in reality ImTextFindPreviousUtf8Codepoint is not only much simpler, it also behaves better for ill formed sequences.
Indeed, I must have missed that.
I believe the functions do not technically have to handle that directly since
I'll see what I can do For everything else I will have to get back to you next week, sorry. |
It definitively broke a few of the new test cases I added (basic testing of prev/next words on mulit-byte character). I have rebased the branch today following unrelated changes which conflicted dearimgui/dear_bindings#47 (comment) |
2dedbf2
to
ce5f7ab
Compare
Correct me if I'm wrong but we have been copying/converting on every frame before as well, right?
Not saying that we should not strive to be better but if we are not worse than before I'm almost tempted to go with plan C for now. This could still be seen as a win:
That being said, being unsatisfied with the unnecessary copies is the main reason I started this PR, so I'm not too happy with that solution either.
You're right. It generally worked but some nuances in regard to whitespaces (should the cursor end up before or after the whitespace) were behaving differently. |
You are right that Plan C wouldn't be a regression and can always be a step to better. I will work on this for now. It's easy to implement but mostly I would like to implement some testing. |
Unfortunately it does because of this: (but only when there's a callback) |
ce5f7ab
to
a118e54
Compare
…text_callback_replace" with undo tests aimed to excercise reconcile code. for ocornut/imgui#7925
if (callback_data.BufTextLen > backup_current_text_length && is_resizable)
state->TextA.resize(state->TextA.Size + (callback_data.BufTextLen - backup_current_text_length)); // Worse case scenario resize
memcpy(state->TextA.Data, callback_data.Buf, callback_data.BufTextLen); As it turns out none of this is necessary anymore! Since the callback ran over our buffer anyhow. We just need to update TextA.Size.
Future improvements:
|
That's large optimization is extremely valuable, and couldn't be done before because we had to support ImWchar and they can be either 16 or 32 bytes. Quick test pasting a ~900 KB file, VS2022 x64 Debug Mode:
This quite good. I am pretty sure they are other good opportunities that will now become easier to see and take advantage of. |
… the wchar buffer. (ocornut#7925) WIP (requires subsequent commits for fixes)
It seems sensible to push this change in stb_textedit repo eventually.
…xtCalcTextSize() to use similar debug-friendly logic as ImFont:CalcTextSizeA(). misc small tidying up. (ocornut#7925)
+ replace IMSTB_TEXTEDIT_GETPREVCHARINDEX code with ImTextFindPreviousUtf8Codepoint().
…ixes character filtering. (ocornut#7925) Refer to imgui_test_suite for tests.
0020036
to
0008f3d
Compare
… the wchar buffer. (#7925) WIP (requires subsequent commits for fixes)
It seems sensible to push this change in stb_textedit repo eventually.
…xtCalcTextSize() to use similar debug-friendly logic as ImFont:CalcTextSizeA(). misc small tidying up. (#7925)
+ replace IMSTB_TEXTEDIT_GETPREVCHARINDEX code with ImTextFindPreviousUtf8Codepoint().
…ixes character filtering. (#7925) Refer to imgui_test_suite for tests.
Merged into main branch. Thanks for your help! |
Hello!🌞 |
…chr() shaves 0.2 ms on 865k multi-line text case. Approximately 20%. (#7925)
Pushed a small optimization which shaves a further 0.2 ms in debug mode on the large text case.
This seems very possible and valuable but requires a little more work than expected so I am putting that aside. |
Was harmless because TextA is always zero-terminated, and TextA.Size >= CurLenA + 1.
…vertTo. Follow the refactors in #7925.
!! Please see the notes on the docking branch issue at the bottom !!
Overview
To implement common text editing operations, Dear ImGui uses a modified version of stb_textedit under the hood. stb_textedit currently only works for fixed size character encodings (e.g. ASCII or UTF16). Since Dear ImGui uses UTF8 for its API a lot of conversion between UTF8 and UTF16 are being done under the hood.
This PR is making the necessary adjustments to stb_textedit and Dear ImGui to support native UTF8 editing and get rid of the additional UTF16 buffer and conversions.
Details
CurLenW and the TextW buffer in ImGuiInputTextState got removed completely. TextA and CurLenA are still there (I also kept the 'A' postfix. I am not sure if there is any good reason to keep it). I didn't put too much thought into it yet but it may or may not be possible to even remove the TextA buffer and get rid of copying the user buffer entirely? Even if that is the case, it might be desirable to still keep it around for potential future implementations of the
ImGuiInputTextFlags_NoLiveEdit
flag(InputText, InputInt, InputFloat: add ImGuiInputTextFlags_NoLiveEdit flag #701). Also note that the comments for CurLenA and TextA might not be entirely fitting anymore. I adjusted them a little bit but I am not sure how much of the information there is still true.
The changes to stb_textedit are for the most part based on the suggestions from this related issue. Some more changes were necessary. Including a change to stb_textedit_key that adds the decoded UTF8 character byte sequence to the parameters. This means we now have UTF8 specific code in stb_textedit.
Valid values can be passed for other encodings but it really only makes sense for UTF8. One could maybe think of a better, more encoding agnostic, solution but to be honest, I am not sure if that is worth the additional complexity. I guess it depends on how much we are willing to deviate from the original stb_textedit.
There was an outstanding
FIXME-OPT
about having to copy/convert the buffer to UTF16 for read-onlyInputText
to allow selection/cursor usage. This should now not be an issue anymore and I removed it.The test suite ran succesfully. Of course some tests had to be adjusted slightly since TextW doesn't exist anymore. I am not sure how many UTF8 specific edge cases are even tested so that may not be all too meaningful.
I did of course also do some manual testing and encountered no issues with mixes of European and Japanese glyphs.
Mouse clicking/dragging, arrow key navigation (in rows and between rows), arrow key whole word navigation, all seem to be working as expected. The callback demos in the imgui_demo work as well (except for some minor issues that have been there before e.g. an 'ü' can not be made uppercase by the barebones demo callback implementations)
Performance was not tested. As far as I can tell there isn't any more rescanning happening now than it was before. Here and there there is some forward/backwards scanning e.g. to check for multi byte sequences adjacent to the cursor but nothing crazy. If anything I'd expect a performance improvement due to all the conversions that are now gone.
There might be a lot more code that could be simplified now. I did not get too far into it. For now I would like to keep the changes manageable so they can be reviewed easier. But I do expect this to open the door for some more improvements.
NOTE: Docking branch
Due to an oversight on my end all changes were made to the
docking
branch instead ofmaster
.A low level change like this would probably be prefered to be implemented in the
master
branch and then be merged intodocking
.Pulling the changes into
master
and doing the necessary adjustments is not impossible but it's also not trivial. If necessary I would be willing to do it. For now the pull request is only for thedocking
branch. Some feedback on this would be appreciated.