Rendering issues with UTF-8 text that has been decomposed into a canonical form #6036

skylersaleh · 2022-12-31T21:06:54Z

Hello,

I've been having some issues getting UTF-8 strings from filenames that have been decomposed by the operating system (https://developer.apple.com/library/archive/technotes/tn/tn1150.html#UnicodeSubtleties) to display correctly in ImGui.

Notice that the top item has "?" marks after the accented letters, and its Korean syllables are decomposed into their component letters (jamo). Compare it with the bottom string, which is identical except that it is encoded as precomposed (non-decomposed) UTF-8.

The attached txt file shows the same UTF-8 string encoded in both of these ways; however, only one of them renders correctly in ImGui.

utf8.txt

I believe this might be caused by ImGui not combining these multi-codepoint characters into a single character and instead trying to render each codepoint separately. Is there any way to resolve this?
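To make the difference concrete: in precomposed (NFC) form "é" is the single code point U+00E9, while in decomposed (NFD) form it is "e" followed by U+0301 COMBINING ACUTE ACCENT. The minimal C sketch below decodes UTF-8 into one code point per array slot, which is roughly what a renderer that draws one glyph per code point works from; if the font has no glyph for U+0301, a fallback character would plausibly appear after the "e". The helper name `utf8_to_codepoints` is hypothetical (it is not a Dear ImGui API), and the decoder assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into code points,
 * one per slot. Assumes well-formed UTF-8 and writes at most `cap` entries.
 * Returns the number of code points written. */
size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t cap) {
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p != '\0' && n < cap) {
        uint32_t c;
        int extra; /* number of continuation bytes that follow the lead byte */
        if (*p < 0x80)      { c = *p;        extra = 0; } /* 0xxxxxxx: ASCII */
        else if (*p < 0xE0) { c = *p & 0x1F; extra = 1; } /* 110xxxxx */
        else if (*p < 0xF0) { c = *p & 0x0F; extra = 2; } /* 1110xxxx */
        else                { c = *p & 0x07; extra = 3; } /* 11110xxx */
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            assert((*p & 0xC0) == 0x80); /* must be a 10xxxxxx continuation byte */
            c = (c << 6) | (uint32_t)(*p & 0x3F); /* append its low 6 bits */
        }
        out[n++] = c;
    }
    return n;
}
```

For example, `utf8_to_codepoints("\xC3\xA9", buf, 8)` yields one code point (U+00E9), while `utf8_to_codepoints("e\xCC\x81", buf, 8)` yields two (U+0065, U+0301), even though both strings are meant to display as "é".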

Thanks,

-Sky

PathogenDavid · 2022-12-31T22:52:40Z

Hello!

Dear ImGui's text shaping is extremely simple and doesn't handle composite characters like those. (See also #4227 #4922 #1228 #4943)

The easiest solution is probably to "recompose" the strings.

In the context of your app, that might be as simple as performing the opposite replacements from the table linked in the documentation you shared. However, the phrasing implies that the table doesn't include Hangul, so that might take extra effort.
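A minimal sketch of that recomposition in C, under stated assumptions: Hangul needs no lookup table, because the Unicode Standard defines syllable composition arithmetically (the constants below), while the Latin pairs come from a small hypothetical table standing in for the reverse of the documented decomposition table. `recompose` and `compose_pair` are illustrative names, and this is not a full NFC implementation: it only tries to merge each code point with the one before it, and it skips canonical reordering and the "blocked" combining-mark rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hangul syllable composition constants from the Unicode Standard (Section 3.12). */
#define S_BASE  0xAC00u
#define L_BASE  0x1100u
#define V_BASE  0x1161u
#define T_BASE  0x11A7u
#define L_COUNT 19u
#define V_COUNT 21u
#define T_COUNT 28u
#define N_COUNT (V_COUNT * T_COUNT) /* 588 */
#define S_COUNT (L_COUNT * N_COUNT) /* 11172 */

/* A few Latin base + combining-mark pairs, for illustration only.
 * A real implementation needs the full Unicode composition data. */
static const struct { uint32_t base, mark, composed; } PAIRS[] = {
    {0x0065, 0x0301, 0x00E9}, /* e + COMBINING ACUTE ACCENT -> é */
    {0x0045, 0x0301, 0x00C9}, /* E + COMBINING ACUTE ACCENT -> É */
    {0x0061, 0x0300, 0x00E0}, /* a + COMBINING GRAVE ACCENT -> à */
    {0x006F, 0x0308, 0x00F6}, /* o + COMBINING DIAERESIS    -> ö */
    {0x006E, 0x0303, 0x00F1}, /* n + COMBINING TILDE        -> ñ */
};

/* Return the precomposed code point for (a, b), or 0 if they don't combine. */
static uint32_t compose_pair(uint32_t a, uint32_t b) {
    /* Hangul leading consonant (L) + vowel (V) -> LV syllable. */
    if (a >= L_BASE && a < L_BASE + L_COUNT && b >= V_BASE && b < V_BASE + V_COUNT)
        return S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT;
    /* Hangul LV syllable + trailing consonant (T) -> LVT syllable.
     * T_BASE itself means "no trailing consonant", so it is excluded. */
    if (a >= S_BASE && a < S_BASE + S_COUNT && (a - S_BASE) % T_COUNT == 0 &&
        b > T_BASE && b < T_BASE + T_COUNT)
        return a + (b - T_BASE);
    for (size_t i = 0; i < sizeof PAIRS / sizeof PAIRS[0]; ++i)
        if (PAIRS[i].base == a && PAIRS[i].mark == b)
            return PAIRS[i].composed;
    return 0;
}

/* Recompose an array of code points in place and return the new length. */
size_t recompose(uint32_t *cp, size_t n) {
    assert(cp != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t c = out > 0 ? compose_pair(cp[out - 1], cp[i]) : 0;
        if (c != 0)
            cp[out - 1] = c;   /* merge into the previous code point */
        else
            cp[out++] = cp[i]; /* out <= i, so writing in place is safe */
    }
    return out;
}
```

For example, the decomposed syllable 한 (U+1112 U+1161 U+11AB) recomposes in two steps: L + V gives the LV syllable U+D558, and adding the trailing consonant gives U+D55C. For production use, a proper normalization library is the safer route.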

Ideally though you might find a macOS API or a library to handle this. If you find one please let us know!

ocornut · 2023-01-03T18:27:32Z

I agree with David's answer. Dear ImGui is not expected to handle that, for performance and complexity reasons, but you can probably preprocess your strings with a library function. I'm not even sure what the transformation is called. Maybe this?
https://www.gnu.org/software/libunistring/manual/libunistring.html#Normalization-of-strings
https://www.gnu.org/software/libunistring/manual/libunistring.html#Composition-of-characters

I'll close this as it is out of scope, however if you solve your issue posting an answer here for reference would be helpful for others. Thank you!

skylersaleh · 2023-01-03T19:01:10Z

I ended up using the UTF-8 NFC string conversion in the utf8proc library (https://github.com/JuliaStrings/utf8proc), and it did the trick.

It would be nice if a similar routine was implemented in imgui so that a third party library was not necessary.

ocornut · 2023-01-03T19:06:01Z

> It would be nice if a similar routine was implemented in imgui so that a third party library was not necessary.

Unfortunately, even the tiny utf8proc carries a 1.8 MB source file with large tables. I'm not sure how much binary data that accounts for, but it's probably too large for such an unusual use case.

ocornut added the font/text label Jan 1, 2023

ocornut closed this as completed Jan 3, 2023

PathogenDavid mentioned this issue Mar 3, 2024

Rendering U+0305 #7362
