Rendering issues with UTF-8 text that has been decomposed into a canonical form #6036

Closed
skylersaleh opened this issue Dec 31, 2022 · 4 comments

@skylersaleh
skylersaleh commented Dec 31, 2022

Hello,

I've been having some issues getting UTF-8 strings from filenames that have been decomposed by the operating system (https://developer.apple.com/library/archive/technotes/tn/tn1150.html#UnicodeSubtleties) to display correctly in ImGui.

image

Notice that the top item has "?" marks after the accented letters, and the Korean letters are split into their individual components (jamo). The bottom string is the same text in non-decomposed UTF-8.

The attached txt file shows the same UTF-8 string encoded in both of these ways, however only one of them will render correctly in ImGui.

utf8.txt

I believe this might be caused by ImGui not combining these multi-codepoint characters into a single character and instead trying to render each codepoint separately. Is there any way to resolve this?

Thanks,

-Sky
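
To make the hypothesis above concrete, here is a minimal sketch that decodes both encodings of the same visible text into code points. The helper name `decode_utf8` is hypothetical and is not part of Dear ImGui; it assumes well-formed input and does no validation. The precomposed "é" decodes to one code point (U+00E9), while the decomposed form decodes to two (U+0065 followed by the combining mark U+0301), which a renderer that draws one glyph per code point would treat as two separate characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a well-formed UTF-8 string into code points.
// Sketch only: no validation of malformed or truncated sequences.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length is determined by the lead byte.
        int len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        // Each continuation byte contributes its low 6 bits.
        for (int k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, `decode_utf8("\xC3\xA9")` yields a single code point, whereas `decode_utf8("e\xCC\x81")` yields two, even though both are meant to display as the same letter.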

@PathogenDavid
Contributor

Hello!

Dear ImGui's text shaping is extremely simple and doesn't handle composite characters like those. (See also #4227 #4922 #1228 #4943)

The easiest solution is probably to "recompose" the strings.

In the context of your app, that might be as simple as performing the opposite replacements from the table linked by the documentation you shared. However, the phrasing implies that the table doesn't include Hangul, so that might take extra effort.

Ideally though you might find a macOS API or a library to handle this. If you find one please let us know!
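
The "recompose" idea above could be sketched roughly as follows. This is not full Unicode normalization: it only merges adjacent pairs, ignores combining-class ordering, and the `kPairs` table is a tiny illustrative subset chosen for this example, not the table from the Apple technote. The names `Pair`, `kPairs`, `compose_pair`, and `recompose` are hypothetical. The Hangul part needs no table, because the Unicode Standard defines syllable composition arithmetically.

```cpp
#include <string>

// Tiny illustrative subset of (base, combining mark) -> precomposed mappings.
// A real table would have to cover every pair the OS can produce.
struct Pair { char32_t base, mark, composed; };
static const Pair kPairs[] = {
    {U'e', 0x0301, 0x00E9},  // e + combining acute     -> é
    {U'a', 0x0300, 0x00E0},  // a + combining grave     -> à
    {U'u', 0x0308, 0x00FC},  // u + combining diaeresis -> ü
    {U'n', 0x0303, 0x00F1},  // n + combining tilde     -> ñ
};

// Hangul composition constants from the Unicode Standard.
constexpr char32_t SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7;
constexpr char32_t LCount = 19, VCount = 21, TCount = 28;

// Try to merge `next` into `prev`. Returns true if `prev` was updated in place.
bool compose_pair(char32_t& prev, char32_t next) {
    // Leading consonant + vowel -> LV syllable.
    if (prev >= LBase && prev < LBase + LCount &&
        next >= VBase && next < VBase + VCount) {
        prev = SBase + ((prev - LBase) * VCount + (next - VBase)) * TCount;
        return true;
    }
    // LV syllable + trailing consonant -> LVT syllable.
    if (prev >= SBase && prev < SBase + LCount * VCount * TCount &&
        (prev - SBase) % TCount == 0 &&
        next > TBase && next < TBase + TCount) {
        prev += next - TBase;
        return true;
    }
    // Fall back to the small lookup table for accented Latin letters.
    for (const Pair& p : kPairs)
        if (p.base == prev && p.mark == next) { prev = p.composed; return true; }
    return false;
}

// Recompose a sequence of code points by merging adjacent composable pairs.
std::u32string recompose(const std::u32string& in) {
    std::u32string out;
    for (char32_t cp : in) {
        if (!out.empty() && compose_pair(out.back(), cp)) continue;
        out.push_back(cp);
    }
    return out;
}
```

With this sketch, the three jamo U+1112 U+1161 U+11AB collapse into the single syllable U+D55C, and "e" followed by U+0301 becomes U+00E9. Strings would still need to be decoded from UTF-8 before this step and re-encoded afterwards before being handed to ImGui.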

@ocornut
Owner

ocornut commented Jan 3, 2023

I agree with David's answer. Dear ImGui is not expected to handle that, for performance and complexity reasons, but you can probably preprocess your strings with some library function. I'm not even sure what the transformation is called. Maybe this?
https://www.gnu.org/software/libunistring/manual/libunistring.html#Normalization-of-strings
https://www.gnu.org/software/libunistring/manual/libunistring.html#Composition-of-characters

I'll close this as it is out of scope. However, if you solve your issue, posting an answer here for reference would be helpful for others. Thank you!

ocornut closed this as completed Jan 3, 2023
@skylersaleh
Author

I ended up using the string conversion to UTF-8 NFC in the UTF8proc library (https://github.com/JuliaStrings/utf8proc), and it did the trick here.

It would be nice if a similar routine were implemented in imgui so that a third-party library was not necessary.

@ocornut
Owner

ocornut commented Jan 3, 2023

> It would be nice if a similar routine was implemented in imgui so that a third party library was not necessary.

Unfortunately, even the tiny utf8proc carries a 1.8 MB source file with large tables. I'm not sure how much binary data that accounts for, but it's probably too large for such an unusual use case.
