Rebuild ImFontAtlas::GetGlyphRangesJapanese offset table #3627
Hello!
This PR makes ImFontAtlas::GetGlyphRangesJapanese() support more Japanese characters (Kanji defined by the Government of Japan) out of the box.
What this PR does
Commit 0e6b84c rebuilds the internal offset table in ImFontAtlas::GetGlyphRangesJapanese().
Source of the offset table
As a reliable source of this offset table, I chose the character information database of the Information-technology Promotion Agency (IPA, an administrative entity of Japan).
IPA provides a REST API to access its database: https://mojikiban.ipa.go.jp/mji/
The information acquired from the database is freely available under the terms of Creative Commons Attribution-ShareAlike 2.1 Japan (CC BY-SA 2.1 JP).
Supplemental scripts
I made a repository, https://github.com/vaiorabbit/everyday_use_kanji , that contains several Ruby scripts to generate the GetGlyphRangesJapanese() implementation (e.g. https://github.com/vaiorabbit/everyday_use_kanji/blob/master/imgui/GetGlyphRangesJapanese.cpp ).
These scripts will be useful when we want to keep GetGlyphRangesJapanese() up to date in the future.
Motivation
The current GetGlyphRangesJapanese() implementation supports 1946 characters, but this is not enough to cover the 2136 Joyo (common-use) characters and 863 Jinmeiyo (for personal names) characters defined by the Government of Japan.
So we often see garbled characters in relatively simple Japanese sentences and people's names (displayed as the replacement character "?" as a fallback in this screenshot).
GetGlyphRangesChineseFull() is sometimes recommended as a replacement, but it tends to produce a larger texture than GetGlyphRangesJapanese(). Though it depends on the configuration, GetGlyphRangesChineseFull() produces a 4096 x 4096 font texture internally, which is quite large compared to the GetGlyphRangesJapanese() implementation, which produces only a 1024 x 2048 texture.
(Screenshots: font textures for GetGlyphRangesChineseFull() and GetGlyphRangesJapanese(), displayed in RenderDoc)
There is another alternative called GetGlyphRangesChineseSimplifiedCommon() that supports 2500 characters.
I thought it would be easy and reasonable to rebuild the internal tables in GetGlyphRangesJapanese() to support the Japanese characters defined by the government.
Limitations
What you are about to read is a topic that is difficult even for Japanese people, but I will try to explain it.
In short:
Limitation and workaround due to the code point of "𠮟"
Limitation and workaround due to the code point of "𠮟"
In a commit in the previous similar PR ( #1650 ), we can see a line that says:
This means the character corresponding to code point 0x20B9F (== 134047) exceeds the range of a 2-byte variable (short or ImWchar16) and so cannot be displayed.
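To make the size problem concrete, here is a minimal standalone C++ sketch. It does not use ImGui, and the helper names FitsIn16Bits and ToUtf16 are made up for illustration. It checks whether a code point fits in a single 16-bit value and, if not, splits it into a UTF-16 surrogate pair.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if the code point can be stored in one 16-bit
// value, which is the situation when IMGUI_USE_WCHAR32 is not enabled.
constexpr bool FitsIn16Bits(char32_t cp) { return cp <= 0xFFFF; }

// Hypothetical helper: encodes one code point as UTF-16 code units.
// Assumes cp is a valid Unicode scalar value.
std::vector<std::uint16_t> ToUtf16(char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (FitsIn16Bits(cp))
        return { static_cast<std::uint16_t>(cp) };
    // Code points above 0xFFFF need two 16-bit units (a surrogate pair).
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}

static_assert(0x20B9F == 134047, "same value in hex and decimal");
static_assert(!FitsIn16Bits(0x20B9F), "U+20B9F does not fit in 16 bits");
static_assert(FitsIn16Bits(0x53F1), "U+53F1 fits in 16 bits");
```

A glyph range table whose entries are 16-bit values can therefore hold U+53F1 directly, but has no single slot that can hold U+20B9F.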
The actual character is "𠮟" (scold, rebuke or reprimand, etc.).
"𠮟" can still cause garbled text. When we try to use "𠮟" on Windows, Microsoft's standard Japanese IME displays the notice "環境依存" (environment-dependent), which means "this character may be garbled because some environments cannot handle this character code".
So, this character is often substituted by the variant character "叱" (U+53F1).
The actual history of this problem is a bit more complex, but in terms of actual use cases, these two characters can be treated as the same character, differing only in design.
So in this PR, I intentionally used "叱" (U+53F1) everywhere "𠮟" (U+20B9F) should appear but cannot be used.
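As a rough illustration of how such a substitution fits into an offset table, the sketch below assumes the table stores each character as a delta from the previous one, starting at a base code point such as 0x4E00. The function names are hypothetical and this is neither the Ruby generator from the repository nor ImGui's own unpacking code, only a standalone approximation of the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Replaces U+20B9F with its 16-bit variant U+53F1, then encodes the sorted,
// de-duplicated code points as deltas from the previous code point.
std::vector<short> EncodeAccumulativeOffsets(int base, std::vector<int> codepoints) {
    std::replace(codepoints.begin(), codepoints.end(), 0x20B9F, 0x53F1);
    std::sort(codepoints.begin(), codepoints.end());
    codepoints.erase(std::unique(codepoints.begin(), codepoints.end()), codepoints.end());

    std::vector<short> offsets;
    int previous = base;
    for (int cp : codepoints) {
        // Every code point must be at or after the base, and each delta must fit in a short.
        assert(cp >= previous && cp - previous <= 32767);
        offsets.push_back(static_cast<short>(cp - previous));
        previous = cp;
    }
    return offsets;
}

// Expands the offsets into (first, last) pairs, one pair per character,
// followed by a terminating zero.
std::vector<std::uint16_t> UnpackIntoRanges(int base, const std::vector<short>& offsets) {
    std::vector<std::uint16_t> ranges;
    int cp = base;
    for (short offset : offsets) {
        cp += offset;
        ranges.push_back(static_cast<std::uint16_t>(cp));
        ranges.push_back(static_cast<std::uint16_t>(cp));
    }
    ranges.push_back(0);
    return ranges;
}
```

For example, encoding {0x4E00, 0x4E03, 0x20B9F, 0x4E09} with base 0x4E00 yields the offsets {0, 3, 6, 1512}, where the last entry lands on U+53F1 instead of U+20B9F.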
The substitution applies to static const short accumulative_offsets_from_0x4E00[] in GetGlyphRangesJapanese(). In this list, as a workaround, the character "𠮟" (U+20B9F, modern form) is replaced with "叱" (U+53F1, traditional form) so that all characters can be represented in 2 bytes.
Even after this PR is merged, GetGlyphRangesJapanese() can display "叱" (U+53F1) but cannot display "𠮟" (U+20B9F).
Users who want to display "𠮟" (modern form) should follow these steps:
Build ImGui with IMGUI_USE_WCHAR32 enabled
Prepare an appropriate font (e.g. Google Noto Fonts)
Write code that adds the character through ImFontGlyphRangesBuilder, e.g. builder.AddText(u8"𠮟")
References
Test and Performance
I wrote a small test program that tries to display all 2136 Joyo characters and 863 Jinmeiyo characters.
Screenshots
Screenshot (with current GetGlyphRangesJapanese(), IMGUI_USE_WCHAR32 disabled)
Screenshot (with new GetGlyphRangesJapanese(), IMGUI_USE_WCHAR32 disabled)
Screenshot (with new GetGlyphRangesJapanese(), IMGUI_USE_WCHAR32 enabled, using ImFontGlyphRangesBuilder::AddText)
builder.AddText(u8"𠮟") solves the problem and we're done!
Performance issue
Size of font texture
Though it depends on the configuration, both the current GetGlyphRangesJapanese() and the new implementation created a 1024x2048 font texture internally in the test code, so the texture dimensions did not grow in this test.
Memory consumption
The test code reports ImGui's memory consumption when the macro MEASURE_MEMORY_ALLOCATION is defined (using the allocator hooks provided by ImGui::SetAllocatorFunctions).
The increase in memory consumption due to the new implementation is less than 100 KB.
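For readers who want to reproduce this kind of measurement, here is a minimal sketch of counting allocator hooks. It is not the test code from this PR: MemoryStats, CountingAlloc and CountingFree are hypothetical names, and the sketch only assumes that the hooks take a size or pointer plus a user-data pointer, which is the shape ImGui::SetAllocatorFunctions expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical bookkeeping structure passed to the hooks as user data.
struct MemoryStats {
    std::size_t current_bytes = 0;
    std::size_t peak_bytes = 0;
};

// Each allocation is prefixed with a header that records its size. Using the
// maximum fundamental alignment keeps the returned pointer suitably aligned.
constexpr std::size_t kHeaderSize = alignof(std::max_align_t);
static_assert(kHeaderSize >= sizeof(std::size_t), "header must hold the size");

void* CountingAlloc(std::size_t size, void* user_data) {
    MemoryStats* stats = static_cast<MemoryStats*>(user_data);
    unsigned char* raw = static_cast<unsigned char*>(std::malloc(kHeaderSize + size));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof(size));  // remember the requested size
    stats->current_bytes += size;
    if (stats->current_bytes > stats->peak_bytes)
        stats->peak_bytes = stats->current_bytes;
    return raw + kHeaderSize;
}

void CountingFree(void* ptr, void* user_data) {
    if (ptr == nullptr) return;
    MemoryStats* stats = static_cast<MemoryStats*>(user_data);
    unsigned char* raw = static_cast<unsigned char*>(ptr) - kHeaderSize;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));  // read back the recorded size
    assert(stats->current_bytes >= size);
    stats->current_bytes -= size;
    std::free(raw);
}
```

In an ImGui application, the hooks would be registered before creating the context, for example with ImGui::SetAllocatorFunctions(CountingAlloc, CountingFree, &stats), and peak_bytes compared between the old and new glyph range tables.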