-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for rendering CJK glyphs top-to-bottom #3402
Conversation
…ontal labels. Upright positioning broken. Curved vertical labels still broken. Glyph positioning of vertical labels still broken.
Conflicts: js/data/bucket/symbol_bucket.js
lineHeight, horizontalAlign, verticalAlign, justify, spacing, textOffset); | ||
lineHeight, horizontalAlign, verticalAlign, justify, spacing, textOffset, oneEm, verticalOrientation); | ||
|
||
if (layout['text-rotation-alignment'] === 'map' && layout['symbol-placement'] === 'line') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid verticalizing non-CJK* text, match textFeatures[k]
against this regular expression, courtesy of Wiktionary:
&& !/[^ᄀ-ᇿ가-힣ㄱ-ㆎ一-鿌㐀-䶵 -〿𠀀-!-○ぁ-ゟ゠-ヿㇰ-ㇿꀀ-꓆᠀-ᢪ]/.exec(textFeatures[k])
* For the purpose of this PR, “CJK” is Hangul, Hanzi, Hiragana, Katakana, Mongolian, and Yi scripts. However, note that Hangul and Mongolian words are delimited by spaces and thus should retain the Latin-style line breaking algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(You’ll need to make the regular expression a little more lenient to allow numerals and punctuation.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaScript can’t handle characters above U+FFFF in character classes – like emoji! 😛 – so the regular expression will need to be a tad more complicated to detect Hanzi from 𠀀 onwards. Specifically, we’ll need to capture anything from \uD840\uDC00
to \uD873\uDEAF
, inclusive. (Here’s a very handy tool for calculating surrogate pairs.) My weary Tuesday-evening eyes aren’t helping me come with the correct regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1ec5 thanks, will look at this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1ec5 Can we use plain ol' inequalities rather than a regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, whichever is easier to maintain and more performant. It probably doesn’t make a big difference either way.
Character classes can contain \uxxxx
character references instead of Unicode literals, if that’s your concern. The surrogate pair issue remains because of JavaScript’s string encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that since we switched to Buble, we can now use u
RegExp
flag that allows using any unicode characters in regexps (they get transpiled to \u....
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Babel doesn’t change the fact that JavaScript strings (and therefore regular expressions) must represent characters above U+FFFF as surrogate pairs.
0xb7: true, // middle dot | ||
0x200b: true, // zero-width space | ||
0x2010: true, // hyphen | ||
0x2013: true // en dash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to re-add these ☝️ to the breakable
lookup
FYI, we don’t need language detection, only script detection. (As far as I can tell, there isn’t a major script that was traditionally written vertically for one language but never vertically for another.) So looking at Unicode codepoints is sufficient for this task. |
Some notes as I try to grock the symbol placement code in GL JS GlossaryClasses & Interfaces
|
Crossposted from #3402 (comment): For the purpose of this PR, “CJK” is Hangul, Hanzi/Hanja/Kanji, Hiragana, Katakana, Mongolian, and Yi scripts (roughly corresponding to the Chinese, Japanese, Korean, Mongolian, and Yi languages). However, note that Hangul and Mongolian words are delimited by spaces and thus should retain the Latin-style line breaking algorithm. If vertical, space-delimited Hangul and Mongolian is a problem, they can be horizontal for now. |
Per chat w/@1ec5, for Japanese it would probably be reasonable to exclude names that include romaji (roman characters) from verticalization. It can be done, but horizontal seems to be generally preferred (especially if a name only contains romaji). We’ll have to contend with fullwidth variants, as well. |
#3402 (comment) addresses horizontal/vertical switching based on scripts. Even in the context of a Chinese-only map, laying out all labels vertically only makes sense for an archaic-looking style. (Think yellowed background and calligraphic fonts.) A modern-looking style would typically lay out point-placed labels horizontally but fall back to a vertical layout to avoid collision. Meanwhile, line-placed labels would be laid out horizontally or vertically based on the angle of the road, in an attempt to avoid rotating glyphs beyond 45°. (See mapbox/mapbox-gl-native#1682 for further discussion.) If there’s a need to land this feature before handling those nuances, I recommend placing vertical layout behind a style specification property such as |
We are not implementing vertical labels for point placement.
Yes, this is what this PR does. |
continue; | ||
// ^^^ this check is where all the vertical labels are being skipped | ||
} | ||
}*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucaswoj this is a check that I temporarily commented out because it was behaving strangely with vertical labels, but should be kept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up! I'll restore this in a sec.
6b4fb56
to
c1e0dac
Compare
Rebased and debugged 👉 #3438 |
fixes #1246
see also mapbox/mapbox-gl-native#1682
Requirements
(this PR does not add support for top-to-bottom point labels or mixed orientation glyphs within a single label)
Specifications
Launch Checklist