Add support for rendering CJK glyphs top-to-bottom #3402

lucaswoj · 2016-10-18T18:01:41Z

fixes #1246
see also mapbox/mapbox-gl-native#1682

Requirements

GL JS must render CJK glyphs top-to-bottom along vertical lines, as is conventional in cartographic design

(this PR does not add support for top-to-bottom point labels or mixed orientation glyphs within a single label)

Specifications

we will use naïve "balanced" breaking
we will enable / disable top-to-bottom labels based on language detection
we will use character ranges for language detection (data)
if a single label has mixed CJK/non-CJK glyphs, we will determine the appropriate glyph orientation.
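
As a rough illustration of what naïve "balanced" breaking could mean for ideographic text, the sketch below splits a label into lines of near-equal length instead of filling each line greedily. The name `balancedBreak` and the `maxCharsPerLine` parameter are hypothetical and chosen only for this example; this is not the implementation in this PR.

```javascript
// Hypothetical helper: split `text` into the fewest lines that fit
// `maxCharsPerLine`, then even out the line lengths.
function balancedBreak(text, maxCharsPerLine) {
    const chars = Array.from(text); // split by code point, not UTF-16 unit
    if (chars.length === 0) return [];

    // Fewest lines that can hold the text at the given maximum width.
    const lineCount = Math.ceil(chars.length / maxCharsPerLine);
    // Spread the characters across those lines as evenly as possible.
    const perLine = Math.ceil(chars.length / lineCount);

    const lines = [];
    for (let i = 0; i < chars.length; i += perLine) {
        lines.push(chars.slice(i, i + perLine).join(''));
    }
    return lines;
}
```

With seven characters and a maximum of five per line, a greedy breaker would produce lines of 5 and 2, while this sketch produces lines of 4 and 3.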

Launch Checklist

…ontal labels. Upright positioning broken. Curved vertical labels still broken. Glyph positioning of vertical labels still broken.

Conflicts: js/data/bucket/symbol_bucket.js

1ec5 · 2016-10-18T20:14:13Z

js/data/bucket/symbol_bucket.js

-                    lineHeight, horizontalAlign, verticalAlign, justify, spacing, textOffset);
+                    lineHeight, horizontalAlign, verticalAlign, justify, spacing, textOffset, oneEm, verticalOrientation);
+
+            if (layout['text-rotation-alignment'] === 'map' && layout['symbol-placement'] === 'line') {


To avoid verticalizing non-CJK* text, match textFeatures[k] against this regular expression, courtesy of Wiktionary:

&& !/[^ᄀ-ᇿ가-힣ㄱ-ㆎ一-鿌㐀-䶵　-〿𠀀-𬺯！-￮ぁ-ゟ゠-ヿㇰ-ㇿꀀ-꓆᠀-ᢪ]/.exec(textFeatures[k])

* For the purpose of this PR, “CJK” is Hangul, Hanzi, Hiragana, Katakana, Mongolian, and Yi scripts. However, note that Hangul and Mongolian words are delimited by spaces and thus should retain the Latin-style line breaking algorithm.

(You’ll need to make the regular expression a little more lenient to allow numerals and punctuation.)

JavaScript can’t handle characters above U+FFFF in character classes – like emoji! 😛 – so the regular expression will need to be a tad more complicated to detect Hanzi from 𠀀 onwards. Specifically, we’ll need to capture anything from \uD840\uDC00 to \uD873\uDEAF, inclusive. (Here’s a very handy tool for calculating surrogate pairs.) My weary Tuesday-evening eyes aren’t helping me come up with the correct regex.
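
One possible shape for that regex, as a sketch only: the range \uD840\uDC00 to \uD873\uDEAF can be written as two alternatives, one for the high surrogates whose whole low-surrogate range is included and one for the final, partial high surrogate. The names `toSurrogatePair` and `supplementaryHanzi` are hypothetical and exist only for this example.

```javascript
// Convert a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
    const offset = codePoint - 0x10000;
    const high = 0xD800 + (offset >> 10);   // top 10 bits
    const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
    return [high, low];
}

// Matches one character in U+20000..U+2CEAF without relying on the `u` flag.
// First alternative: high surrogates D840..D872 with any low surrogate.
// Second alternative: high surrogate D873 with low surrogates up to DEAF.
const supplementaryHanzi = /(?:[\uD840-\uD872][\uDC00-\uDFFF]|\uD873[\uDC00-\uDEAF])/;
```

The helper reproduces the endpoints quoted above: U+20000 maps to D840 DC00 and U+2CEAF maps to D873 DEAF.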

@1ec5 thanks, will look at this tomorrow.

@1ec5 Can we use plain ol' inequalities rather than a regex?

Sure, whichever is easier to maintain and more performant. It probably doesn’t make a big difference either way.

Character classes can contain \uxxxx character references instead of Unicode literals, if that’s your concern. The surrogate pair issue remains because of JavaScript’s string encoding.

Note that since we switched to Buble, we can now use the `u` RegExp flag, which allows any Unicode characters in regexps (they get transpiled to \u....).

But Buble doesn’t change the fact that JavaScript strings (and therefore regular expressions) must represent characters above U+FFFF as surrogate pairs.
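
For comparison, here is a minimal sketch of the "plain ol' inequalities" approach. It assumes an ES2015 runtime (`for...of` over strings and `codePointAt`), which iterate by code point and so sidestep the surrogate pair problem. The ranges are transcribed from the character class quoted earlier in this thread; the names `verticalScriptRanges` and `isVerticalScriptText` are hypothetical, and the block comments are approximate labels rather than exact Unicode block boundaries.

```javascript
// Inclusive [first, last] code point ranges, transcribed from the regex above.
const verticalScriptRanges = [
    [0x1100, 0x11FF],   // Hangul Jamo
    [0xAC00, 0xD7A3],   // Hangul Syllables
    [0x3131, 0x318E],   // Hangul Compatibility Jamo
    [0x4E00, 0x9FCC],   // CJK Unified Ideographs
    [0x3400, 0x4DB5],   // CJK Unified Ideographs Extension A
    [0x3000, 0x303F],   // CJK Symbols and Punctuation
    [0x20000, 0x2CEAF], // supplementary-plane Hanzi
    [0xFF01, 0xFFEE],   // Halfwidth and Fullwidth Forms
    [0x3041, 0x309F],   // Hiragana
    [0x30A0, 0x30FF],   // Katakana
    [0x31F0, 0x31FF],   // Katakana Phonetic Extensions
    [0xA000, 0xA4C6],   // Yi
    [0x1800, 0x18AA]    // Mongolian
];

// Returns true only if every code point in `text` falls inside one of the ranges.
function isVerticalScriptText(text) {
    if (text.length === 0) return false;
    for (const ch of text) { // iterates by code point, keeping surrogate pairs together
        const cp = ch.codePointAt(0);
        const inRange = verticalScriptRanges.some(([lo, hi]) => cp >= lo && cp <= hi);
        if (!inRange) return false;
    }
    return true;
}
```

As noted above, a real check would need to be more lenient about numerals and punctuation; this sketch rejects any label containing an ASCII space or digit.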

lucaswoj · 2016-10-18T20:36:30Z

js/symbol/shaping.js

-    0xb7:   true, // middle dot
-    0x200b: true, // zero-width space
-    0x2010: true, // hyphen
-    0x2013: true  // en dash


We need to re-add these ☝️ to the breakable lookup
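
For context, a lookup like this is typically consulted once per character to find candidate break positions. The sketch below shows that usage with only the four entries from the diff above; the function name `findBreakIndices` is hypothetical and any other entries in the real lookup are omitted.

```javascript
// Only the four entries removed in the diff above; not the full lookup.
const breakable = {
    0xb7:   true, // middle dot
    0x200b: true, // zero-width space
    0x2010: true, // hyphen
    0x2013: true  // en dash
};

// Hypothetical helper: return the indices of characters after which a line may break.
function findBreakIndices(text) {
    const indices = [];
    for (let i = 0; i < text.length; i++) {
        if (breakable[text.charCodeAt(i)]) indices.push(i);
    }
    return indices;
}
```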

1ec5 · 2016-10-18T22:36:55Z

support automatically enabling / disabling vertical labels based on language detection

FYI, we don’t need language detection, only script detection. (As far as I can tell, there isn’t a major script that was traditionally written vertically for one language but never vertically for another.) So looking at Unicode codepoints is sufficient for this task.

lucaswoj · 2016-10-18T23:13:05Z

Some notes as I try to grok the symbol placement code in GL JS

Glossary

Classes & Interfaces

`Anchor`

The geographic coordinate and, optionally, line segment that determines the position of a symbol.

interface Anchor {
    x: number;
    y: number;
    angle: number;
    segment: ?number; // ?
}

`Glyph`

The bitmap and dimensions of a particular character in a particular font.

interface Glyph {
    id: number;
    bitmap: any; // ?
    width: number;
    height: number;
    left: number;
    top: number;
    advance: number; // glyph-specific right-edge padding
}

`CollisionBox`

A rectangular area of the map that is covered by a source feature. Each source feature may have multiple CollisionBoxes.

interface CollisionBox {
    anchorPointX: number;
    anchorPointY: number;
    x1: number;
    y1: number;
    x2: number;
    y2: number;
    maxScale: number;
    featureIndex: number;
    sourceLayerIndex: number;
    bucketIndex: number;
    bbox0: number;
    bbox1: number;
    bbox2: number;
    bbox3: number;
    placementScale: number;
}

`CollisionFeature`

The set of all collision boxes for a source feature.

interface CollisionFeature {
    boxStartIndex: number;
    boxEndIndex: number;
    boxes: any; // ?
}

`CollisionTile`

The set of all CollisionFeatures for a map tile

interface CollisionTile {
    grid: GridIndex;
    ignoredGrid: GridIndex;
    angle: number;
    pitch: number;
    rotationMatrix: [number, number, number, number];
    reverseRotationMatrix: [number, number, number, number];
    yStretch: number;
    collisionBoxArray: StructArray<CollisionBox>;
    tempCollisionBox: CollisionBox;
    edges: [CollisionBox, CollisionBox, CollisionBox, CollisionBox];
    minScale: number;
}

`SymbolQuad` in `SymbolQuadsArray`

interface SymbolQuad {
    anchorPointX: number;
    anchorPointY: number;
    tlX: number;
    tlY: number;
    trX: number;
    trY: number;
    blX: number;
    blY: number;
    brX: number;
    brY: number;
    texH: number;
    texW: number;
    texX: number;
    texY: number;
    anchorAngle: number;
    glyphAngle: number;
    maxScale: number;
    minScale: number;
}

`SymbolQuad` otherwise

An icon or glyph, its coordinates, and its size for rendering

interface SymbolQuad {
    anchorPoint: Point; // ?
    tl: Point;
    tr: Point;
    bl: Point;
    br: Point;
    tex: Object; // ?
    anchorAngle: number;
    glyphAngle: number;
    minScale: number;
    maxScale: number;
}

`Shaping`

A collection of positioned glyphs and their position on screen. Contains multiple orientations of each glyph. The best orientation for the current map bearing is chosen at render time.

interface Shaping {
    positionedGlyphs: Array<PositionedGlyph>;
    text: string;
    top: number;
    bottom: number;
    left: number;
    right: number;
}

`PositionedGlyph`

interface PositionedGlyph {
    codePoint: number;
    x: number;
    y: number;
    glyph: Glyph;
}

`PositionedIcon`

interface PositionedIcon {
    image: any; // ?
    top: number;
    bottom: number;
    left: number;
    right: number;
}

`SymbolBucket`

The collision tile, symbol quads, and symbol instances for a map tile

interface SymbolBucket {
    :grimacing:
}

`SymbolInstance`

interface SymbolInstance {
    textBoxStartIndex: number;
    textBoxEndIndex: number;
    iconBoxStartIndex: number;
    iconBoxEndIndex: number;
    glyphQuadStartIndex: number;
    glyphQuadEndIndex: number;
    iconQuadStartIndex: number;
    iconQuadEndIndex: number;
    anchorPointX: number;
    anchorPointY: number;
    index: number;
}

Misc Terminology

box scale: zoom-specific scaling factor used to convert between glyph units and geometry units
CJK: acronym for "Chinese, Japanese, and Korean", the three languages that require top-to-bottom orientation
leading: distance between the baselines of subsequent lines of text (see also advance)
orientation: the direction in which a text is rendered: top-to-bottom (CJK-only) or left-to-right
shaped icon: a Shaping containing an icon
shaped text: a Shaping containing glyphs
text feature: the text string associated with a particular feature

1ec5 · 2016-10-18T23:19:49Z

CJK: acronym for "Chinese, Japanese, and Korean", the three languages that require top-to-bottom orientation

Crossposted from #3402 (comment): For the purpose of this PR, “CJK” is Hangul, Hanzi/Hanja/Kanji, Hiragana, Katakana, Mongolian, and Yi scripts (roughly corresponding to the Chinese, Japanese, Korean, Mongolian, and Yi languages). However, note that Hangul and Mongolian words are delimited by spaces and thus should retain the Latin-style line breaking algorithm. If vertical, space-delimited Hangul and Mongolian is a problem, they can be horizontal for now.

friedbunny · 2016-10-18T23:41:14Z

Per chat w/@1ec5, for Japanese it would probably be reasonable to exclude names that include romaji (roman characters) from verticalization. It can be done, but horizontal seems to be generally preferred (especially if a name only contains romaji).

We’ll have to contend with fullwidth variants, as well.
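
A minimal sketch of how a romaji check might account for fullwidth variants, assuming an ES2015 runtime. It treats both ASCII Latin letters and their fullwidth counterparts (U+FF21 to U+FF3A and U+FF41 to U+FF5A) as romaji; the function name `containsRomaji` is hypothetical.

```javascript
// Hypothetical helper: true if `text` contains any Latin letter,
// either ASCII (A-Z, a-z) or its fullwidth form.
function containsRomaji(text) {
    for (const ch of text) {
        const cp = ch.codePointAt(0);
        const asciiLatin =
            (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A);
        const fullwidthLatin =
            (cp >= 0xFF21 && cp <= 0xFF3A) || (cp >= 0xFF41 && cp <= 0xFF5A);
        if (asciiLatin || fullwidthLatin) return true;
    }
    return false;
}
```

A label for which this returns true could then be kept horizontal, per the suggestion above. Fullwidth digits and punctuation are deliberately not counted as romaji in this sketch.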

1ec5 · 2016-10-19T00:04:12Z

GL JS must render CJK glyphs top-to-bottom when appropriate

#3402 (comment) addresses horizontal/vertical switching based on scripts.

Even in the context of a Chinese-only map, laying out all labels vertically only makes sense for an archaic-looking style. (Think yellowed background and calligraphic fonts.) A modern-looking style would typically lay out point-placed labels horizontally but fall back to a vertical layout to avoid collision. Meanwhile, line-placed labels would be laid out horizontally or vertically based on the angle of the road, in an attempt to avoid rotating glyphs beyond 45°. (See mapbox/mapbox-gl-native#1682 for further discussion.)
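
The 45° rule for line-placed labels can be sketched as a comparison of a line segment's horizontal and vertical extent. This is an illustration under stated assumptions, not code from this PR: the function name `chooseWritingMode` and its `dx`/`dy` parameters (the segment's extent in screen units) are hypothetical.

```javascript
// Hypothetical helper: pick a writing mode for a line segment so that
// glyphs never need to rotate more than 45 degrees from upright.
// A segment steeper than 45 degrees gets vertical text; otherwise horizontal.
function chooseWritingMode(dx, dy) {
    return Math.abs(dy) > Math.abs(dx) ? 'vertical' : 'horizontal';
}
```

A segment at exactly 45° is a tie; this sketch resolves it in favor of horizontal text.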

If there’s a need to land this feature before handling those nuances, I recommend placing vertical layout behind a style specification property such as writing-mode: traditional. Per-character line breaking (i.e., word-break: break-all), as described in mapbox/mapbox-gl-native#1223, remains the highest priority for general-purpose Chinese text, beyond vertical text fallback.

nickidlugash · 2016-10-19T00:51:20Z

Even in the context of a Chinese-only map, laying out all labels vertically only makes sense for an archaic-looking style. (Think yellowed background and calligraphic fonts.) A modern-looking style would typically lay out point-placed labels horizontally but fall back to a vertical layout to avoid collision.

We are not implementing vertical labels for point placement.

Meanwhile, line-placed labels would be laid out horizontally or vertically based on the angle of the road, in an attempt to avoid rotating glyphs beyond 45°.

Yes, this is what this PR does.

nickidlugash · 2016-10-19T07:31:51Z

js/data/bucket/symbol_bucket.js

-                    continue;
-                 // ^^^ this check is where all the vertical labels are being skipped    
-                }
-            }*/


@lucaswoj this is a check that I temporarily commented out because it was behaving strangely with vertical labels, but should be kept.

Thanks for the heads up! I'll restore this in a sec.

lucaswoj · 2016-10-21T20:32:43Z

Rebased and debugged 👉 #3438

xrwang and others added 13 commits October 14, 2016 16:00

cjk specific breaking

1c950ee

break evenly for now

cefde62

remember all the fixes

0008ca4

wrap several times on super long

b07e503

no unused vars

b63fa24

update expected linebreaking

fe6bbb8

cjk specific breaking

006dfa8

break evenly for now

46f1d71

remember all the fixes

957edbd

started new vertical label implementation and debugging

2b73762

more work on vertical labels

6c8135c

refactor vertical labels to be part of same symbol instances as horiz…

a7dd764

…ontal labels. Upright positioning broken. Curved vertical labels still broken. Glyph positioning of vertical labels still broken.

updated checks for dropping vertical label glyphs

8012238

Conflicts: js/data/bucket/symbol_bucket.js

lucaswoj assigned lucaswoj, xrwang and nickidlugash Oct 18, 2016

Remove dead code, commented code, & whitespace changes

1af4500

lucaswoj changed the title ~~Add support for orienting CJK glyphs along north-south lines vertically~~ Add support for rendering CJK glyphs top-to-bottom along north-south lines Oct 18, 2016

1ec5 suggested changes Oct 18, 2016

1ec5 changed the title ~~Add support for rendering CJK glyphs top-to-bottom along north-south lines~~ Add support for rendering CJK glyphs top-to-bottom Oct 18, 2016

lucaswoj commented Oct 18, 2016

lucaswoj added the not ready for review label Oct 18, 2016

nickidlugash reviewed Oct 19, 2016

jfirebaugh mentioned this pull request Oct 19, 2016

Use ES6 class syntax #3408

Merged

77 tasks

Lucas Wojciechowski added 2 commits October 19, 2016 11:40

Restore commented out code.

d5f03c5

Organize breakable array

c1e0dac

lucaswoj force-pushed the cjk-vertical-labels-2 branch from 6b4fb56 to c1e0dac Compare October 19, 2016 19:04

Lucas Wojciechowski added 5 commits October 19, 2016 12:41

Misc refactoring of SymbolBucket

bbb22c3

Fix linter errors

598985a

Refactor shaping module for better clarity

af45d54

Rename "placeText" to "shapeText"

24ac760

Rename "quads" to "placeShapedText" / "placeShapedIcons"

defb6ef

xrwang mentioned this pull request Oct 20, 2016

Add support for ideographic text breaking #3420

Merged

6 tasks

Lucas Wojciechowski added 5 commits October 20, 2016 12:45

Build debugging rig

2711569

More better debugging page

c82d22b

Remove line breaking code

9953d5c

Rename SymbolQuadsArray to PlacedSymbolArray

2f3f5ee

Got placement roughly working

b3b65d2

lucaswoj mentioned this pull request Oct 21, 2016

Add support for rendering CJK in a vertical writing mode along line-placed features #3438

Merged

16 tasks

lucaswoj closed this Oct 21, 2016

lucaswoj deleted the cjk-vertical-labels-2 branch October 21, 2016 20:32

lucaswoj restored the cjk-vertical-labels-2 branch October 21, 2016 20:32

jfirebaugh deleted the cjk-vertical-labels-2 branch February 3, 2017 18:01
