Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some character combinations aren't rendered #3050

Closed
MaxSem opened this issue Aug 28, 2015 · 16 comments
Closed

Some character combinations aren't rendered #3050

MaxSem opened this issue Aug 28, 2015 · 16 comments
Assignees
Milestone

Comments

@MaxSem
Copy link

MaxSem commented Aug 28, 2015

map
Minimum repro: "א-" - that's Hebrew letter aleph and minus sign. This is reproduceable at least with Hebrew, Arabic and Georgian characters, if appropriate fonts are present in fallback chain.

In the sample attached, aleph is not properly rendered in combination with minus sign while it has no problems with being rendered alone or combined with Latin (and that's the same Unicode range as minus) or Chinese characters.

Full XML to reproduce this bug, along with fonts used and C++ and Node.js examples is at https://github.com/MaxSem/mapnik-bug

@springmeyer springmeyer added this to the Mapnik 3.0.5 milestone Sep 10, 2015
@artemp
Copy link
Member

artemp commented Sep 10, 2015

Debugging reveals that call to ubidi_getDirection(bidi); in itemizer.cpp returns UBIDI_MIXED for aא+ input
This results in three text_items (correctly)

text_item:0,1 // LTR
text_item:1,2 // RTL
text_item:2,3 // LTR

and UBIDI_RTL -> only one item for א+

text_item:0,2 // RTL

@MaxSem - I think the problem is that 'ICU itemizer' sets RTL direction to the whole run even there is a neutral char '+' or '-' etc. I'm not sure if this is expected and we need look at harfbuzz shaping or? Your feedback would be super useful as I'm not an expert on RTL languages.

/cc @springmeyer

@artemp
Copy link
Member

artemp commented Sep 10, 2015

@MaxSem
Copy link
Author

MaxSem commented Sep 10, 2015

I don't know anything about it either, but I have people around who do:) My colleague says that the second example is correct too.

@artemp
Copy link
Member

artemp commented Sep 15, 2015

ICU has

ubidi_setReorderingMode(..);
ubidi_setReorderingOptions(..);

Passing various option combinations ^ result in "א-" rendered or not. But I don't know if we should be using non-default options or .. I feel we need an expert advice here :)

/cc @springmeyer @behdad @MaxSem

@artemp artemp modified the milestones: Mapnik 3.1.0, Mapnik 3.0.5 Sep 16, 2015
@behdad
Copy link

behdad commented Sep 29, 2015

I have no clue what this bug is about, but playing with bidi settings sounds like the wrong approach. It's definitely a bug in your (mapnik) code.

@behdad
Copy link

behdad commented Sep 29, 2015

Debugging reveals that call to ubidi_getDirection(bidi); in itemizer.cpp returns UBIDI_MIXED for aא+ input
This results in three text_items (correctly)

text_item:0,1 // LTR
text_item:1,2 // RTL
text_item:2,3 // LTR

and UBIDI_RTL -> only one item for א+

text_item:0,2 // RTL

@MaxSem - I think the problem is that 'ICU itemizer' sets RTL direction to the whole run even there is a neutral char '+' or '-' etc. I'm not sure if this is expected and we need look at harfbuzz shaping or? Your feedback would be super useful as I'm not an expert on RTL languages.

These are all correct. No matter what the bidi algorithm returns, if a char is not being displayed, that sounds like a problem in mapnik code. That's regardless of whether the bidi is correct or not.

@kroll-j
Copy link
Contributor

kroll-j commented Oct 16, 2015

Hmmm. I looked into this a bit and it seems to happen with all non-alphanumeric chars, not just minus. See here. Haven't figured out why this happens yet.

@nyurik
Copy link

nyurik commented Oct 21, 2015

It seems to fail whenever RTL character (aleph) is preceded by the direction-neutral character.

@nyurik
Copy link

nyurik commented Oct 21, 2015

After some debugging, I think I understand the problem, but not the solution. Script itemizer treats USCRIPT_COMMON characters (such as '-') as being equal to USCRIPT_HEBREW - scrptrun.cpp#L112 (see unicode constants). Thus an aleph 'א' followed by a '-' end up in the same substring, with their script value being set to USCRIPT_HEBREW. During the text shaping in harfbuzz_shaper.hpp#L120, the code point for '-' is missing from the hebrew font. But it tries to render the whole substring in one font, and failing to find a font that has every character of the substring, it draws '-' as a box.
I tried to change the sameScript() function to treat common as different script, thus causing it to break into parts, but that fails because common is the initial value of any substring, thus it causes an infinite loop. Suggestions and fixes are welcome. ))

@khaledhosny
Copy link

You don’t need to change the script itemisation, you need to fix the font fallback mechanism; If mapnik has one it is clearly failing here.

@nyurik
Copy link

nyurik commented Oct 22, 2015

If i understood it correctly, mapnik only supports the entire sub-string in the same font, because its not drawing it one letter at a time, but one sub-string at a time. That's why I think we would need to split the string further. BUT, I might be wrong here.

@khaledhosny
Copy link

Someone who knows about mapnik font fallback handling (or lack of) needs to chime in here, but one simple and effective way to handle font fallback is what is briefly described in this mailing list post: http://lists.freedesktop.org/archives/harfbuzz/2015-October/005168.html

kroll-j added a commit to wmde/mapnik that referenced this issue Oct 26, 2015
@kroll-j
Copy link
Contributor

kroll-j commented Oct 26, 2015

I think I finally found and fixed the problem. The Harfbuzz shaper goes through the font face set for each text item, looking for a single font face that can render all glyphs. In the case of a text item containing both hebrew and common scripts, there is no such face. It then falls through, rendering the text item with whatever font face happens to end up last in the list.
I've changed it slightly to collect faces for each glyph in a std::map, which are then used for rendering. The fork is here. This is the relevant commit.
This fixes the problem for me, the output looks correct now. I hope this is the right approach. Can somebody confirm?

@kroll-j
Copy link
Contributor

kroll-j commented Oct 26, 2015

(thinking about it, a simple array would probably have been the better choice for this, instead of a map...)

@talaj talaj mentioned this issue Oct 30, 2015
@nyurik
Copy link

nyurik commented Nov 18, 2015

Should this issue be closed, now that #3151 has been merged?

@artemp artemp modified the milestones: Mapnik 3.0.9, Mapnik 3.1.0 Nov 18, 2015
@artemp
Copy link
Member

artemp commented Nov 18, 2015

@nyurik - yes, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants