Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.6.380+ Regression: Glyphs’ migration to PUA can fail, get missed when rendered #8255

Closed
evadne opened this issue Apr 7, 2017 · 5 comments · Fixed by #8580
Closed

v1.6.380+ Regression: Glyphs’ migration to PUA can fail, get missed when rendered #8255

evadne opened this issue Apr 7, 2017 · 5 comments · Fixed by #8580

Comments

@evadne
Copy link
Contributor

evadne commented Apr 7, 2017

Link to PDF file (or attach file here): confidential but can send over secure channel reduced test case available, see comment

Configuration: Any

  • Web browser and its version: Any
  • Operating system and its version: Any
  • PDF.js version: Any version more recent than 1.6.380
  • Is an extension: No

Explanation:

Starting from build 1.6.380, presumably with commit 8d036fa, symbolic fonts are having their glyphs moved entirely to the Private Use Area in order to work around a canvas rendition problem.

I have seen this cause an issue with one of my documents where Hangul glyphs using Arial Unicode MS do not render properly in PDF.js on or after said release (1.6.380). I have traced the problem through the codebase and think I know what the problem is.

The problem is that there are three Private Use Areas…

  1. Basic Multilingual Plane, U+E000 - U+F8FF, 6400 glyphs
  2. Plane 15, U+F0000 - U+FFFFD, 65534 glyphs
  3. Plane 16, U+100000 - U+10FFFD, 65534 glyphs

…and PDF.js only moves glyphs into the first area which supposedly gives it capacity for 6400 glyphs, but I have a document that triggers a problem where certain glyphs do not render, because it is using a font marked as Symbolic and has more than that amount of glyphs, with some of them already in the U+E000 - U+F8FF range. In practice, this causes certain documents to render without all glyphs being seen and is reproducible both in Firefox Developer Edition and a custom viewer I built by pulling components from pdfjs-dist.

I have currently worked around this issue in my own system by setting that font as not symbolic, and left the following comments for your reflection —

This is probably because that the Private Use Area in question, while explicitly left undefined by the Unicode Consortium, has already been filled by vendors like Apple (which uses the last glyph 0xF8FF as an Apple icon) and Microsoft (in various Core Fonts, even Arial Unicode MS, whose version 1.01 contains 43 characters in the Private Use Area).
Therefore at this time the most immediate solution is to work around this issue by ensuring that if Arial Unicode MS is marked as Symbolic by whatever application producing a PDF, it is unmarked as Symbolic here within $Service.

It would be nice if we can either patch PRIVATE_USE_OFFSET_START and PRIVATE_USE_OFFSET_END (in src/core/fonts.js) or make PDF.js use all three Private Use Areas and ensure that it does not touch any glyph already in these areas.

I may be able to contribute that patch given enough guidance.

@brendandahl
Copy link
Contributor

To use all three we'd need to modify the code that moves the glyphs to the private use area. It shouldn't be too hard to add support for all three. Alternatively, we could always use Plane 15. I seem to remember we had an issue using Plane 15 and 16, but I can remember what.

@brendandahl
Copy link
Contributor

Also, do you have a PDF you can share that triggers this issue?

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 8, 2017

@evadne Thank you for the test-case!
However, since you remarked above that the PDF file contains sensitive information, are you aware that the file you attached still contains a lot more information than what's easily visible?
If one removes the /Cropbox entry from your PDF file, it's actually possible to recover the entire page since the information is still present in the file even though it's not visible by default.

If this is a problem, privacy wise, I'd suggest that you remove the link to the file!
And in case you want/need to remove your file, I'm attaching a reduced PDF file which contains only the text visible in your file and nothing more: issue8255_reduced.pdf.

@evadne
Copy link
Contributor Author

evadne commented Apr 8, 2017

@Snuffleupagus thanks for the trim 😃

@evadne
Copy link
Contributor Author

evadne commented Apr 10, 2017

I have a Pull Request up ☝️ but not sure if that change makes sense?

brendandahl added a commit to brendandahl/pdf.js that referenced this issue Jul 21, 2017
The initial issue with mozilla#8255 was I added a missing glyphs
check to adjustMapping, but this caused us to skip re-mapping
a glyph if the fontCharCode was a missingGlyph which in turn
caused us to overwrite a valid glyph id with an invalid one. While
fixing this, I also added a warning if the private use area is full since
this also accidentally happened when I made a different mistake.

This brought to light a number of issues where we map
missing glyphs to notdef, but often the notdef is actually defined
and then ends up being drawn. Now the glyphs don't get
mapped in toFontChar and so they are not drawn by the canvas.

Fixing the above brought up another issue though in bug1050040.pdf.
In this PDF, the font fails to load by the browser and before we were still
drawing the glyphs because it looked like the font had them, but with the fixes
above the glyphs showed up as missing so we didn't attempt draw them. To
fix this, I now throw an error when the loca table is in really bad shape and
we fall back to trying to use a system font. We now also use this fall back if
there are any format errors during converting fonts.
brendandahl added a commit to brendandahl/pdf.js that referenced this issue Jul 26, 2017
The initial issue with mozilla#8255 was I added a missing glyphs
check to adjustMapping, but this caused us to skip re-mapping
a glyph if the fontCharCode was a missingGlyph which in turn
caused us to overwrite a valid glyph id with an invalid one. While
fixing this, I also added a warning if the private use area is full since
this also accidentally happened when I made a different mistake.

This brought to light a number of issues where we map
missing glyphs to notdef, but often the notdef is actually defined
and then ends up being drawn. Now the glyphs don't get
mapped in toFontChar and so they are not drawn by the canvas.

Fixing the above brought up another issue though in bug1050040.pdf.
In this PDF, the font fails to load by the browser and before we were still
drawing the glyphs because it looked like the font had them, but with the fixes
above the glyphs showed up as missing so we didn't attempt draw them. To
fix this, I now throw an error when the loca table is in really bad shape and
we fall back to trying to use a system font. We now also use this fall back if
there are any format errors during converting fonts.
movsb pushed a commit to movsb/pdf.js that referenced this issue Jul 14, 2018
The initial issue with mozilla#8255 was I added a missing glyphs
check to adjustMapping, but this caused us to skip re-mapping
a glyph if the fontCharCode was a missingGlyph which in turn
caused us to overwrite a valid glyph id with an invalid one. While
fixing this, I also added a warning if the private use area is full since
this also accidentally happened when I made a different mistake.

This brought to light a number of issues where we map
missing glyphs to notdef, but often the notdef is actually defined
and then ends up being drawn. Now the glyphs don't get
mapped in toFontChar and so they are not drawn by the canvas.

Fixing the above brought up another issue though in bug1050040.pdf.
In this PDF, the font fails to load by the browser and before we were still
drawing the glyphs because it looked like the font had them, but with the fixes
above the glyphs showed up as missing so we didn't attempt draw them. To
fix this, I now throw an error when the loca table is in really bad shape and
we fall back to trying to use a system font. We now also use this fall back if
there are any format errors during converting fonts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment