PDF file containing Japanese (CJK) characters: CMap Adobe-Japan1-UCS2 (8.001) does not work but 5.001 does #7696

Closed
catsmile opened this issue Oct 7, 2016 · 6 comments · Fixed by #8580

@catsmile

catsmile commented Oct 7, 2016

Link to PDF file (or attach file here): http://s000.tinyupload.com/index.php?file_id=00088937975359643663

Configuration:

  • Web browser and its version: Google Chrome 53.0.2785.143 m (the latest at the time of writing), Mozilla Firefox 49.0.1
  • Operating system and its version: Windows 7 SP1 x64
  • PDF.js version: 1.6.210
  • Is an extension: no

I'm not sure this is really a PDF.js issue; it is probably related to the Adobe CMap files. A workaround is described at the end of this issue.

Steps to reproduce the problem:

  1. Open the attached file in the PDF.js viewer.
    2a. The viewer throws the following error in the JS console: "DOMException: Invalid font data in ArrayBuffer."
    2b. The file displays incorrectly (the characters differ from those shown when the file is opened in Adobe Acrobat).

The PDF file opened in Adobe Acrobat: https://snag.gy/0uApct.jpg
The PDF file opened in the PDF.js viewer (rendered incorrectly): https://snag.gy/eVcl21.jpg
Developer console with error: https://snag.gy/ZoWXuA.jpg

IMPORTANT NOTES (WORKAROUND)

  1. The viewer loads the Adobe-Japan1-UCS2.bcmap CMap that ships with the PDF.js distribution.
  2. If I set the cMapPacked property to false and use the unpacked CMap file from the Adobe Acrobat distribution (version 8.001), the behavior is the same.
  3. I found an older version of the CMap file (5.001) and generated the corresponding packed CMap with the cmapscompress tool (https://github.com/mozilla/pdf.js/tree/master/external/cmapscompress). With it, the PDF file renders correctly: https://snag.gy/GwfJVE.jpg.

@yurydelendik
Contributor

yurydelendik commented Oct 7, 2016

I compared http://s000.tinyupload.com/index.php?file_id=17713344732879219876 with the file in PDF.js, and they look exactly the same. Please provide a complete example, e.g. a new GitHub repository that demonstrates the issue.

@catsmile
Author

catsmile commented Oct 7, 2016

Here you go: https://github.com/catsmile/pdfjs-cmap-issue
I've left only the Adobe-Japan1-UCS2 CMaps (5.001 and 8.001):

  • /web/viewer_correct.html uses /cmaps/5.001/Adobe-Japan1-UCS2.bcmap
  • /web/viewer_malformed.html uses /cmaps/8.001/Adobe-Japan1-UCS2.bcmap
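
To see what actually changed between 5.001 and 8.001, rather than only swapping files, one rough approach is to parse both CMaps into charCode → Unicode maps and diff them. The sketch below assumes that parsing has already happened; the `UnicodeMap` type and the `diffCMaps` function are hypothetical helpers (not part of PDF.js), and the sample data is toy data, not taken from the real CMap files.

```typescript
// Hypothetical simplified shape of a parsed CMap: charCode -> Unicode code point.
type UnicodeMap = Map<number, number>;

interface CMapDiff {
  added: number[];   // charCodes present only in the newer CMap
  removed: number[]; // charCodes present only in the older CMap
  changed: number[]; // charCodes mapped to a different code point
}

function diffCMaps(older: UnicodeMap, newer: UnicodeMap): CMapDiff {
  const diff: CMapDiff = { added: [], removed: [], changed: [] };
  // Walk the newer map: new codes are "added", differing values are "changed".
  newer.forEach((unicode, code) => {
    if (!older.has(code)) {
      diff.added.push(code);
    } else if (older.get(code) !== unicode) {
      diff.changed.push(code);
    }
  });
  // Walk the older map: codes missing from the newer map were "removed".
  older.forEach((_unicode, code) => {
    if (!newer.has(code)) {
      diff.removed.push(code);
    }
  });
  return diff;
}

// Toy data only; real CMaps contain thousands of entries.
const v5: UnicodeMap = new Map([[1, 0x20], [2, 0x21]]);
const v8: UnicodeMap = new Map([[1, 0x20], [2, 0x22], [3, 0x23]]);
console.log(diffCMaps(v5, v8)); // charCode 3 is added, charCode 2 is changed
```

If the diff showed only additions, that would fit the observation that the newer CMap is simply longer.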

@Snuffleupagus
Collaborator

Snuffleupagus commented Oct 8, 2016

I'm not convinced that the CMap itself is the problem here, since the only noticeable difference between the working and non-working CMaps seems to be that the latter is quite a bit longer.

Looking quickly at the glyph map that PDF.js builds, a fair number of charCodes seem to be mapped to the same glyphIds. I therefore suspect that this issue might have the same cause, and possibly the same solution, as described in a comment on #6397.
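
One way to quantify "charCodes mapped to the same glyphIds" is to group charCodes by glyphId and keep only the glyphIds reached from more than one charCode. A minimal sketch, assuming the glyph map is available as a plain charCode → glyphId `Map` (how to extract it from PDF.js is left open); `GlyphMap` and `sharedGlyphIds` are hypothetical names, not PDF.js internals.

```typescript
// Hypothetical simplified glyph map: charCode -> glyphId.
type GlyphMap = Map<number, number>;

// Returns glyphId -> list of charCodes, only for glyphIds shared by 2+ charCodes.
function sharedGlyphIds(glyphMap: GlyphMap): Map<number, number[]> {
  // Invert the map, collecting every charCode that points at each glyphId.
  const byGlyph = new Map<number, number[]>();
  glyphMap.forEach((glyphId, charCode) => {
    const codes = byGlyph.get(glyphId);
    if (codes) {
      codes.push(charCode);
    } else {
      byGlyph.set(glyphId, [charCode]);
    }
  });
  // Keep only the collisions.
  const shared = new Map<number, number[]>();
  byGlyph.forEach((codes, glyphId) => {
    if (codes.length > 1) {
      shared.set(glyphId, codes);
    }
  });
  return shared;
}

// Toy data: glyphId 5 is shared by charCodes 0x21, 0x23 and 0x24.
const toyGlyphs: GlyphMap = new Map([[0x21, 5], [0x22, 7], [0x23, 5], [0x24, 5]]);
console.log(sharedGlyphIds(toyGlyphs));
```

A large number of entries in the result would point to the glyph-mapping issue rather than to the CMap contents.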

@catsmile
Author

Well, okay. For my project I'll use version 5.001 of Adobe-Japan1-UCS2.bcmap as a workaround.

@Snuffleupagus
Collaborator

Closing as fixed by PR #8580.

@nguyenngochanh

Hello,
My app is built with Cordova, and it crashes when reading a searchable PDF:
CDVWKWebViewEngine: Error Domain=NSCocoaErrorDomain Code=260 "The file “nullAdobe-Japan1-UCS2.bcmap” couldn’t be opened because there is no such file.

Could you help me?
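
The file name in that error, `nullAdobe-Japan1-UCS2.bcmap`, is what you get when a `null` base URL is string-concatenated with the CMap name, which suggests the CMap base URL (cMapUrl) was never configured. A minimal sketch of that failure mode and of an early guard, using a hypothetical `CMapConfig` shape and helper names rather than PDF.js's actual loader code:

```typescript
// Hypothetical loader settings; cMapUrl stays null when it was never set.
interface CMapConfig {
  cMapUrl: string | null;
  cMapPacked: boolean;
}

// Naive concatenation: String(null) is "null", producing a bogus file name.
function naiveCMapUrl(config: CMapConfig, name: string): string {
  return String(config.cMapUrl) + name + (config.cMapPacked ? ".bcmap" : "");
}

// Guarded version: fail with a clear message instead of requesting a missing file.
function checkedCMapUrl(config: CMapConfig, name: string): string {
  const base = config.cMapUrl;
  if (!base) {
    throw new Error(`cMapUrl is not set; cannot load CMap "${name}"`);
  }
  // Ensure exactly one separator between the folder and the CMap name.
  const folder = base.endsWith("/") ? base : base + "/";
  return folder + name + (config.cMapPacked ? ".bcmap" : "");
}

const unsetConfig: CMapConfig = { cMapUrl: null, cMapPacked: true };
console.log(naiveCMapUrl(unsetConfig, "Adobe-Japan1-UCS2")); // nullAdobe-Japan1-UCS2.bcmap
```

In the app itself, the likely fix is to point cMapUrl at the folder where the .bcmap files are bundled and set cMapPacked to true. Depending on the PDF.js version, these are either the PDFJS.cMapUrl / PDFJS.cMapPacked globals (1.x) or parameters passed to getDocument (later releases).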
