Text layer may contain overlapping areas (react-pdf 9.0.0) #1828

Open
4 tasks done
obecker opened this issue Jun 11, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@obecker

obecker commented Jun 11, 2024

Before you start - checklist

  • I followed instructions in documentation written for my React-PDF version
  • I have checked if this bug is not already reported
  • I have checked if an issue is not listed in Known issues
  • If I have a problem with PDF rendering, I checked if my PDF renders properly in PDF.js demo

Description

After upgrading react-pdf from 8.0.2 to 9.0.0 I observed that consecutive spans in the same line within the text layer may overlap (i.e. the spans are too wide). This prevents the correct selection of text in the document.

This is an example from the provided sample.pdf (page 2, penultimate paragraph):

[Screenshot 2024-06-11 at 12 57 07]

You can see the overlapping area at the word "bibendum".
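
To make the overlap measurable rather than only visible in a screenshot, here is a minimal sketch of a check that could be run against the bounding boxes of the text layer spans. `findOverlaps`, `Box`, `Overlap`, and the `tolerance` parameter are hypothetical names invented for this illustration; they are not part of react-pdf or pdf.js. The "same line" rule (two boxes share more than half of the shorter box's height) is also an assumption.

```typescript
// Minimal rectangle shape. A DOMRect returned by getBoundingClientRect()
// has these four properties, so it can be passed in directly.
interface Box {
  left: number;
  right: number;
  top: number;
  bottom: number;
}

interface Overlap {
  first: number;  // index of the earlier box in the input array
  second: number; // index of the later box in the input array
  width: number;  // horizontal overlap in pixels
}

// Hypothetical helper (not a react-pdf or pdf.js API): reports every pair of
// boxes that sit on the same line and overlap horizontally by more than
// `tolerance` pixels.
function findOverlaps(boxes: Box[], tolerance = 0.5): Overlap[] {
  const overlaps: Overlap[] = [];
  boxes.forEach((a, i) => {
    boxes.slice(i + 1).forEach((b, k) => {
      // Vertical intersection of the two boxes (negative if they are apart).
      const vertical = Math.min(a.bottom, b.bottom) - Math.max(a.top, b.top);
      const minHeight = Math.min(a.bottom - a.top, b.bottom - b.top);
      // Assumption: boxes are on the same line only if they share more than
      // half of the shorter box's height.
      if (vertical <= minHeight / 2) return;
      // Horizontal intersection (zero or negative if they only touch or are apart).
      const horizontal = Math.min(a.right, b.right) - Math.max(a.left, b.left);
      if (horizontal > tolerance) {
        overlaps.push({ first: i, second: i + 1 + k, width: horizontal });
      }
    });
  });
  return overlaps;
}
```

In a browser console, the input could be collected with something like `Array.from(document.querySelectorAll('.react-pdf__Page__textContent span'), (s) => s.getBoundingClientRect())`, assuming that selector matches the text layer in the react-pdf version in use. Any nested wrapper spans in the text layer would be reported as overlapping their children, so the result may need filtering.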

Now, while I supposed that this must be something in the core pdf.js library, I am unable to reproduce the behavior in the pdf.js demo. I even downloaded the latest (4.3.136) release from https://github.com/mozilla/pdf.js/releases, ran npx serve in the extracted folder, and opened web/viewer.html with the sample.pdf - the issue is not there.

If you want to test it with a different PDF, try https://www.vbg.de/cms/_Resources/Persistent/7/0/d/c/70dc78bec739e6cbe27bc8ba77a16d15347461d7/M_Arzt_Anforderungen.pdf and look at the last list item on the first page ("über Kenntnisse in der erforderlichen Röntgentechnik und Röntgendiagnostik verfügen.").

Steps to reproduce

Run yarn run dev in sample/create-react-app-5, scroll to page 2 and select the first line of the penultimate paragraph.

Try to select and copy the word 'justo'.

Expected behavior

The selected areas don't overlap. The word 'justo' gets copied.

Actual behavior

They do overlap. The copied text is 'utat'

Additional information

No response

Environment

  • Browser (if applicable): Firefox 126.0.1, latest Chrome, Opera, MS Edge (all macOS)
  • React-PDF version: 9.0.0
  • React version: 18.2.0
  • Webpack version (if applicable):
obecker added the bug (Something isn't working) label Jun 11, 2024
@wojtekmaj
Owner

Hmmmm, I can reproduce this:

[image]

Oddly enough, this doesn't happen for me in all cases. Our test suite is free from this issue (it seems), but samples are not.

@huuphat1908

Same issue

@obecker
Author

obecker commented Aug 11, 2024

Apparently this issue is still present in 9.1.0. I would like to help resolve it, but I have no idea where to start or what might be causing it.

By the way, I also tried https://github.com/wojtekmaj/react-pdf/tree/main/test/test.pdf, and it gives me a completely different result, where some text areas are too small/short:

[Screenshot 2024-08-11 at 19 39 04]
