pdf_text cutting off portion of page printed in landscape #7

Closed
jimmyg3g opened this issue Mar 3, 2016 · 13 comments

Comments

@jimmyg3g

jimmyg3g commented Mar 3, 2016

pdf_text is not scanning in the part of the page that is past 8.5" wide. I created an Excel and a Word doc and saved them as landscape, and it scans the entire page into R for those. So maybe it is something specific to my PDF.

@jeroen
Member

jeroen commented Mar 3, 2016

Can you include a reproducible example please?

@jimmyg3g
Author

jimmyg3g commented Mar 4, 2016

Unfortunately, they all have confidential info so I can't share it. The last page of each PDF is blank and in portrait while the rest of the PDF is in landscape. Could that cause the issue?

I'm out of the office until Monday. When I'm back in, I'll see if I can get a PDF that's suitable for sharing.

@jimmyg3g
Author

jimmyg3g commented Mar 7, 2016

Here's a PDF where the two columns on the far right are not being pulled into R via pdf_text.
waurika_news_democrat.pdf

@jeroen
Member

jeroen commented Mar 8, 2016

@jeroen
Member

jeroen commented Mar 17, 2016

@lpatruno

Any word on this issue?

@jeroen
Member

jeroen commented Apr 13, 2016

I don't think so, haven't heard back. You are free to subscribe to the libpoppler mailing list and post a reminder for: https://lists.freedesktop.org/archives/poppler/2016-March/011755.html

@jimmyg3g
Author

FWIW, I have successfully converted that PDF using Xpdf's pdftotext, which can be found here: http://www.foolabs.com/xpdf/download.html

jeroen added a commit that referenced this issue Dec 3, 2016
@jeroen
Member

jeroen commented Dec 3, 2016

Looks like the poppler folks are not in a hurry to fix this, so I added a workaround that doubles the width of the target rectangle for landscape pages.

It's not perfect but I think this will avoid the problem in most cases.
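
For readers following along, the idea behind the workaround can be sketched roughly as below. This is only an illustration, not the actual pdftools or poppler code: `Rect` and `text_target_rect` are hypothetical names, and it assumes the clipping happens because text is extracted from a rectangle that is too narrow for a landscape page.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Hypothetical page rectangle in PDF points (not a poppler type)."""
    x: float
    y: float
    width: float
    height: float


def text_target_rect(page: Rect, is_landscape: bool) -> Rect:
    """Return the rectangle that text should be extracted from.

    For landscape pages the width is doubled, so text lying beyond the
    reported page width still falls inside the extraction area.
    """
    if is_landscape:
        return Rect(page.x, page.y, page.width * 2, page.height)
    return page


# A US Letter page is 612 x 792 points (8.5 x 11 inches at 72 points per inch).
letter = Rect(0, 0, 612, 792)
print(text_target_rect(letter, is_landscape=True))
print(text_target_rect(letter, is_landscape=False))
```

Doubling is a blunt choice: it over-covers the page rather than computing the exact rotated size, which fits the "not perfect" caveat above.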

@MichaelChirico

Just encountered this again:

voting_equipment_by_municipality_2_pdf_15114.pdf

The whole ACCESSIBLE EQUIPMENT column is clipped on all pages.

@jeroen
Member

jeroen commented Dec 4, 2016

This should be fixed in version 1.0. Can you try updating to the latest version?

@MichaelChirico

Indeed, fixed on my example!

@jimmyg3g
Author

jimmyg3g commented Dec 5, 2016

That fixed my issue too!
