Find a work-around for scanned pdf #2
Comments
@mvicenzi, can you provide some PDF samples and test code for analysis?
@pubpub-zz you can find the simplest code to reproduce the error here. The sample PDF file is this one, which was generated with a scanner.
I've tried with the dev version without any error. Can you retry with the latest version of PyPDF2?
I upgraded PyPDF2 from 1.27.3 to 1.27.12, but I still see errors.
I also tried cloning the repo again to check whether uploading the PDF had somehow fixed the file, but no: still the same error as above.
I understand now: your PDF does not respect the rule that the xref table should start at 0, which raises a PdfReadError when strict is enabled. PyPDF2 moved to strict=False as the default, but PdfFileMerger was overlooked. I will push the fix; in the meantime, you can initialize it with strict=False.
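The snippet originally posted with this comment is not preserved here, so the following is only a minimal sketch of the suggested workaround, assuming the PyPDF2 1.27.x API where `PdfFileMerger` accepts a `strict` argument. The helper name `merge_pdfs` and its `merger_factory` parameter are hypothetical, added only so the logic can be exercised without PyPDF2 installed.

```python
def merge_pdfs(paths, output_path, merger_factory=None):
    """Merge the PDFs in `paths` into `output_path` with strict parsing off.

    `merger_factory` is a hypothetical hook (not part of PyPDF2); when it
    is omitted, PyPDF2's PdfFileMerger is used.
    """
    if merger_factory is None:
        # Assumes PyPDF2 1.27.x; later releases renamed this class.
        from PyPDF2 import PdfFileMerger
        merger_factory = PdfFileMerger

    # strict=False tolerates an xref table that does not start at 0,
    # which is what the scanner-generated file in this issue contains.
    merger = merger_factory(strict=False)
    try:
        for path in paths:
            merger.append(path)
        with open(output_path, "wb") as fh:
            merger.write(fh)
    finally:
        merger.close()
```

For example, `merge_pdfs(["scan1.pdf", "scan2.pdf"], "merged.pdf")` would merge two scanned files (the file names are placeholders).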
Tracked in mvicenzi/pdf_tools#2, as stated in the title.
Scanned PDF documents are not read properly, and operations on them fail.
Investigate this issue: is it a PyPDF2 limitation? Is there a workaround?
Possible starting points: