Umlauts in filename problem and PyPDF2 hiccups #18

Open
arminbw opened this issue Jul 7, 2019 · 1 comment

arminbw commented Jul 7, 2019

After I decrypted my database I used menextract2pdf to get my annotations into the pdfs. I encountered a couple of errors:

Could not find pdffile /Users/armin/Desktop/ProjekteOnHold/ceat/mendeley_archive/Mach - 1886 - Beiträge zur Analyse der Empfindungen.pdf

This is an Umlaut encoding issue. Adding .decode("utf8") on line 28 solved this problem for me.
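A minimal sketch of what that fix amounts to, written for Python 3. Line 28 of the script is not shown in this thread, so the helper name `to_text` and the idea that the filename arrives as UTF-8 bytes are assumptions for illustration, not the script's actual code.

```python
def to_text(filename, encoding="utf-8"):
    """Return filename as text, decoding it first if it arrived as bytes.

    Hypothetical helper: it mirrors the .decode("utf8") fix described above.
    """
    if isinstance(filename, bytes):
        return filename.decode(encoding)
    return filename


# Example with the filename from the error message, stored as UTF-8 bytes.
raw = "Mach - 1886 - Beiträge zur Analyse der Empfindungen.pdf".encode("utf-8")
name = to_text(raw)
```

Passing a value that is already text through `to_text` leaves it unchanged, so the helper can be applied to every filename without checking its type first.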

zlib.error: Error -3 while decompressing data: incorrect header check
and
ValueError: invalid literal for int() with base 10: 'dobj'

These errors came from specific, partly corrupted PDFs. I added print(fn) to processpdf(fn, fn_out, annotations) so I could identify the culprits and remove them manually.
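Instead of printing every filename and stopping at the first crash, one could catch the two exceptions above and collect the failing files in a single run. This is only a sketch: `process_all` and `fake_process` are hypothetical names, and `fake_process` stands in for the real processpdf(fn, fn_out, annotations) call, which is not reproduced here.

```python
import zlib


def process_all(filenames, process):
    """Call process(fn) for each file and return a list of (fn, error) pairs
    for the files that raised zlib.error or ValueError."""
    failures = []
    for fn in filenames:
        try:
            process(fn)
        except (zlib.error, ValueError) as exc:
            # Remember the culprit and keep going with the remaining files.
            failures.append((fn, exc))
    return failures


def fake_process(fn):
    # Hypothetical stand-in for the real PDF processing step.
    if fn == "broken.pdf":
        raise ValueError("invalid literal for int() with base 10: 'dobj'")


failures = process_all(["good.pdf", "broken.pdf"], fake_process)
for fn, exc in failures:
    # ascii() escapes non-ASCII characters, so the report itself cannot
    # fail on an umlaut in the filename.
    print(ascii(fn), "->", exc)
```

Any other exception type still propagates, so unexpected problems are not silently hidden.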

Thank you for writing Menextract2pdf!

folofjc commented Oct 8, 2019

I had to add print(fn.encode("utf-8")) since it was even failing on the print command.
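For reference, a Python 3 sketch of the same idea: turn the filename into a pure-ASCII label before printing, so the print cannot fail regardless of the console encoding. The helper name `safe_label` is hypothetical, and this variant escapes the non-ASCII characters rather than printing UTF-8 bytes as in the workaround above.

```python
def safe_label(filename):
    """Return an ASCII-only version of filename for logging.

    Non-ASCII characters are replaced by backslash escapes, so printing
    the result does not depend on the console encoding.
    """
    return filename.encode("ascii", "backslashreplace").decode("ascii")


label = safe_label("Mach - 1886 - Beiträge zur Analyse der Empfindungen.pdf")
print(label)
```

The escaped label is only meant for diagnostics; the original filename should still be used when opening the file.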

Then I had to change the title of the article in Mendeley, close Mendeley, and delete the file from the Downloaded folder. After I started Mendeley again and synced, it downloaded the file again and renamed it without the offending characters.
