Umlauts in filename problem and PyPDF2 hiccups #18

arminbw · 2019-07-07T14:18:27Z

After decrypting my database, I used menextract2pdf to write my annotations into the PDFs. I ran into a few errors:

Could not find pdffile /Users/armin/Desktop/ProjekteOnHold/ceat/mendeley_archive/Mach - 1886 - BeitrÃ¤ge zur Analyse der Empfindungen.pdf

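This is an umlaut encoding issue: the garbled "Ã¤" is what the UTF-8 bytes of "ä" look like when they are read as a single-byte encoding such as Latin-1. Adding .decode("utf8") on line 28 solved this problem for me.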
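For anyone hitting the same message, here is a minimal sketch of how the garbled name arises. It is written for Python 3 purely as an illustration, and the variable names are made up for this example; it is not the code from menextract2pdf, where the fix was the .decode("utf8") call mentioned above.

```python
# Illustration only: how UTF-8 bytes turn into "Ã¤" when decoded with the wrong codec.
# The names below are hypothetical and do not come from menextract2pdf.

name = "Beiträge"

# The filename as it would be stored on disk or in the database: UTF-8 bytes.
raw = name.encode("utf-8")

# Decoding those bytes as Latin-1 maps each byte to its own character,
# so the two-byte sequence for "ä" becomes two separate characters.
garbled = raw.decode("latin-1")

# Decoding with the codec that was actually used restores the original name.
fixed = raw.decode("utf-8")

print(garbled)
print(fixed)
```

The first print shows the same kind of garbled name as in the "Could not find pdffile" message, the second shows the intended one.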

zlib.error: Error -3 while decompressing data: incorrect header check
and
ValueError: invalid literal for int() with base 10: 'dobj'

These errors came from specific, apparently corrupted PDFs. I added print(fn) to processpdf(fn, fn_out, annotations) so I could identify the culprits and remove them manually.
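An alternative to printing each filename and removing files by hand is to catch the two exceptions and keep going. The following is only a sketch: process_all and its process argument are hypothetical names, and process is assumed to accept the same three arguments as processpdf(fn, fn_out, annotations). Nothing here is part of menextract2pdf itself.

```python
import zlib


def process_all(jobs, process):
    """Call process(fn, fn_out, annotations) for each job and collect failures.

    jobs is an iterable of (fn, fn_out, annotations) tuples. process stands in
    for a function like processpdf; it is passed in so this sketch does not
    depend on the real implementation.
    """
    failed = []
    for fn, fn_out, annotations in jobs:
        try:
            process(fn, fn_out, annotations)
        except (zlib.error, ValueError) as exc:
            # These are the two exception types reported above for damaged PDFs.
            failed.append((fn, repr(exc)))
    return failed
```

The returned list names every file that could not be processed, so the run does not stop at the first bad PDF and the offending files can be dealt with afterwards.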

Thank you for writing Menextract2pdf!


folofjc · 2019-10-08T15:04:53Z

I had to add print(fn.encode("utf-8")) instead of print(fn), since even the print command was failing on these filenames.
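If the goal is only to see which file is being processed, another option is to escape the non-ASCII characters before printing. This is a Python 3 sketch with a hypothetical helper name, printable; it is not what was used in this thread and is not part of menextract2pdf.

```python
def printable(name):
    """Return an ASCII-only version of name that any console can display.

    Characters outside ASCII are replaced by backslash escapes instead of
    raising an encoding error.
    """
    return name.encode("ascii", "backslashreplace").decode("ascii")


print(printable("Mach - 1886 - Beiträge zur Analyse der Empfindungen.pdf"))
```

The escaped form is less readable than the real name, but it still identifies the file unambiguously and does not depend on the terminal's encoding.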

Then I had to change the article's title in Mendeley, close Mendeley, and delete the file from the Downloaded folder. After I started Mendeley again and synced, it downloaded the file again and renamed it without the offending characters.
