Weird characters and missing text in multiline metadata text #366
PDFParser used some code from https://github.com/tecnickcom/TCPDF, but I ported it and removed TCPDF as a Composer dependency. The files I ported can be found here: https://github.com/smalot/pdfparser/tree/master/src/Smalot/PdfParser/RawData. Last time I checked tecnickcom's TCPDF project, it had the following message in the README.md:
I don't know if
All right, thanks for the information.
All right, I think I found the bug myself: special characters in PDF strings may be escaped (e.g.
So does one have to take care of this oneself, or do you suggest a change in PDFParser?
I strongly believe that this should be handled by the library. End users shouldn't have to read the PDF specification and implement their own unescaping function just to extract metadata as plain text.
Yes, from a quick test, it seems to work correctly with my first test PDF.
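To illustrate the unescaping step being discussed, here is a minimal sketch. It is written in Python rather than PHP, it is not taken from PDFParser, and the function name `unescape_pdf_literal` is hypothetical. It follows the escape rules for literal strings in the PDF specification (ISO 32000-1, §7.3.4.2):

* `\n`, `\r`, `\t`, `\b`, `\f`, `\(`, `\)` and `\\` map to single characters.
* `\ddd` is an octal byte.
* A backslash at the end of a line is a line continuation, so both the backslash and the end-of-line marker are dropped.

A parser that skips that last rule would leave stray `\` characters in front of newlines, which may explain the symptom reported in the original post. The sketch assumes its input is the string body between the outer parentheses, already decoded byte-for-byte (for example as Latin-1). It does not cover the later PDFDocEncoding or UTF-16BE text decoding.

```python
# Single-character escape sequences allowed in PDF literal strings.
_SIMPLE_ESCAPES = {
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "b": "\b",
    "f": "\f",
    "(": "(",
    ")": ")",
    "\\": "\\",
}

_OCTAL_DIGITS = "01234567"


def unescape_pdf_literal(raw: str) -> str:
    """Decode the body of a PDF literal string (text between the outer parentheses).

    Hypothetical helper for illustration; assumes bytes were mapped 1:1 to str (Latin-1).
    """
    out = []
    i = 0
    n = len(raw)
    while i < n:
        ch = raw[i]
        if ch == "\\" and i + 1 < n:
            nxt = raw[i + 1]
            if nxt in _SIMPLE_ESCAPES:
                out.append(_SIMPLE_ESCAPES[nxt])
                i += 2
            elif nxt in "\r\n":
                # Backslash + end-of-line is a line continuation: both are dropped.
                i += 2
                if nxt == "\r" and i < n and raw[i] == "\n":
                    i += 1  # CR LF counts as a single end-of-line marker
            elif nxt in _OCTAL_DIGITS:
                # \ddd: one to three octal digits; overflow beyond one byte is discarded.
                j = i + 1
                while j < n and j < i + 4 and raw[j] in _OCTAL_DIGITS:
                    j += 1
                out.append(chr(int(raw[i + 1:j], 8) & 0xFF))
                i = j
            else:
                # Unknown escape: the backslash is ignored and the character is kept.
                out.append(nxt)
                i += 2
        elif ch == "\r":
            # An unescaped CR or CR LF is read as a single LF.
            out.append("\n")
            i += 2 if i + 1 < n and raw[i + 1] == "\n" else 1
        else:
            # Ordinary character (including an unescaped LF, or a trailing lone backslash).
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, a raw `/Subject` body of `Line one\nLine two` (a literal backslash followed by `n`) comes back as two lines. A backslash directly before a real line break disappears along with that line break.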
Hello.
I have an issue with the way this library extracts metadata from a PDF file. It comes out corrupted, and I'm not sure why.
It looks to me as if, for some reason, `\` characters are inserted before newlines, but I may be wrong.
Code used:
I attach the PDFs I tested, along with the metadata descriptions used (inserted with Adobe Acrobat).
pdf_1.pdf
pdf_2.pdf
pdf_3.pdf
subject_in_1.txt
subject_in_2.txt
subject_in_3.txt
subject_out_1.txt
subject_out_2.txt
subject_out_3.txt
Also, I actually came here because of another library, https://github.com/pauln/tcpdi. However, that one didn't seem very active, and both of these libraries seemed to have problems with metadata. So here is my other question: what is the relationship between `tcpdi_parser` and `tcpdf_parser`? Could `tcpdi_parser` be replaced by `tcpdf_parser`, and thereby get a more up-to-date version?