Support metadata element names containing spaces #612

Merged (4 commits) on Jul 11, 2023

@GreyWyvern GreyWyvern commented Jul 6, 2023

Add the ability for PdfParser to parse metadata names containing hexadecimal-encoded characters, such as "Document#20Type", where #20 is a space.

See PDF Reference 1.7, Section H.3 (Implementation Notes), Subsection 3.2.4, note 3 (page 1099):
https://ia801001.us.archive.org/1/items/pdf1.7/pdf_reference_1-7.pdf

Fixes #529
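
For readers unfamiliar with the encoding, here is a minimal sketch of the decoding rule described above. It is not the code from this pull request (PdfParser is written in PHP). The helper name `decode_pdf_name` and the sample name "Sub#2dType" are hypothetical, and the sketch assumes each escape stands for a single-byte character.

```python
import re

# Hypothetical helper for illustration only; not PdfParser's implementation.
# A '#' followed by exactly two hex digits encodes one character.
# The character class must include A-F (either case), not just 0-9,
# otherwise escapes such as #2d (a hyphen) are missed.
_HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")


def decode_pdf_name(raw: str) -> str:
    """Replace each #xx escape in a PDF name with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


print(decode_pdf_name("Document#20Type"))  # Document Type
print(decode_pdf_name("Sub#2dType"))       # Sub-Type
```

Names without a `#` escape pass through unchanged, so a function like this could in principle be applied to every metadata key.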

GreyWyvern and others added 4 commits July 6, 2023 12:11
Add ability for PdfParser to parse metadata names with hexadecimal encoded characters such as "Document#20Type" where #20 is a space.
Resolves Issue smalot#529
Add test for spaces in metadata property names.
Too quick on the commit! Make sure our two 'digit' regexp also finds A-F hex digits. Add a test for #2d which is a hyphen.
@k00ni k00ni self-assigned this Jul 10, 2023

@k00ni k00ni left a comment

Good job! I will merge it soon.

@k00ni k00ni merged commit c42fc11 into smalot:master Jul 11, 2023
Successfully merging this pull request may close these issues.

Can't get title and subject from metadata