feat: Add support for ISO 32000-2 AES256 encryption #614

Merged

@Darkheir Darkheir commented May 5, 2021

Pull request

ISO 32000-2 introduces a new password-based key derivation algorithm that must be applied before the key can be used to decrypt the document.

This algorithm is required for documents encrypted with Version 5, Revision 6. PDF 2.0 also deprecates Version 5, Revision 5 because of security issues.

This PR aims to support this new algorithm.
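
A minimal sketch of the revision 6 hashing loop, following how open-source PDF libraries commonly implement it. The function name, its parameters, and the injected `aes_cbc_encrypt` callable are assumptions for illustration and are not the code in this PR. The AES step is passed in because the Python standard library has no AES; password normalization is left out.

```python
from hashlib import sha256, sha384, sha512
from typing import Callable, Optional

# Hypothetical signature for an AES-128-CBC encryptor without padding:
# (key, iv, data) -> ciphertext of the same length as data.
AesCbcEncrypt = Callable[[bytes, bytes, bytes], bytes]


def r6_hash(
    password: bytes,
    salt: bytes,
    aes_cbc_encrypt: AesCbcEncrypt,
    vector: Optional[bytes] = None,
) -> bytes:
    """Sketch of the revision 6 password hash (names are illustrative)."""
    extra = vector or b""
    k = sha256(password + salt + extra).digest()
    hashes = (sha256, sha384, sha512)
    round_no = 0
    last_byte = 0
    # Run at least 64 rounds, then keep going while the last byte of the
    # most recent ciphertext is greater than (rounds done - 32).
    while round_no < 64 or last_byte > round_no - 32:
        k1 = (password + k + extra) * 64
        e = aes_cbc_encrypt(k[:16], k[16:32], k1)
        # Pick the next hash from the first 16 bytes of e, read as a
        # big-endian integer, mod 3. Since 256 % 3 == 1, that value is
        # the same as the byte sum mod 3.
        k = hashes[sum(e[:16]) % 3](e).digest()
        last_byte = e[-1]
        round_no += 1
    return k[:32]
```

In a real setup the callable would wrap an actual AES-128-CBC implementation, for instance one built on the `cryptography` package's `Cipher`, `algorithms.AES` and `modes.CBC` classes.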

How Has This Been Tested?

Added tests

Checklist

  • I have added tests that prove my fix is effective or that my feature
    works
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial
    version
  • I have updated the README.md or I have verified that this
    is not necessary
  • I have updated the readthedocs documentation or I
    verified that this is not necessary
  • I have added a concise human-readable description of the change to
    CHANGELOG.md

@Darkheir Darkheir force-pushed the feat/support_iso_32000-2_encryption branch from d5c77da to 5941288 Compare May 5, 2021 16:08
@Darkheir Darkheir force-pushed the feat/support_iso_32000-2_encryption branch from 5941288 to 7e468ec Compare May 5, 2021 16:12
@pietermarsman
Member

pietermarsman commented Aug 31, 2021

Hi @Darkheir,

Thanks for this work.

If I understand correctly, this adds functionality that was introduced with the PDF 2.0 specification.

To do a sensible review I want to look up the PDF 2.0 specification but it seems that you have to pay for it. Do you know if that is true? And where did you get your copy of the 2.0 specification?

@Darkheir
Author

Darkheir commented Sep 1, 2021

Hi @pietermarsman

Sadly I didn't get access to the specification since you have to pay for it as you said.

I found a Python project that implements the feature: https://github.com/MatthiasValvekens/pyHanko/blob/master/pyhanko/pdf_utils/crypt.py

The code is based on this implementation; I only changed it to fit the pdfminer code base.

I also added an encrypted PDF to test the implemented feature.

@MatthiasValvekens

Hi, I'm the author of the linked implementation on which this one was based. I happen to own a (legal) copy of ISO 32000-2:2020 that I purchased from ISO through the usual channels. Unfortunately, since ISO licenses are single-user, I'm not at liberty to share the document. I also can't vouch for the correctness of this adaptation.

However, even if you don't have access to the official standard, that doesn't mean that you can't do any interoperability testing at all. There are quite a few implementations of PDF 2.0 encryption out there (both open-source and not) that you could use to experiment. For example, iText has had support for this stuff pretty much since the very first revision of ISO 32000-2 came out: https://github.com/itext/itext7/blob/develop/kernel/src/main/java/com/itextpdf/kernel/crypto/securityhandler/StandardHandlerUsingAes256.java.


Also: my implementation is MIT licensed, so you're of course free to modify & integrate it into this project, but would you mind adding a notice to make sure the attribution clause in the license is adhered to? It's no big deal, but I'd still appreciate the courtesy. Thanks ;)

(I also have a copy of the license for PyPDF2 in my repo, since a lot of the low-level PDF code heavily borrows from it.)

Member

@pietermarsman pietermarsman left a comment

I'm going to assume that this works, because the tests work.

pdfminer/pdfdocument.py (outdated, resolved)
Member

@pietermarsman pietermarsman left a comment

I think this PR needs three more changes:

  • Use None instead of b''
  • Fix CHANGELOG.md conflict
  • Include reference to original implementation as @MatthiasValvekens suggested


@MatthiasValvekens MatthiasValvekens left a comment

I also added a couple of comments.

Another general one: correct me if I'm wrong, but I don't think you're validating the integrity of the Perms entry anywhere in the authentication process. Was that intentional? It's part of the key retrieval/password validation procedure for revision 6 security handlers. Right now, it doesn't provide much in the way of integrity protection, but if nothing else, it's a good sanity check.
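
A Perms sanity check of this kind could look roughly like the sketch below. It assumes the 16-byte Perms value has already been decrypted with the file encryption key (AES-256 in ECB mode, no padding). The function name is hypothetical, and the byte layout follows how open-source revision 6 handlers commonly implement the check rather than quoting the standard.

```python
def perms_look_valid(decrypted_perms: bytes, p: int) -> bool:
    """Check an already-decrypted 16-byte Perms block against the P entry.

    Hypothetical helper for illustration; not part of pdfminer's API.
    """
    if len(decrypted_perms) != 16:
        return False
    # Bytes 9-11 are expected to carry the literal marker "adb".
    if decrypted_perms[9:12] != b"adb":
        return False
    # Bytes 0-3 hold P as a little-endian 32-bit value. P is usually
    # stored as a negative signed integer, so mask it to unsigned first.
    expected_p = (p & 0xFFFFFFFF).to_bytes(4, "little")
    return decrypted_perms[:4] == expected_p
```

A mismatch would suggest either a wrong file key or a P entry that was altered after the document was encrypted.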

pdfminer/pdfdocument.py (outdated, resolved)
pdfminer/pdfdocument.py (resolved)
@Darkheir Darkheir force-pushed the feat/support_iso_32000-2_encryption branch from 6833e79 to 30530f1 Compare September 6, 2021 14:33
@pietermarsman pietermarsman merged commit c3e3499 into pdfminer:develop Sep 6, 2021
@pietermarsman
Member

@Darkheir @MatthiasValvekens Thanks for the work and support on this!

@MartinThoma

@Darkheir would you be interested in adding modern encryption support to PyPDF2 as well?

https://github.com/py-pdf/PyPDF2

We recently added decryption support. Only encryption is missing.
