Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add AES support for encrypting PDF files (merged via other PRs) #1816

Closed
wants to merge 34 commits into from

Conversation

exiledkingcc
Copy link
Contributor

@exiledkingcc exiledkingcc commented Apr 27, 2023

Closes #1754

@codecov
Copy link

codecov bot commented Apr 28, 2023

Codecov Report

Patch coverage: 92.45% and project coverage change: -0.01 ⚠️

Comparison is base (2b47d9a) 93.89% compared to head (712acb2) 93.89%.

❗ Current head 712acb2 differs from pull request most recent head a7cfdb4. Consider uploading reports for the commit a7cfdb4 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1816      +/-   ##
==========================================
- Coverage   93.89%   93.89%   -0.01%     
==========================================
  Files          33       33              
  Lines        6983     6946      -37     
  Branches     1382     1374       -8     
==========================================
- Hits         6557     6522      -35     
  Misses        274      274              
+ Partials      152      150       -2     
Impacted Files Coverage Δ
pypdf/xmp.py 98.67% <50.00%> (ø)
pypdf/generic/_data_structures.py 91.09% <69.23%> (ø)
pypdf/_encryption.py 94.95% <92.55%> (ø)
pypdf/_protocols.py 100.00% <100.00%> (ø)
pypdf/_writer.py 88.34% <100.00%> (-0.09%) ⬇️
pypdf/generic/_base.py 100.00% <100.00%> (ø)
pypdf/generic/_outline.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

pypdf/_writer.py Outdated
Comment on lines 1039 to 1160
user_pwd: Optional[str] = None, # deprecated
owner_pwd: Optional[str] = None, # deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two need to stay as long as we don't make a major version bump. As still many users are migrating from PyPDF2 (sometimes even PyPDF2 < 2.0.0), I want to keep the warnings for quite a while.

pypdf/_writer.py Outdated
Comment on lines 1062 to 1221
warnings.warn(
"pypdf only implements RC4 encryption so far. "
"The RC4 algorithm is insecure. Either use a library that supports "
"AES for encryption or put the PDF in an encrypted container, "
"for example an encrypted ZIP file."
)
if user_pwd is not None:
if user_password is not None:
raise ValueError(
"Please only set 'user_password'. "
"The 'user_pwd' argument is deprecated."
)
else:
warnings.warn(
"Please use 'user_password' instead of 'user_pwd'. "
"The 'user_pwd' argument is deprecated and "
"will be removed in pypdf 4.0.0."
)
user_password = user_pwd
if user_password is None: # deprecated
# user_password is only Optional for due to the deprecated user_pwd
raise ValueError("user_password may not be None")

if owner_pwd is not None: # deprecated
if owner_password is not None:
raise ValueError(
"The argument owner_pwd of encrypt is deprecated. "
"Use owner_password only."
)
else:
old_term = "owner_pwd"
new_term = "owner_password"
warnings.warn(
message=(
f"{old_term} is deprecated as an argument and will be "
f"removed in pypdf 4.0.0. Use {new_term} instead"
),
category=DeprecationWarning,
)
owner_password = owner_pwd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, those need to stay for the same reason: Keeping helpful deprection messages for users

@MartinThoma MartinThoma changed the title add AES support for encrypting PDF files ENH: Add AES support for encrypting PDF files Apr 29, 2023
@MartinThoma MartinThoma added nf-security Non-functional change: Security workflow-encryption From a users perspective, encryption is the affected feature/workflow labels Apr 29, 2023
@MartinThoma
Copy link
Member

@exiledkingcc Thank you so much for picking this up 🙏

I'm currently quite busy with my private live, but I try to review everything tomorrow :-)

pypdf/xmp.py Outdated
Comment on lines 239 to 248
def writeToStream(
self, stream: StreamType, encryption_key: Union[None, str, bytes]
) -> None: # deprecated
"""
Use :meth:`write_to_stream` instead.

.. deprecated:: 1.28.0
"""
deprecation_with_replacement("writeToStream", "write_to_stream", "3.0.0")
self.write_to_stream(stream, encryption_key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay to help migrating users

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinThoma
this depreciation is quite old and it also applies to internal objects. I think we could try to delete it. It will be possible to restore it later.
Also write_to_stream() arguments have changed and for the same reasons I think the modification can be kept

Copy link
Member

@MartinThoma MartinThoma May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to follow our deprecation process:

  1. Deprecation warning was added in 1.28.0 via DEP: Class, variable, and module names #867 in May 2022
  2. Deprecation error was added in 3.0.0 <--- it should have been removed in 2.0.0; keep also in mind that we migrated PyPDF2 to pypdf. To the pypdf users it's just fair to have a little bit more time.
  3. Removal: Originally it should be removed in 3.0.0, but now we're going to remove it in 4.0.0

It has "only" been one year since the warning was added. I want to make a pypdf==4.0.0 release with a lot of deprecated code removals in 2024 (probably early January). Until then, I want to keep those methods to allow people to have an easier time to transition.

Is there a reason to have it in this PR?

To me it seems as if this PR would become a lot smaller and thus easier to review without those changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @pubpub-zz said, write_to_stream() arguments have changed, so writeToStream() should change, or be deleted. and it's not used in current code, and users do not use it, it's safe to delete it.

MartinThoma added a commit that referenced this pull request Apr 30, 2023
This makes PR #1816 smaller

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
pypdf/_writer.py Outdated Show resolved Hide resolved
MartinThoma added a commit that referenced this pull request Apr 30, 2023
This makes PR #1816 smaller

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
@MartinThoma
Copy link
Member

@exiledkingcc Just a short heads-up: I have not forgotten this PR and I'm excited to get it into pypdf!

However, I also want to review it thoroughly. Recently I had a lot going on in my private live (being sick / buying my own house) which meant that I didn't have the time for it so far. I will do it eventually :-) (I promise to do it this year, I hope to do it in July)

@exiledkingcc
Copy link
Contributor Author

@MartinThoma it's ok, just take your time. and take care. 😊

@MartinThoma MartinThoma added the breaking-change A planned breaking change label Jun 24, 2023
@MartinThoma
Copy link
Member

I've just added the "breaking change" label so that I don't forget that the PR currently contains breaking changes. I want to undo them for the moment, but I also want to collect those things so that I can do several breaking changes next year.

The idea is to keep pypdf stable for a while to give the community time to move from PyPDF2 to pypdf.

@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Jun 24, 2023
MartinThoma added a commit that referenced this pull request Jun 25, 2023
Full credit goes to exiledkingcc

This PR was only made to make it easier to merge the other changes /
to avoid merge conflicts.

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jun 25, 2023
* `CryptFilter.encrypt_object` implemented
* `AlgV5.generate_values` now crops the user_password / owner_password to 127 bytes
* The `EncryptAlgorithm` Enum was added. It contains the parameter V (version), R (revision), and length
* `Encryption.encrypt_object` was added
* `Encryption.write_entry` was added
* The static method `Encryption.make` was added

This PR was only made to make it easier to merge the other changes / to avoid merge conflicts of other changes with #1816 . Full credit goes to exiledkingcc.

The PR is marked as "MAINT" as it doesn't add a new feature that an end-user could use.

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jun 25, 2023
* PdfWriter.encrypt: Add 'algorithm' parameter
* PdfWriter: Add _encryption property
* PdfWriter: Add _encrypt_entry property

This change was made in another PR to avoid merge conflicts /
get it merged soon. Full credit for the work goes to exiledkingcc
who did all of the work in #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jun 25, 2023
* PdfWriter.encrypt: Add 'algorithm' parameter
* PdfWriter: Add _encryption property
* PdfWriter: Add _encrypt_entry property

This change was made in another PR to avoid merge conflicts /
get it merged soon. Full credit for the work goes to exiledkingcc
who did all of the work in #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jul 2, 2023
* PdfWriter.encrypt: Add 'algorithm' parameter
* PdfWriter: Add _encryption property
* PdfWriter: Add _encrypt_entry property

This change was made in #1816 to avoid merge conflicts /
get it merged soon.

Full credit for the work goes to exiledkingcc
who did all of the work in #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jul 2, 2023
This PR was made to avoid breaking changes the original PR introduced.
We want to remove the outdated parameters, but not at the moment.
See https://pypdf.readthedocs.io/en/latest/dev/deprecations.html

Full credit for the work goes to exiledkingcc via PR #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jul 2, 2023
This PR was made to avoid breaking changes the original PR introduced.
We want to remove the outdated parameters, but not at the moment.
See https://pypdf.readthedocs.io/en/latest/dev/deprecations.html

Full credit for the work goes to exiledkingcc via PR #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jul 2, 2023
Partially taken from #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma added a commit that referenced this pull request Jul 2, 2023
* MAINT: Deprecate the encryption_key parameter of write_to_stream

Partially taken from #1816

Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
@MartinThoma
Copy link
Member

@exiledkingcc I finally managed to merge all of the changes 🎉

It was a bit difficult because there were a lot of changes and I needed to get an overview. Hence I made several small PRs. I also didn't want to introduce breaking changes now.

I gave you full credit via the co-authored-by feature + every commit message. I also made a shout-out on Twitter + in the release notes to ensure that you get all the credit for the awesome work you did 🙏 Thank you 🤗

@MartinThoma MartinThoma closed this Jul 2, 2023
@MartinThoma MartinThoma changed the title ENH: Add AES support for encrypting PDF files ENH: Add AES support for encrypting PDF files (merged via other PRs) Jul 2, 2023
@exiledkingcc exiledkingcc deleted the encrypt branch July 3, 2023 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change A planned breaking change nf-security Non-functional change: Security soon PRs that are almost ready to be merged, issues that get solved pretty soon workflow-encryption From a users perspective, encryption is the affected feature/workflow
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SEC: Switch to AES for encryption
3 participants