pypdf: It's the newer PyPDF2 / PyPDF3 / PyPDF4 #48

Closed
MartinThoma opened this issue Jul 25, 2023 · 8 comments

Comments

@MartinThoma

Hey, I'm the current maintainer of pypdf and PyPDF2 👋

I just wanted to let you know that pypdf is maintained again and has received tons of updates since April 2022 (when I became the maintainer). In contrast, PyPDF3 and PyPDF4 are not maintained. For PyPDF2, I decided to migrate all the changes back into pypdf (now all-lowercase) and let PyPDF2 die (see the README of PyPDF2).

@MartinThoma
Author

If I interpret this correctly:

    from pypdf import PdfFileWriter,PdfFileReader
    from pypdf.generic import *
    from pypdf.utils import isString,formatWarning,PdfReadError,readUntilWhitespace
    import pypdf.utils as utils
    legacy = False

Then you are already using pypdf. However, please be aware of the migration guide at https://pypdf2.readthedocs.io/en/3.0.0/user/migration-1-to-2.html, in particular that PdfFileReader was renamed to PdfReader.
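
As a rough aid for that migration, the sketch below scans source text for old class names. Only the `PdfFileReader` to `PdfReader` rename is stated in this thread; the `PdfFileWriter` to `PdfWriter` pair is the analogous writer rename, and the helper name `find_deprecated_names` is made up for this illustration. It is not part of pypdf.

```python
from __future__ import annotations

import re

# Old class name -> new class name.
# The reader rename is mentioned in this thread; the writer rename is the
# analogous change (assumption: check the migration guide before relying on it).
RENAMES = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
}

_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def find_deprecated_names(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_name, new_name) for each old name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits


snippet = "from pypdf import PdfFileWriter,PdfFileReader\nlegacy = False\n"
for lineno, old, new in find_deprecated_names(snippet):
    print(f"line {lineno}: {old} -> {new}")
```

A scan like this only flags names; method-level changes still need to be checked against the migration guide by hand.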

@MartinThoma
Author

I came to your project because of py-pdf/pypdf#1994. What do you think about that change? Would it break stuff for pypdfplot?

@dcmvdbekerom
Owner

Hi @MartinThoma !

Thanks for reaching out and keeping me up to date on developments!
Yeah, I remember that at some point PyPDF4 had a different interface between the PyPI and GitHub versions.
Glad to hear it is now actively maintained again.

Please go ahead and push the changes you need for pypdf; I'll fix my code to stay up to date with yours.

@MartinThoma
Author

The PR is merged :-) I'll make a release on Sunday :-)

Please note that PyPDF4 and pypdf are different projects. I have put a lot of effort into pypdf in the last ~15 months. We merged tons of PRs and made a couple of backwards-incompatible changes to the interface (mostly naming conventions, replacing camelCase with snake_case and using properties instead of getter/setter functions). If you upgrade, I recommend setting pypdf > 3.0.0 as the dependency.
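
To make the kind of interface change described here concrete, below is a minimal sketch of a camelCase getter being replaced by a snake_case property, with the old name kept as a deprecated shim. The `Document` class is a toy invented for this illustration and is not pypdf's implementation.

```python
import warnings


class Document:
    """Toy class, not part of pypdf; it only illustrates the renaming pattern."""

    def __init__(self, pages):
        self._pages = list(pages)

    @property
    def pages(self):
        # New style: snake_case attribute exposed as a property.
        return self._pages

    def getNumPages(self):
        # Old style: camelCase getter kept only as a deprecated shim.
        warnings.warn(
            "getNumPages() is deprecated; use len(doc.pages) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return len(self._pages)


doc = Document(["page 1", "page 2"])
print(len(doc.pages))
```

Callers that still use the old getter keep working but see a `DeprecationWarning`, which gives them a release cycle to switch before the shim is removed in a major version.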

We follow semantic versioning and try to only make breaking changes with major version updates. The change in the ASCIIHexDecode.decode signature is a bit of an outlier. The other functions suggest that it should have had a different signature in the first place. The fact that pypdfplot was the only project I could find that uses it indicates that this breaking change might not affect many users. I was also encouraged by the fact that you had already worked around this bug/inconsistency in pypdf.

@MartinThoma
Author

I'm closing this now as I only intended to inform you about the current developments at pypdf :-)

@dcmvdbekerom
Owner

dcmvdbekerom commented Jul 27, 2023

Hi @MartinThoma, I didn't have much time to reply last time, so to elaborate a bit more:

Thank you so much for picking up development of pypdf again!
It was quite annoying being dependent on PyPDF4 which wasn't being actively maintained.
The str>byte decoder is one of a couple of issues with PyPDF4 that I monkeypatched.
It looks like this is now solved, but there were a few other issues like this.
I will look them up and suggest PRs for pypdf.

Off the top of my head, the two main ones are:

  • HexAsciiEncoder is missing in pypdf; it would be good to add this
  • The data-addresses in the XREF table should be referenced to the start of the %PDF string, not to the start of the document. This is normally not an issue because %PDF is usually at the start of the document, but according to the PDF spec it doesn't have to be.

I'll open issues (and PRs) for this sometime in the coming week.

Best,
Dirk

@MartinThoma
Author

> HexAsciiEncoder is missing in pypdf

Do you mean that ASCIIHexDecode should get an encode(data: bytes) -> bytes static method?
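
For reference, here is a minimal standalone sketch of what such an encoder could look like, paired with a decoder for a round trip. The `encode(data: bytes) -> bytes` shape comes from the question above; the function names `ascii_hex_encode` and `ascii_hex_decode` are hypothetical and this is not pypdf's code. It assumes the usual ASCIIHex rules: two hex digits per byte, whitespace ignored, `>` as the end-of-data marker, and an odd trailing digit padded with `0`.

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "
EOD = ord(">")  # end-of-data marker


def ascii_hex_encode(data: bytes) -> bytes:
    """Encode bytes as hex digits followed by the '>' end-of-data marker."""
    return data.hex().upper().encode("ascii") + b">"


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode hex digits back to bytes, stopping at the first '>'."""
    digits = bytearray()
    for byte in data:
        if byte == EOD:
            break
        if byte in PDF_WHITESPACE:
            continue
        digits.append(byte)
    if len(digits) % 2:
        # An odd number of digits behaves as if a trailing 0 were present.
        digits += b"0"
    # bytes.fromhex raises ValueError on any non-hex character.
    return bytes.fromhex(digits.decode("ascii"))


encoded = ascii_hex_encode(b"Hi")
print(encoded, ascii_hex_decode(encoded))
```

Both functions take and return `bytes`, so the str/bytes mismatch mentioned earlier in the thread does not arise in this sketch.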

@MartinThoma
Author

> The data-addresses in the XREF table should be referenced to the start of the %PDF string, not to the start of the document. This is normally not an issue because %PDF is usually at the start of the document, but according to the PDF spec it doesn't have to be.

That's an interesting point! If you could open an issue for that + give a few more details, I'm pretty sure it will be well received :-)
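
As a possible starting point for such an issue, the sketch below illustrates the behaviour described in the quoted point: treating XREF offsets as relative to the `%PDF-` header rather than to byte 0 of the file. The function names and the 1024-byte search window are assumptions made for this illustration; this is not how pypdf currently resolves offsets.

```python
HEADER_MARKER = b"%PDF-"


def header_offset(data: bytes, search_limit: int = 1024) -> int:
    """Return the byte position of the %PDF- header within the search window."""
    # search_limit is an assumed tolerance for leading junk before the header.
    pos = data.find(HEADER_MARKER, 0, search_limit)
    if pos < 0:
        raise ValueError("no %PDF- header found")
    return pos


def absolute_offset(xref_entry_offset: int, data: bytes) -> int:
    """Translate a header-relative XREF offset into an absolute file position."""
    return header_offset(data) + xref_entry_offset


# Five bytes of junk precede the header, so every object is shifted by 5.
data = b"junk\n%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
pos = absolute_offset(9, data)  # 9 = offset of "1 0 obj" counted from %PDF-
print(header_offset(data), data[pos:pos + 7])
```

When the header sits at byte 0, `header_offset` returns 0 and both interpretations give the same position, which matches the observation that the difference usually goes unnoticed.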
