pypdf: It's the newer PyPDF2 / PyPDF3 / PyPDF4 #48

MartinThoma · 2023-07-25T16:49:00Z

Hey, I'm the current maintainer of pypdf and PyPDF2 👋

I just wanted to let you know that pypdf is maintained again and has received tons of updates since April 2022 (when I became the maintainer). In contrast, PyPDF3 and PyPDF4 are not maintained. For PyPDF2, I decided to migrate all the changes back into pypdf (now all-lowercase) and let PyPDF2 die (see the README of PyPDF2).

MartinThoma · 2023-07-25T16:51:15Z

If I interpret this correctly:

    from pypdf import PdfFileWriter,PdfFileReader
    from pypdf.generic import *
    from pypdf.utils import isString,formatWarning,PdfReadError,readUntilWhitespace
    import pypdf.utils as utils
    legacy = False

Then you are already using pypdf. However, please be aware of https://pypdf2.readthedocs.io/en/3.0.0/user/migration-1-to-2.html , especially that PdfFileReader was renamed to PdfReader.

MartinThoma · 2023-07-25T17:13:42Z

I came to your project due to py-pdf/pypdf#1994 . What do you think about that change? Would it break stuff for pypdfplot?

dcmvdbekerom · 2023-07-25T19:11:37Z

Hi @MartinThoma !

Thanks for reaching out and keeping me up to date on developments!
Yeah, I remember at some point PyPDF4 had a different interface between the PyPI and GitHub versions.
Glad to hear it is now actively maintained again.

Please go ahead and push the changes you need for pypdf, I'll fix my code to stay up to date with your code.

MartinThoma · 2023-07-25T20:16:50Z

The PR is merged :-) I'll make a release on Sunday :-)

Please note that PyPDF4 and pypdf are different projects. I put a lot of effort into pypdf in the last ~15 months. We merged tons of PRs and made a couple of backwards-incompatible changes to the interface (mostly naming conventions, replacing camelCase with snake_case and using properties instead of getter/setter functions). If you upgrade, I recommend setting pypdf > 3.0.0 as the dependency.

We follow semantic versioning and try to only make breaking changes with major version updates. The change in the ASCIIHexDecode.decode signature is a bit of an outlier. The other functions suggest that it should have been another signature in the first place. The fact that pypdfplot was the only project I could find that uses it indicates that this breaking change might not affect many users + I was encouraged by the fact that you worked around this bug/inconsistency of pypdf already.
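For a downstream project that has to run against releases from both before and after the ASCIIHexDecode.decode change, one option is to normalize the return value rather than monkeypatch the library. The sketch below is only an illustration: `to_bytes` is a hypothetical helper, not part of pypdf, and it assumes the older `str` result held exactly one character per decoded byte.

```python
def to_bytes(decoded):
    """Return decoded stream data as bytes, whether it arrived as str or bytes."""
    if isinstance(decoded, bytes):
        # Newer behaviour described in this thread: already bytes.
        return decoded
    if isinstance(decoded, str):
        # Assumption: each character encodes one byte value (0-255),
        # so latin-1 maps every character back to the same byte.
        return decoded.encode("latin-1")
    raise TypeError(f"expected str or bytes, got {type(decoded).__name__}")


# Both shapes of return value end up as the same bytes object.
print(to_bytes("pdf\xff") == to_bytes(b"pdf\xff"))
```

Calling code can then treat the decoded data as bytes everywhere, independent of which release is installed.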

MartinThoma · 2023-07-25T20:17:17Z

I'm closing this now as I only intended to inform you about the current developments at pypdf :-)

dcmvdbekerom · 2023-07-27T16:08:17Z

Hi @MartinThoma, I didn't have much time to reply last time, so to elaborate a bit more:

Thank you so much for picking up development of pypdf again!
It was quite annoying being dependent on PyPDF4 which wasn't being actively maintained.
The str>byte decoder is one of a couple issues with PyPDF4 I monkeypatched.
It looks like this is now solved, but there were a few other issues like this.
I will look them up and suggest PRs for pypdf.

Off the top of my head, the two main ones are:

- HexAsciiEncoder is missing in pypdf; it would be good to add this.
- The data addresses in the XREF table should be referenced to the start of the %PDF string, not to the start of the document. This is normally not an issue because %PDF is usually at the start of the document, but according to the PDF spec it doesn't have to be.
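To make the second point concrete, here is a minimal sketch of the offset arithmetic, following the reading given in this comment. The helper names `header_offset` and `absolute_offset` and the sample bytes are hypothetical; this is plain Python and does not use or describe pypdf's parser.

```python
def header_offset(data: bytes) -> int:
    """Return the position of the '%PDF-' header within the file bytes."""
    pos = data.find(b"%PDF-")
    if pos < 0:
        raise ValueError("no %PDF- header found")
    return pos


def absolute_offset(xref_offset: int, data: bytes) -> int:
    """Translate an offset counted from the header into a file position."""
    return header_offset(data) + xref_offset


# Hypothetical file with extra bytes in front of the PDF header.
prefix = b"#!/usr/bin/env python\n"
body = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n"
data = prefix + body

# Offset of the object as an XREF entry would record it under this
# reading: counted from the '%' of '%PDF-', not from the first file byte.
xref_offset = body.find(b"1 0 obj")

start = absolute_offset(xref_offset, data)
print(data[start:start + 7])
```

When nothing precedes the header, `header_offset` returns 0 and both interpretations agree, which is why the difference rarely shows up in practice.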

I'll open issues (and PRs) for this sometime in the coming week.

Best,
Dirk

MartinThoma · 2023-07-27T16:41:26Z

> HexAsciiEncoder is missing in pypdf

Do you mean that ASCIIHexDecode should get an encode(data: bytes) -> bytes static method?
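As a rough idea of what such an encoder could look like, here is a standalone sketch. The function names `ascii_hex_encode` and `ascii_hex_decode` are hypothetical and not pypdf API; the sketch only relies on the ASCIIHexDecode conventions of two hex digits per byte, ignored whitespace, and `>` as the end-of-data marker.

```python
import binascii


def ascii_hex_encode(data: bytes) -> bytes:
    """Encode bytes as uppercase hex digits followed by the '>' marker."""
    return binascii.hexlify(data).upper() + b">"


def ascii_hex_decode(encoded: bytes) -> bytes:
    """Decode ASCII hex data, stopping at the '>' end-of-data marker."""
    # Keep only what comes before the end-of-data marker.
    hex_part = encoded.split(b">", 1)[0]
    # Whitespace between hex digits is ignored.
    hex_digits = b"".join(hex_part.split())
    # An odd number of digits is treated as if a trailing 0 followed.
    if len(hex_digits) % 2:
        hex_digits += b"0"
    return binascii.unhexlify(hex_digits)


payload = b"Hi"
encoded = ascii_hex_encode(payload)
print(encoded, ascii_hex_decode(encoded) == payload)
```

The decoder is included only to show the round trip; whether the real method should live as a static `encode` on ASCIIHexDecode is the open question above.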

MartinThoma · 2023-07-27T16:42:29Z

> The data-addresses in the XREF table should be referenced to the start of the %PDF string, not to the start of the document. This is normally not an issue because %PDF is usually at the start of the document, but according to the PDF spec it doesn't have to be.

That's an interesting point! If you could open an issue for that + give a few more details, I'm pretty sure it will be well received :-)

MartinThoma mentioned this issue Jul 25, 2023

BUG: ASCIIHexDecode.decode now returns bytes instead of str py-pdf/pypdf#1994

Merged

MartinThoma closed this as completed Jul 25, 2023
