Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PdfWriter as context manager while passing existing file not working in 5.2.0 #3087

Open
AlbertUnruh opened this issue Jan 27, 2025 · 3 comments
Labels
is-regression Regression introduced as a side-effect of another change workflow-merge From a users perspective, merging is the affected feature/workflow

Comments

@AlbertUnruh
Copy link

After upgrading pypdf from 5.1.0 to 5.2.0 my function to merge multiple pdfs didn't merge my files anymore.

I wasn't able to find anything in the change-log about any breaking changes.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.19045-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"  # 5.1.0
pypdf==5.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=11.1.0

$ python -c "import pypdf;print(pypdf._debug_versions)"  # 5.2.0
pypdf==5.2.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=11.1.0

Code + PDF

This is a minimal, complete example that shows the issue:

def merge_pdfs(*pdfs: Path, destination: Path):
    """Merge multiple pdfs into one big pdf at the given destination."""
    with PdfWriter(destination) as merger:
        merger.add_metadata(PdfReader(pdfs[0]).metadata)
        for pdf in pdfs:
            merger.append(pdf)

# merge_pdfs() works in 5.1.0 just fine
# merge_pdfs() doesn't work in 5.2.0 anymore

I don't think that any specific files are required.
I tested it with multiple files and it didn't work.

@stefan6419846
Copy link
Collaborator

Thanks for the report. Unfortunately, I have not been able to reproduce this with a simple setup:

from pypdf import PdfReader, PdfWriter


pdfs = ['file1.pdf', 'file2.pdf']

with PdfWriter('out.pdf') as merger:
    merger.add_metadata(PdfReader(pdfs[0]).metadata)
    for pdf in pdfs:
        merger.append(pdf)

#2909 might be related.

Please provide some further details and the PDF files which show this behavior for you to further analyze this issue.

@stefan6419846 stefan6419846 added needs-pdf The issue needs a PDF file to show the problem workflow-merge From a users perspective, merging is the affected feature/workflow is-regression Regression introduced as a side-effect of another change labels Jan 27, 2025
@AlbertUnruh
Copy link
Author

I'm merging all pdfs into the first one.

For my setup I would make a call like merge_pdfs(pdf1, pdf2, destination=pdf1).

Is it possible that it has something to do with the order of closing and writing to pdf1?

I'm not home atm, otherwise I would check it myself.

@stefan6419846 stefan6419846 removed the needs-pdf The issue needs a PDF file to show the problem label Jan 28, 2025
@stefan6419846
Copy link
Collaborator

I have been able to reproduce this with the output file already existing - it becomes even more obvious once you set the output file to an existing non-PDF file. As mentioned before, this is related to #2909 and #2913, although only #2909 is relevant as part of the release.

The underlying issue is more or less bad legacy API design which requires guessing the intent of the user. Ideally, the parameters to PdfWriter would be keyword-only to completely eliminate the need of guessing, but this requires a proper deprecation process with some more complex intermediary handling: #2913 (comment)

I have not yet decided how to go forward with this specific case, although input from you, @alexaryn and other interested parties is greatly appreciated. As a general rule of thumb, using clone_from or a non-existing output file usually is the safest approach.

@stefan6419846 stefan6419846 changed the title PdfWriter.append not working in 5.2.0 PdfWriter as context manager while passing existing file not working in 5.2.0 Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-regression Regression introduced as a side-effect of another change workflow-merge From a users perspective, merging is the affected feature/workflow
Projects
None yet
Development

No branches or pull requests

2 participants