-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Non-deterministic accidental object reuse #1995
Conversation
The following section of code could sometimes have unintended effects where pages of an earlier integrated PDF file were used instead. ``` writer = PdfWriter() reader1 = PdfReader(some_file) id_reader1 = id(reader1) writer.add_page(reader1.pages[0]) del reader1 reader2 = PdfReader(other_file) id_reader2 = id(reader2) writer.add_page(reader2.pages[0]) del reader2 writer.write(third_file) ``` because the `reader1` is no longer in memory when `reader2` gets initialized, the area in memory is free, so `id_reader1` and `id_reader2` might end up having the same value. Due to PyPDF using `id(reader)` internally for an object-cache, it sometimes happened that `writer.add_page(reader2.pages[0])` would result in duplicating `reader1.pages[0]` instead.
Using a WeakKeyDictionary was an implementation detail that does not have to be matched in the type.
@sjoerdjob Your PR cannot be merged like this as the CI fails:
There are now also a couple of merge conflicts. |
@MartinThoma this looks similar (different approach) to a PR you've already about random error. not sure this is still required |
We didn't have a test that could detect this type of issue, hence I added the test from this PR via #2244 @sjoerdjob Sorry that it took so long. I'm closing this PR now as the problem was fixed via #1841 and your test was added via #2244 |
Thank you for your support 🤗 If you want, I'll add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-) |
The following section of code could sometimes have unintended effects where pages of an earlier integrated PDF file were used instead.
because the
reader1
is no longer in memory whenreader2
gets initialized, the area in memory is free, soid_reader1
andid_reader2
might end up having the same value.Due to PyPDF using
id(reader)
internally for an object-cache, it sometimes happened thatwriter.add_page(reader2.pages[0])
would result in duplicatingreader1.pages[0]
instead.fixes #1788 .