Images with transparency mask are not correctly extracted #1599

Closed
pubpub-zz opened this issue Feb 2, 2023 · 3 comments · Fixed by #1834
Labels
dependencies Pull requests that update a dependency file is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@pubpub-zz
Collaborator

pubpub-zz commented Feb 2, 2023

When an image with a transparency mask is extracted, as in the following code, the result is not correct:

```python
from pypdf import PdfReader, PdfWriter
from pypdf.generic import NameObject, NullObject
from PIL import Image
from io import BytesIO

w = PdfWriter()
w.append("resources/labeled-edges-center-image.pdf")

for p in w.pages:
    for image_file_object in p.images:
        print(image_file_object.name)
        # Decode the extracted image with Pillow and re-encode it as a one-page PDF
        ii = Image.open(BytesIO(image_file_object.data))
        b = BytesIO()
        ii.save(b, "pdf", quality=60, resolution=19.0, optimize=True)
        rrr = PdfReader(b)
        # XObject key of the original image, e.g. "Im0.png" -> "/Im0"
        n = NameObject("/" + "".join(image_file_object.name.split(".")[:-1]))
        ind = p["/Resources"]["/XObject"].raw_get(n)
        # Null out the old image object to clean up the file;
        # object numbers are 1-based, the writer's object list is 0-based
        w._objects[ind.idnum - 1] = NullObject()
        # Point the XObject entry at the re-encoded image
        p["/Resources"]["/XObject"][n] = (
            rrr.pages[0]["/Resources"]["/XObject"]["/image"].clone(w).indirect_reference
        )
w.write("tt.pdf")
```

Edit: code updated.

Originally posted by @pubpub-zz in #1546 (comment)
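For context: in PDF, an image's transparency is often stored as a separate grayscale soft-mask image referenced from the image XObject's `/SMask` entry, so extracting only the base image loses the alpha channel. The sketch below shows, under stated assumptions, what combining the two rasters amounts to. `apply_soft_mask` is a hypothetical helper, not a pypdf API; it assumes both rasters are already decoded, use 8 bits per component, have the same width and height (the PDF format allows them to differ, which this sketch does not handle), and that `/Decode` and `/Matte` can be ignored. With Pillow, the equivalent step would be calling `Image.putalpha()` on the base image with the mask as an "L" image.

```python
def apply_soft_mask(rgb: bytes, smask: bytes, width: int, height: int) -> bytes:
    """Interleave 8-bit RGB samples with 8-bit soft-mask samples into RGBA.

    Hypothetical helper, not a pypdf API. Assumes both rasters are already
    decoded (filters removed), share the same width and height, and that
    /Decode and /Matte entries can be ignored.
    """
    pixels = width * height
    if len(rgb) != 3 * pixels:
        raise ValueError(f"expected {3 * pixels} RGB bytes, got {len(rgb)}")
    if len(smask) != pixels:
        raise ValueError(f"expected {pixels} mask bytes, got {len(smask)}")
    rgba = bytearray(4 * pixels)
    rgba[0::4] = rgb[0::3]  # red
    rgba[1::4] = rgb[1::3]  # green
    rgba[2::4] = rgb[2::3]  # blue
    rgba[3::4] = smask      # alpha taken from the soft mask
    return bytes(rgba)


# A 2x1 image: an opaque red pixel next to a fully transparent blue one
print(apply_soft_mask(bytes([255, 0, 0, 0, 0, 255]), bytes([255, 0]), 2, 1))
```

Keeping the mask as a separate input mirrors how the PDF stores it; the merge only happens at extraction time.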

@MartinThoma MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Feb 4, 2023
@pubpub-zz pubpub-zz added the dependencies Pull requests that update a dependency file label Feb 26, 2023
@pubpub-zz
Collaborator Author

This issue will require an update of PIL (Pillow).

@pubpub-zz
Collaborator Author

PIL 2.5.0 is planned for the beginning of April:
python-pillow/Pillow#6989
😀

@radarhere

For clarity - it is Pillow 9.5.0 that will be released at the beginning of April. Pillow 2.5.0 was released a long time ago.
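If code needs to guard on the Pillow release discussed here, a plain version comparison is enough. The sketch below assumes, as the thread implies but does not confirm, that 9.5.0 is the minimum version needed; `MIN_PILLOW`, `version_tuple`, and `pillow_is_recent_enough` are hypothetical names. In practice the version string would come from `PIL.__version__`, and `packaging.version.Version` handles pre-release tags more rigorously than this simple parser.

```python
from typing import Tuple

# Assumption: Pillow 9.5.0 is the first release carrying the change the thread waits for.
MIN_PILLOW: Tuple[int, ...] = (9, 5, 0)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse '9.5.0' into (9, 5, 0); stop at the first non-numeric part (e.g. 'dev0')."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pillow_is_recent_enough(version: str, minimum: Tuple[int, ...] = MIN_PILLOW) -> bool:
    """Return True if `version` is at least `minimum`, treating missing parts as 0."""
    parsed = version_tuple(version)
    # Pad so that "9.5" compares equal to (9, 5, 0) rather than smaller
    parsed += (0,) * (len(minimum) - len(parsed))
    return parsed >= minimum


print(pillow_is_recent_enough("2.5.0"), pillow_is_recent_enough("9.5.0"))
```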
