Lossless conversion #266

Open
2V3EvG4LMJFdRe opened this issue Apr 9, 2023 · 3 comments

Comments

2V3EvG4LMJFdRe commented Apr 9, 2023

I need a reliable script that converts images to PDF and then another to revert the process.

To Reproduce

  1. My first script uses img2pdf to convert a set of images totalling 34.3 MB into a 34.3 MB PDF file.
  2. My second script uses pdf2image to convert the PDF file back into images:
export PATH=/usr/local/bin:$PATH
/usr/local/bin/python3 <<'EOF' - "$@"

from pdf2image import convert_from_path

# Render every page at 300 dpi with pdftocairo and write the files to output_folder.
# Note: jpegopt is only meaningful with fmt='jpeg'; it has no effect on PNG output.
images_from_path = convert_from_path(
    '/Users/user/test.pdf',
    thread_count=2,
    dpi=300,
    fmt='png',
    use_pdftocairo=True,
    jpegopt={"quality": 100, "optimize": True},
    output_folder='/Users/user/testpdf',
)
EOF
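
For context, the first script (images to PDF) is not shown above. A minimal sketch of what it might look like with img2pdf follows. The helper names `collect_images` and `images_to_pdf`, the set of file extensions, and any paths passed in are hypothetical; the only library call used is `img2pdf.convert`, given a list of file paths.

```python
from pathlib import Path

# Extensions treated as images here; this set is an assumption for illustration.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def collect_images(folder):
    """Return image paths in `folder`, sorted by name so page order is stable."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def images_to_pdf(folder, pdf_path):
    """Write all images in `folder` into one PDF, one image per page."""
    import img2pdf  # third-party; imported lazily so collect_images has no dependency

    images = collect_images(folder)
    with open(pdf_path, "wb") as f:
        f.write(img2pdf.convert(images))
    return len(images)
```

Sorting by file name keeps the page order predictable, which matters if the pages are later split back into numbered files.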

Describe the bug

The result is a series of PPM files that would amount to a 500 MB PDF: lossless quality, but too big.
Using fmt='jpeg' outputs files that would amount to a 24.3 MB PDF, with a visible drop in quality when zooming in.
Is there a way to create higher-quality JPEG files, closer to the originals?
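
The size jump is what one would expect from the format: PPM stores pixels uncompressed, at 3 bytes per 8-bit RGB pixel. A back-of-the-envelope sketch, assuming A4 pages (the issue does not state the page size, so that is a guess) and the 300 dpi used in the script:

```python
def ppm_bytes(width_in, height_in, dpi, channels=3):
    """Approximate size of an uncompressed 8-bit raster rendered at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


# A4 is 210 mm x 297 mm; convert to inches (25.4 mm per inch).
a4_w, a4_h = 210 / 25.4, 297 / 25.4
per_page = ppm_bytes(a4_w, a4_h, dpi=300)
print(f"{per_page / 1e6:.1f} MB per page")  # 26.1 MB per page
```

Under those assumptions, roughly twenty pages would already reach 500 MB, regardless of how small the source images were.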

Author

2V3EvG4LMJFdRe commented Apr 9, 2023

I've been doing some tests, and even though I could set jpegopt={"quality": 100, "optimize": True}, it seems that the PPM export isn't actually lossless to begin with:

Original PDF: [image]

pdf2image PPM: [image]

It's noticeably more blurry.

2V3EvG4LMJFdRe changed the title from "JPG quality" to "Lossless conversion" on Apr 9, 2023
Author

2V3EvG4LMJFdRe commented Apr 9, 2023

It's a lot better with use_pdftocairo=True. Both JPG and PNG output give a much better image, but the result still doesn't match the original files. I wonder if such a process is possible at all.
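
On whether this is possible at all: pdf2image rasterizes each page at the requested DPI, so its output is a new rendering rather than the original file. An alternative is to extract the image streams embedded in the PDF. The sketch below uses Poppler's `pdfimages` tool with its `-all` flag, which writes embedded images in their native format where it can. It assumes `pdfimages` is on the PATH (it is part of the Poppler utilities that pdf2image already depends on); the function names and the `page` prefix are hypothetical. Whether the extracted files are byte-identical to the originals depends on how img2pdf embedded them, so that should be verified rather than assumed.

```python
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, out_dir, prefix="page"):
    """Build the pdfimages call that dumps embedded images in native format."""
    return ["pdfimages", "-all", str(pdf_path), str(Path(out_dir) / prefix)]


def extract_embedded_images(pdf_path, out_dir, prefix="page"):
    """Run pdfimages and return the files it wrote, sorted by name."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdfimages_command(pdf_path, out_dir, prefix), check=True)
    # pdfimages names its output <prefix>-NNN.<ext>
    return sorted(Path(out_dir).glob(f"{prefix}-*"))
```

A quick check would be to compare checksums of the extracted files against the original images, for example with `hashlib.sha256`.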


lbr991 commented Sep 7, 2023

Is there a way to do lossless compression?
If not, how can I get as close to lossless as possible, other than specifying PNG output?
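
If rendering with pdf2image is still preferred, one way to get closer to the source pixels is to render at the resolution the image was embedded at, so the renderer does not have to scale it. A sketch of the arithmetic, assuming a single image fills each page; PDF page sizes are measured in points (72 per inch), and the function name `native_dpi` is hypothetical:

```python
def native_dpi(image_width_px, page_width_pt):
    """DPI at which a full-page image maps one source pixel to one output pixel."""
    page_width_in = page_width_pt / 72  # PDF user space: 72 points per inch
    return image_width_px / page_width_in


# Example: a 1920 px wide image on a 1440 pt (20 in) wide page.
print(native_dpi(1920, 1440))  # 96.0
```

The result would be passed as `dpi=` to `convert_from_path`. The page width in points can be read with Poppler's `pdfinfo`; the original pixel width has to come from the source image.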

2 participants