pdf2image

A Python 2.7 and 3.5+ module that wraps pdftoppm and pdftocairo to convert PDFs to PIL Image objects.

How to install

pip install pdf2image

Windows

Windows users will have to install poppler for Windows, then add the bin/ folder to PATH.

Mac

Mac users will have to install poppler for Mac.

Linux

Most distros ship with pdftoppm and pdftocairo. If they are not installed, use your package manager to install poppler-utils.

Platform-independent (Using `conda`)

Install poppler: conda install -c conda-forge poppler
Install pdf2image: pip install pdf2image
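Before calling pdf2image, it can help to confirm that the poppler command-line tools are actually reachable. A missing PATH entry is the usual failure on Windows when the bin/ folder was not added. This is a minimal standard-library sketch (Python 3.6+). `find_poppler_tools` is a hypothetical helper, not part of pdf2image. Checking for pdfinfo is an assumption based on the PDFInfoNotInstalledError exception.

```python
import shutil


def find_poppler_tools(poppler_path=None):
    """Map each poppler tool name to its full path, or None if it cannot be found.

    Hypothetical helper. When poppler_path is given, only that directory is
    searched (similar in spirit to pdf2image's poppler_path parameter);
    otherwise the PATH environment variable is used.
    """
    tools = ("pdftoppm", "pdftocairo", "pdfinfo")
    # shutil.which searches PATH when path is None, else only the given directories.
    return {tool: shutil.which(tool, path=poppler_path) for tool in tools}


if __name__ == "__main__":
    for tool, location in find_poppler_tools().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```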

How does it work?

from pdf2image import convert_from_path, convert_from_bytes

from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

Then simply do:

images = convert_from_path('/home/belval/example.pdf')

OR

with open('/home/belval/example.pdf', 'rb') as pdf_file:  # closes the file handle when done
    images = convert_from_bytes(pdf_file.read())

OR better yet

import tempfile

with tempfile.TemporaryDirectory() as path:
    images_from_path = convert_from_path('/home/belval/example.pdf', output_folder=path)
    # Do something here

images will be a list of PIL Image objects, one for each page of the PDF document.
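Since each element is a regular PIL Image, pages can be written to disk with the image's own save() method. This is a minimal sketch (Python 3.6+). `save_pages` and its page_<n>.<ext> naming scheme are hypothetical and not part of pdf2image.

```python
import os


def save_pages(images, output_dir, fmt="JPEG", prefix="page"):
    """Save each page as <prefix>_<n>.<ext> in output_dir, numbering pages from 1.

    Hypothetical helper: `images` is expected to be the list returned by
    convert_from_path / convert_from_bytes, but any objects with a
    PIL-style save(fp, format) method will work.
    """
    ext = {"JPEG": "jpg", "PNG": "png", "TIFF": "tiff"}.get(fmt.upper(), fmt.lower())
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for number, image in enumerate(images, start=1):
        path = os.path.join(output_dir, f"{prefix}_{number}.{ext}")
        image.save(path, fmt)
        paths.append(path)
    return paths
```

For example, save_pages(convert_from_path('/home/belval/example.pdf'), 'pages', fmt='PNG') would write pages/page_1.png, pages/page_2.png, and so on.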

Here are the definitions:

convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False)

convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False)
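Because pdf2image wraps pdftoppm, several of these parameters roughly correspond to pdftoppm command-line flags. For instance, grayscale maps to -gray. The sketch below (Python 3.6+) builds such a command line to illustrate the idea. It is an illustration under assumptions, not pdf2image's actual implementation, and it only handles the ppm, jpeg and png formats. `build_pdftoppm_args` is hypothetical.

```python
def build_pdftoppm_args(pdf_path, output_prefix, dpi=200, first_page=None,
                        last_page=None, fmt="ppm", grayscale=False,
                        single_file=False, use_cropbox=False, userpw=None):
    """Build an argv list for pdftoppm (hypothetical; not pdf2image's own code)."""
    args = ["pdftoppm", "-r", str(dpi)]          # -r: resolution in DPI
    if first_page is not None:
        args += ["-f", str(first_page)]          # -f: first page to convert
    if last_page is not None:
        args += ["-l", str(last_page)]           # -l: last page to convert
    # pdftoppm writes PPM by default; other formats need an explicit flag.
    fmt = fmt.lower()
    if fmt in ("jpg", "jpeg"):
        args.append("-jpeg")
    elif fmt == "png":
        args.append("-png")
    elif fmt != "ppm":
        raise ValueError(f"format not handled by this sketch: {fmt!r}")
    if grayscale:
        args.append("-gray")
    if single_file:
        args.append("-singlefile")               # no page digits in the output name
    if use_cropbox:
        args.append("-cropbox")
    if userpw is not None:
        args += ["-upw", userpw]                 # user password for encrypted PDFs
    # Positional arguments: the input PDF, then the output file prefix.
    args += [pdf_path, output_prefix]
    return args
```

The resulting list could be run with subprocess.run(args, check=True), which is roughly what a wrapper like this has to do before loading the generated images.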

Need help?

Use the Mattermost chat to ask questions on the helpdesk and get direct support.

What's new?

grayscale parameter allows you to convert images to grayscale (-gray in pdftoppm CLI)
single_file parameter allows you to convert the first PDF page only, without adding digits at the end of the output_file
Allow the user to specify poppler's installation path with poppler_path
Fixed a bug where PNG buffers with a non-terminating IEND sequence would throw an exception
Fixed a bug that left open file descriptors when using convert_from_bytes() (Thank you @FabianUken)
fmt='tiff' parameter allows you to create .tiff files (You need pdftocairo for this)
transparent parameter allows you to generate images with no background instead of the usual white one (You need pdftocairo for this)
strict parameter allows you to catch pdftoppm syntax errors with a custom exception type, PDFSyntaxError
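The notes above imply a simple rule for choosing a poppler tool: TIFF output and transparent backgrounds need pdftocairo, and everything else can go through pdftoppm. The following is a minimal sketch of that rule. `pick_converter` is hypothetical. The extra check that transparency only applies to PNG and TIFF comes from the pdftocairo manual page (its -transp option), not from pdf2image, which may handle this case differently.

```python
def pick_converter(fmt="ppm", transparent=False):
    """Return the name of the poppler tool needed for these options (hypothetical)."""
    fmt = fmt.lower()
    if transparent and fmt not in ("png", "tif", "tiff"):
        # pdftocairo documents -transp for PNG and TIFF only; JPEG and PPM have no alpha channel.
        raise ValueError("transparent output needs fmt='png' or fmt='tiff'")
    if fmt in ("tif", "tiff") or transparent:
        return "pdftocairo"
    return "pdftoppm"
```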

Performance tips

Using an output folder is significantly faster if you are using an SSD. Otherwise, I/O usually becomes the bottleneck.
Using multiple threads can give you some gains, but avoid more than 4, as this will cause an I/O bottleneck (even on my NVMe SSD!).
If I/O is your bottleneck, using the JPEG format can lead to significant gains.
PNG format is pretty slow because of the compression.
If you want to know the best settings (most settings will be fine anyway), you can clone the project and run python tests.py to get timings.
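To compare settings such as thread_count or fmt on your own hardware without running the full test suite, a small timing harness is enough. This is a minimal sketch. `time_settings` is a hypothetical helper. It reports the best of several runs to reduce noise from disk caches and other processes.

```python
import time


def time_settings(convert, settings, repeat=3):
    """Return {label: best wall-clock seconds} for each set of keyword arguments.

    `convert` is any callable, for example functools.partial(convert_from_path, pdf_path).
    """
    timings = {}
    for label, kwargs in settings.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            convert(**kwargs)
            best = min(best, time.perf_counter() - start)
        timings[label] = best
    return timings
```

For example, settings could be {'1 thread, ppm': {'thread_count': 1}, '4 threads, jpeg': {'thread_count': 4, 'fmt': 'jpeg'}}, with an output_folder bound into the partial to measure the SSD case.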

Limitations / known issues

A relatively big PDF will use up all your memory and cause the process to be killed (unless you use an output folder).
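If an output folder is not an option, you can keep memory bounded by converting the document a few pages at a time with first_page and last_page. This is a minimal sketch. It assumes these parameters are 1-based and inclusive, like pdftoppm's -f and -l options. It also assumes the page count is already known, for example from poppler's pdfinfo tool. `page_ranges` is a hypothetical helper.

```python
def page_ranges(page_count, chunk_size):
    """Split pages 1..page_count into inclusive (first_page, last_page) pairs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, page_count))
        for first in range(1, page_count + 1, chunk_size)
    ]
```

Each (first, last) pair would then be passed as convert_from_path(pdf_path, first_page=first, last_page=last). Process and discard that batch of images before requesting the next one.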
