Python library to interact with the PDFTables.com API.
Supported versions of Python are listed in ci-build.yml.
pip: (requires git installed)
pip install git+https://github.com/pdftables/python-pdftables-api.git
pip: (without git)
pip install https://github.com/pdftables/python-pdftables-api/archive/master.tar.gz
Locally:
python setup.py install
If you installed with pip, upgrade by passing the --upgrade flag, e.g.
pip install --upgrade git+https://github.com/pdftables/python-pdftables-api.git
Sign up for an account at PDFTables.com and then visit the API page to see your API key.
Replace my-api-key below with your API key.
import pdftables_api
c = pdftables_api.Client('my-api-key')
c.xlsx('input.pdf', 'output.xlsx')
To convert to CSV, XML or HTML, change c.xlsx to c.csv, c.xml or c.html respectively.
To specify Excel (single sheet) or Excel (multiple sheets), use c.xlsx_single or c.xlsx_multiple.
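If the output format should follow the output filename, the method can be looked up by extension. This is a minimal sketch, not part of the library: convert_by_extension and METHOD_BY_EXTENSION are hypothetical names, and the only assumption about the client is that it exposes the xlsx, csv, xml and html methods described above.

```python
import os

# Hypothetical lookup table: output extension -> Client method name.
# The four method names are the ones described in the text above.
METHOD_BY_EXTENSION = {
    ".xlsx": "xlsx",
    ".csv": "csv",
    ".xml": "xml",
    ".html": "html",
}


def convert_by_extension(client, pdf_path, output_path):
    """Call the client method that matches output_path's extension.

    Returns the name of the method that was called.
    Raises ValueError for an extension that is not in the table.
    """
    extension = os.path.splitext(output_path)[1].lower()
    if extension not in METHOD_BY_EXTENSION:
        raise ValueError("unsupported output extension: %r" % extension)
    method_name = METHOD_BY_EXTENSION[extension]
    getattr(client, method_name)(pdf_path, output_path)
    return method_name


# Usage (requires the library and a real API key):
#   import pdftables_api
#   c = pdftables_api.Client('my-api-key')
#   convert_by_extension(c, 'input.pdf', 'output.csv')
```

The helper takes the client as an argument, so it works with any object that has those four methods.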
To run the tests:
python -m unittest test.test_pdftables_api
If you are converting a large document (hundreds or thousands of pages), you may want to increase the timeout.
Here is an example of the sort of error that might be encountered:
ReadTimeout: HTTPSConnectionPool(host='pdftables.com', port=443): Read timed out. (read timeout=300)
The example below allows 60 seconds to connect to our server and 1 hour to convert the document:
import pdftables_api
c = pdftables_api.Client('my-api-key', timeout=(60, 3600))
c.xlsx('input.pdf', 'output.xlsx')
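If you would rather keep a short timeout for ordinary documents and only fall back to a long one when needed, one option is to retry with a larger read timeout. The sketch below is an assumption-laden illustration, not library behaviour: convert_with_growing_timeout is a hypothetical helper, and the exception types to retry on are passed in by the caller (the error message above suggests requests.exceptions.ReadTimeout, but check what your installation actually raises).

```python
def convert_with_growing_timeout(make_client, convert, read_timeouts,
                                 retry_on, connect_timeout=60):
    """Run convert(client) once per read timeout until one attempt succeeds.

    make_client   -- callable taking a (connect, read) timeout tuple and
                     returning a client
    convert       -- callable taking that client and doing the conversion
    read_timeouts -- read timeouts in seconds, tried in the given order
    retry_on      -- tuple of exception types that trigger the next attempt
    """
    if not read_timeouts:
        raise ValueError("read_timeouts must not be empty")
    last_error = None
    for read_timeout in read_timeouts:
        client = make_client((connect_timeout, read_timeout))
        try:
            return convert(client)
        except retry_on as error:
            # Remember the failure and move on to the next, longer timeout.
            last_error = error
    # Every attempt timed out: surface the most recent error.
    raise last_error


# Usage (requires the library, requests, and a real API key):
#   import pdftables_api
#   import requests
#   convert_with_growing_timeout(
#       make_client=lambda t: pdftables_api.Client('my-api-key', timeout=t),
#       convert=lambda c: c.xlsx('input.pdf', 'output.xlsx'),
#       read_timeouts=[300, 3600],
#       retry_on=(requests.exceptions.ReadTimeout,),
#   )
```

Note that each attempt uploads the document again, so for documents you already know are large it is simpler to set the long timeout up front as shown above.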