PDF Link Checker

GitHub Action and CLI tool that checks every link in a PDF for availability, so you can find dead or broken links.

Usage

Install the package using pip

pip install pdflinkchecker-cli

and use it as a CLI tool

pdflinkchecker .                          # recursively find and check all PDFs in the current directory
pdflinkchecker path/to/pdf                # check a specific file
pdflinkchecker path/to/pdf1 path/to/pdf2  # check multiple specific files
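
To make the difference between directory and file arguments concrete, here is a minimal Python sketch of how such arguments could be expanded into a list of PDFs. It is an illustration only, not the tool's actual code: the function name `collect_pdfs` is hypothetical, and the case-insensitive `.pdf` match is an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def collect_pdfs(paths: Iterable[str]) -> List[Path]:
    """Expand file and directory arguments into a sorted list of PDF paths.

    Hypothetical helper for illustration; not part of pdflinkchecker.
    """
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Walk the directory recursively and keep files ending in .pdf
            # (matching the suffix case-insensitively is an assumption).
            found.update(
                p for p in path.rglob("*")
                if p.is_file() and p.suffix.lower() == ".pdf"
            )
        elif path.is_file() and path.suffix.lower() == ".pdf":
            found.add(path)
    # The set removes duplicates when a file is named both directly
    # and via its parent directory.
    return sorted(found)
```

Calling `collect_pdfs(["."])` would correspond to the first invocation above, and `collect_pdfs(["path/to/pdf1", "path/to/pdf2"])` to the last.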

Example output:

Analyzed /data/dummy1.pdf, found the following types of links/http codes: {200: 13}

Analyzed /data/dummy2.pdf, found the following types of links/http codes: {'mail': 4, 'tel': 4, 200: 49, 'error': 3}
|   Page Number | URL                      | Details                                                             |
|---------------+--------------------------+---------------------------------------------------------------------|
|             1 | https://www.example1.com | <HTTPError 999: 'INKApi Error'>                                     |
|             1 | https://www.example2.com | URLError(timeout('_ssl.c:1112: The handshake operation timed out')) |
|             1 | https://www.example3.com | <HTTPError 403: 'Forbidden'>                                        |
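
The summary dictionary groups links by kind: `'mail'` and `'tel'` links, HTTP status codes for links that responded, and `'error'` for links that failed, which are then listed in the table. The sketch below shows one way such a tally could be produced with Python's standard `urllib`. It is an assumption-laden illustration, not the tool's implementation: the names `fetch_status`, `classify_link`, and `summarize` are hypothetical, as are the `mailto:`/`tel:` prefix checks, the `User-Agent` header, and the timeout.

```python
from collections import Counter
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url (performs a real network request)."""
    # Sending a browser-like User-Agent is an assumption, not the tool's behavior.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.status


def classify_link(url, fetch=fetch_status):
    """Return (key, details); key is 'mail', 'tel', an int status, or 'error'."""
    lowered = url.lower()
    if lowered.startswith("mailto:"):
        return "mail", None
    if lowered.startswith("tel:"):
        return "tel", None
    try:
        return fetch(url), None
    except (OSError, ValueError) as exc:
        # HTTPError, URLError, and socket timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return "error", repr(exc)


def summarize(links, fetch=fetch_status):
    """Tally (page_number, url) pairs; return (counts, failures)."""
    counts = Counter()
    failures = []
    for page, url in links:
        key, details = classify_link(url, fetch)
        counts[key] += 1
        if key == "error":
            failures.append((page, url, details))
    return dict(counts), failures
```

In this sketch, an HTTP error response such as 403 raises `HTTPError` and is therefore counted under `'error'`, which matches the example output above, where all three listed failures make up the `'error': 3` entry. The `fetch` parameter exists only so the network call can be replaced in tests.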

To use the GitHub Action, create a pdf_link_checker.yml in .github/workflows:

on: [ push ]

jobs:
  check_pdf_links:
    runs-on: ubuntu-latest
    name: Check PDF Links
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: PDF Link Checker
        uses: a-nau/pdf-link-checker@v0.2.0
        with:
          paths: '.'  # checks all PDFs; otherwise specify file path(s)

To run within Docker, build the image and mount the directory containing your PDFs at /data/:

docker build -t pdf_link_checker .
docker run -it --rm --mount type=bind,source=${PWD},target=/data/ --name pdf_link_checker pdf_link_checker /data/.

Credits

I started from pdf-link-checker by Matt Briggs. In particular, get_links_from_page is still heavily borrowed from the original.

This work is licensed under the MIT license.
