Retrieves a PDF from a web link using the urllib package, then parses the file to read and store the text it contains. The text is extracted with the fitz library (PyMuPDF). No OCR is needed here because this PDF already contains machine-readable text; consequently, this code won't work on files that require OCR to obtain readable text. Note: the PDF file from the link is solely the property of its original owners; this is a learning exercise.
now-youre-gittin-it/web-scraped-pdf-reader