PowerShell script that converts a non-OCR .pdf into an OCR .pdf
Converts a non-OCR .pdf into an OCR .pdf with readable text, keeping the original file name. Batch ready.
- Converts a renderable .pdf into .tiff files using ImageMagick/Ghostscript.
- Runs Tesseract OCR on the .tiff files to produce readable text.
- Converts the .tiff files back into .pdfs using Tesseract.
- Deletes .tiff files.
- Download dependencies: ImageMagick, Ghostscript, Tesseract.
- Set the file paths in convert.ps1 to the directory where the .pdfs are located. NOTE: THE SCRIPT WILL CONVERT ALL .pdfs IN THE SET DIRECTORY.
- Run convert.ps1 script.
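
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of convert.ps1. It assumes ImageMagick 7 (the `magick` command, with Ghostscript installed for reading PDFs) and `tesseract` are on PATH. The function names `Get-OcrPaths` and `Convert-PdfToOcrPdf`, the 300 dpi density, and the separate output folder are all assumptions made for the example. Writing to a separate folder keeps the original file name without overwriting the source .pdf.

```powershell
# Hypothetical helper: derives the temporary .tiff path and the Tesseract
# output base for one PDF. Tesseract appends ".pdf" to the output base itself.
function Get-OcrPaths {
    param(
        [Parameter(Mandatory)] [string]$PdfPath,
        [Parameter(Mandatory)] [string]$OutDir
    )
    $name = [System.IO.Path]::GetFileNameWithoutExtension($PdfPath)
    [pscustomobject]@{
        Tiff    = Join-Path $OutDir "$name.tiff"
        OutBase = Join-Path $OutDir $name
    }
}

# Hypothetical driver: converts every .pdf in $PdfDir (assumed layout).
function Convert-PdfToOcrPdf {
    param(
        [Parameter(Mandatory)] [string]$PdfDir,
        [Parameter(Mandatory)] [string]$OutDir
    )

    # Fail early if a dependency is missing from PATH.
    foreach ($tool in 'magick', 'tesseract') {
        if (-not (Get-Command $tool -ErrorAction SilentlyContinue)) {
            throw "$tool was not found on PATH."
        }
    }
    New-Item -ItemType Directory -Force -Path $OutDir | Out-Null

    # Collect the file list first so new files do not affect the loop.
    foreach ($pdf in @(Get-ChildItem -Path $PdfDir -Filter '*.pdf' -File)) {
        $paths = Get-OcrPaths -PdfPath $pdf.FullName -OutDir $OutDir

        # 1. Rasterize the PDF into a multi-page TIFF (300 dpi is an assumed value).
        & magick -density 300 $pdf.FullName -depth 8 -strip -background white -alpha off $paths.Tiff
        if ($LASTEXITCODE -ne 0) {
            Write-Warning "ImageMagick failed for $($pdf.Name); skipping."
            continue
        }

        # 2-3. OCR the TIFF and write <OutBase>.pdf with a text layer.
        & tesseract $paths.Tiff $paths.OutBase pdf
        if ($LASTEXITCODE -ne 0) {
            Write-Warning "Tesseract failed for $($pdf.Name)."
        }

        # 4. Delete the intermediate TIFF.
        Remove-Item -LiteralPath $paths.Tiff -ErrorAction SilentlyContinue
    }
}
```

With placeholder folders, a call might look like `Convert-PdfToOcrPdf -PdfDir 'C:\scans' -OutDir 'C:\scans\ocr'`. As with convert.ps1, every .pdf in the input directory is processed.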