OCR_FOR_PDFS

Optical Character Recognition for Scanned Documents

The program extracts text from a scanned document supplied as a PDF, regardless of the document's length.
The code uses Tesseract OCR to perform the recognition and OpenCV to preprocess the page images, which are generated from the PDF by the pdf2image module.
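
The pipeline described above might look roughly like the sketch below. It is not the repository's actual code: the function names (`preprocess`, `join_pages`, `pdf_to_text`), the 300 DPI setting, and the choice of median blur plus Otsu thresholding are all assumptions made for illustration. It also assumes the `pdf2image`, `opencv-python`, `numpy`, and `pytesseract` packages are installed, along with the Poppler and Tesseract binaries they rely on.

```python
def preprocess(page):
    """Grayscale, denoise, and binarize one page (a PIL image) for OCR.

    The specific steps are an assumed, commonly used recipe.
    """
    import cv2  # assumption: opencv-python is installed
    import numpy as np

    # pdf2image yields PIL images; OpenCV works on NumPy arrays.
    gray = cv2.cvtColor(np.array(page.convert("RGB")), cv2.COLOR_RGB2GRAY)
    gray = cv2.medianBlur(gray, 3)  # remove small speckle noise
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def join_pages(page_texts):
    """Join per-page text with a form feed so page boundaries are kept."""
    return "\f".join(text.strip() for text in page_texts)


def pdf_to_text(pdf_path, dpi=300, lang="eng"):
    """Render every page of the PDF, preprocess it, and run Tesseract on it."""
    from pdf2image import convert_from_path  # assumption: Poppler is available
    import pytesseract  # assumption: the tesseract binary is on PATH

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    return join_pages(
        pytesseract.image_to_string(preprocess(page), lang=lang) for page in pages
    )
```

Calling `pdf_to_text("scan.pdf")` (a hypothetical file name) would return the text of all pages as one string. For very long documents, `convert_from_path` also accepts `first_page` and `last_page`, so pages could be rendered in batches rather than all at once.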

The accuracy of the OCR can be improved by:

Preprocessing the image with OpenCV before recognition.
Running a spell check on the extracted text, which can also improve the flow of the text.
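
As a rough illustration of the spell-check idea, the sketch below replaces words that are not in a known vocabulary with their closest match, using only Python's standard `difflib` module. The function names, the 0.8 similarity cutoff, and the word list are assumptions for the example and not part of this project; a dedicated spell-checking library with a full dictionary would probably do better in practice.

```python
import difflib
import re


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or the word itself if none is close.

    `vocabulary` is a list of lowercase words (hypothetical; supply your own).
    """
    lower = word.lower()
    if lower in vocabulary:
        return word
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    if not matches:
        return word  # leave unknown words untouched rather than guess
    fixed = matches[0]
    # Keep a leading capital; all-caps words are not specially handled.
    return fixed.capitalize() if word[0].isupper() else fixed


def correct_text(text, vocabulary, cutoff=0.8):
    """Correct each alphabetic word; digits and punctuation are left as-is."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: correct_word(m.group(0), vocabulary, cutoff),
        text,
    )


vocabulary = ["the", "scanned", "page"]
print(correct_text("The scaned page 12.", vocabulary))
```

A higher cutoff makes the correction more conservative, which matters because an over-eager spell check can replace correctly recognized names or technical terms with unrelated dictionary words.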
