Libcam

Book cover text recognition

Project Structure

images
Libcam
 |-- utils
 |    |-- detection.py
 |    |-- ner.py
 |    |-- processing.py
 |
 |-- main.py
 |-- frozen_east_text_detection.pb

detection.py → extract text regions using the pretrained EAST model and run Tesseract on the detected text regions
processing.py → image processing on the detected text regions
main.py → run the program by parsing arguments
ner.py → NLP functions to perform named entity recognition

Installation

Create a virtual environment:
conda create -n envname python=3.7
Activate the virtual environment:
activate envname
Navigate to the location where you want to clone the repository:
cd desired/location/
Clone this repository:
git clone https://github.com/ohumkar/Libcam.git
cd Libcam
Install the required libraries:
pip install -r requirements.txt
Download the spaCy small English model:
python -m spacy download en_core_web_sm

Usage

cd into the Libcam repo and run the program:

  1. To run with the webcam and the EAST detector:
    python main.py --east frozen_east_text_detection.pb --image cam --detector east --padding 0.1
  2. To run on a saved image with the EAST detector:
    python main.py --east frozen_east_text_detection.pb --image images/8.jpg --detector east --padding 0.1

Arguments:

  • --east : location of the pretrained EAST model
  • --image :
    • cam → to access the webcam ('Space' to capture / 'Esc' to exit)
    • image_path → for a locally saved image
  • --detector :
    • east : use the EAST detector
    • tess : use plain Pytesseract
  • --padding : padding to add around bounding boxes (0.05 or 0.1 works best)
  • --width : (default 320) width of the resized image; must be a multiple of 32
  • --height : (default 320) height of the resized image; must be a multiple of 32
  • --min-confidence : (default 0.5) minimum confidence for a region to be detected as text
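
For reference, a minimal sketch of how these flags could be wired up with argparse. The parser below simply mirrors the list above; the actual definitions in main.py may differ.

```python
import argparse

# Illustrative parser mirroring the documented flags (not copied from main.py)
parser = argparse.ArgumentParser(description="Book cover text recognition")
parser.add_argument("--east", type=str, required=True,
                    help="path to the pretrained EAST model (.pb)")
parser.add_argument("--image", type=str, default="cam",
                    help="'cam' for webcam capture, or a path to a saved image")
parser.add_argument("--detector", type=str, choices=["east", "tess"], default="east",
                    help="'east' for the EAST detector, 'tess' for plain Pytesseract")
parser.add_argument("--padding", type=float, default=0.1,
                    help="padding added around each bounding box")
parser.add_argument("--width", type=int, default=320,
                    help="resized image width; must be a multiple of 32")
parser.add_argument("--height", type=int, default=320,
                    help="resized image height; must be a multiple of 32")
parser.add_argument("--min-confidence", type=float, default=0.5,
                    help="minimum confidence for a region to be kept as text")
args = parser.parse_args()
```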

Pipeline:


  • The captured image is resized and a forward pass is made through the EAST detector, which outputs bounding boxes along with confidence scores. Important boxes are retained by thresholding on the confidence scores.
  • Image processing is done on each of the detected text regions before passing them to tesseract-ocr to recognize the text.
  • Named Entity Recognition is performed on the OCR output using spaCy; PERSON entities are extracted as the Author, while the remaining text is marked as the Title.
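
A condensed, self-contained sketch of this pipeline, loosely following the standard OpenCV EAST decoding recipe. The repository's detection.py, processing.py and ner.py split this up differently and add the image-processing step, so treat this as an illustration rather than the project's code.

```python
import cv2
import numpy as np
import pytesseract
import spacy

net = cv2.dnn.readNet("frozen_east_text_detection.pb")
orig = cv2.imread("images/8.jpg")
(H, W) = orig.shape[:2]
(newW, newH) = (320, 320)                      # must be multiples of 32
rW, rH = W / float(newW), H / float(newH)

# Forward pass through the EAST detector
blob = cv2.dnn.blobFromImage(cv2.resize(orig, (newW, newH)), 1.0, (newW, newH),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])

# Decode boxes, keeping only those above the confidence threshold
rects, confidences = [], []
for y in range(scores.shape[2]):
    for x in range(scores.shape[3]):
        score = scores[0, 0, y, x]
        if score < 0.5:
            continue
        offsetX, offsetY = x * 4.0, y * 4.0
        angle = geometry[0, 4, y, x]
        cos, sin = np.cos(angle), np.sin(angle)
        h = geometry[0, 0, y, x] + geometry[0, 2, y, x]
        w = geometry[0, 1, y, x] + geometry[0, 3, y, x]
        endX = int(offsetX + cos * geometry[0, 1, y, x] + sin * geometry[0, 2, y, x])
        endY = int(offsetY - sin * geometry[0, 1, y, x] + cos * geometry[0, 2, y, x])
        rects.append((int(endX - w), int(endY - h), endX, endY))
        confidences.append(float(score))

# Non-maximum suppression, then OCR and NER on each surviving region
indices = cv2.dnn.NMSBoxes([[sx, sy, ex - sx, ey - sy] for sx, sy, ex, ey in rects],
                           confidences, 0.5, 0.4)
nlp = spacy.load("en_core_web_sm")
for i in np.array(indices).flatten():
    startX, startY, endX, endY = rects[i]
    roi = orig[max(0, int(startY * rH)):int(endY * rH),
               max(0, int(startX * rW)):int(endX * rW)]
    text = pytesseract.image_to_string(roi, config="--oem 1 --psm 7")
    doc = nlp(text)
    authors = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    print(text.strip(), "| authors:", authors)
```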

Challenges faced

Recognizing scene text is a challenging problem, even more so than recognizing scanned documents. As opposed to a simple OCR task, in which the text usually sits over a plain background, this task needs to detect text in natural scene images, which contain much more noise in the background. Hence a robust text detector had to be used.

OCR is never 100% accurate and can prove challenging even for state-of-the-art OCR methods. Given the task of performing OCR on natural scene images, the output is of even lower quality. A few factors which make natural scene detection hard are:

  • Image/sensor noise
  • Viewing angles
  • Lighting conditions
  • Resolution

To help Tesseract detect text in images, detection with another robust algorithm was needed, i.e. EAST in this case. Text regions detected by EAST were then sent to Tesseract for recognition. The final output was mainly influenced by the following:

  • Performance of EAST → it sometimes fails to capture whole words → bounding boxes include parts of other words → degrading the performance of the later OCR step
  • Output of Tesseract → most of the time the OCR outputs garbage text strings, or correct strings along with random characters. Such output further complicates the NER task, ultimately resulting in poor program output

Improvements:

  • A better image processing system for the EAST detector. The current model, though robust at finding text regions, is not precise in drawing the boxes over each word.
  • Cleaning of garbage OCR output. Most of the time the OCR output consists of random special characters / numbers / repeated characters (see the cleaning sketch after this list).
  • Better NER. The current method works quite well but has some flaws, e.g. if the complete OCR text is capitalized, it fails to classify names, since it generally assumes names begin with a capital letter (the probability of 'Alex' being classified as a name is higher than that of 'ALEX' or 'alex').
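
A minimal sketch of the kind of cleanup the second and third points suggest. The regexes and the clean_ocr_text helper are illustrative only and are not part of the repository.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Illustrative cleanup of noisy OCR output before NER (not part of the repo)."""
    # Drop characters that rarely appear in book titles or author names
    text = re.sub(r"[^A-Za-z0-9.,:'\- ]+", " ", text)
    # Collapse long runs of repeated characters such as 'IIII' or '....'
    text = re.sub(r"(.)\1{3,}", r"\1", text)
    # Title-case all-caps words so spaCy sees 'Alex' rather than 'ALEX'
    words = [w.title() if w.isupper() else w for w in text.split()]
    return " ".join(words)

print(clean_ocr_text("ALEX   MICHAELIDES ///// THE SILENT PATIENT...."))
# -> "Alex Michaelides The Silent Patient."
```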

Note that with each improvement in an earlier stage, the output quality of the later stages will be greatly improved.
