Libcam

Book cover text recognition

Project Structure

images
Libcam
 |-- utils
 |    |-- detection.py
 |    |-- ner.py
 |    |-- processing.py
 |
 |-- main.py
 |-- frozen_east_text_detection.pb

detection.py → extract text regions using the pretrained EAST model and run Tesseract on the detected text regions
processing.py → image processing on the detected text regions
main.py → run the program by parsing arguments
ner.py → NLP functions to perform named entity recognition

Installation

Create a virtual environment:
conda create -n envname python=3.7
Activate the virtual environment:
activate envname
Navigate to the location where you want to clone the repository:
cd desired/location/
Clone this repository:
git clone https://github.com/ohumkar/Libcam.git
cd Libcam
Install the required libraries:
pip install -r requirements.txt
Download the spaCy small English model:
python -m spacy download en_core_web_sm

Usage

cd into the Libcam repo and run the program:

  1. To run with the webcam and the EAST detector:
    python main.py --east frozen_east_text_detection.pb --image cam --detector east --padding 0.1
  2. To run on a saved image with the EAST detector:
    python main.py --east frozen_east_text_detection.pb --image images/8.jpg --detector east --padding 0.1

Arguments:

  • --east : location of the pretrained EAST model
  • --image :
    • cam → to access the webcam ('Space' to capture / 'Esc' to exit)
    • image_path → for a locally saved image
  • --detector :
    • east : use the EAST detector
    • tess : use plain Pytesseract
  • --padding : padding to add around bounding boxes (0.05 or 0.1 works best)
  • --width : (default 320) width of the resized image; must be a multiple of 32
  • --height : (default 320) height of the resized image; must be a multiple of 32
  • --min-confidence : (default 0.5) minimum confidence for a region to be detected as text
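
For reference, a minimal sketch of how these flags could be wired up with argparse. The parser below simply mirrors the list above; the actual definitions in main.py may differ.

```python
import argparse

# Illustrative parser mirroring the documented flags (not copied from main.py)
parser = argparse.ArgumentParser(description="Book cover text recognition")
parser.add_argument("--east", type=str, required=True,
                    help="path to the pretrained EAST model (.pb)")
parser.add_argument("--image", type=str, default="cam",
                    help="'cam' for webcam capture, or a path to a saved image")
parser.add_argument("--detector", type=str, choices=["east", "tess"], default="east",
                    help="'east' for the EAST detector, 'tess' for plain Pytesseract")
parser.add_argument("--padding", type=float, default=0.1,
                    help="padding added around each bounding box")
parser.add_argument("--width", type=int, default=320,
                    help="resized image width; must be a multiple of 32")
parser.add_argument("--height", type=int, default=320,
                    help="resized image height; must be a multiple of 32")
parser.add_argument("--min-confidence", type=float, default=0.5,
                    help="minimum confidence for a region to be kept as text")
args = parser.parse_args()
```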

Pipeline:


  • The captured image is resized and a forward pass is made through the EAST detector, which outputs bounding boxes along with confidence scores. Important boxes are retained by thresholding on the confidence scores.
  • Image processing is done on each of the detected text regions before passing them to tesseract-ocr to recognize the text.
  • Named Entity Recognition is performed on the OCR output using spaCy; PERSON entities are extracted as the Author, while the remaining text is marked as the Title.
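
A condensed, self-contained sketch of this pipeline, loosely following the standard OpenCV EAST decoding recipe. The repository's detection.py, processing.py and ner.py split this up differently and add the image-processing step, so treat this as an illustration rather than the project's code.

```python
import cv2
import numpy as np
import pytesseract
import spacy

net = cv2.dnn.readNet("frozen_east_text_detection.pb")
orig = cv2.imread("images/8.jpg")
(H, W) = orig.shape[:2]
(newW, newH) = (320, 320)                      # must be multiples of 32
rW, rH = W / float(newW), H / float(newH)

# Forward pass through the EAST detector
blob = cv2.dnn.blobFromImage(cv2.resize(orig, (newW, newH)), 1.0, (newW, newH),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])

# Decode boxes, keeping only those above the confidence threshold
rects, confidences = [], []
for y in range(scores.shape[2]):
    for x in range(scores.shape[3]):
        score = scores[0, 0, y, x]
        if score < 0.5:
            continue
        offsetX, offsetY = x * 4.0, y * 4.0
        angle = geometry[0, 4, y, x]
        cos, sin = np.cos(angle), np.sin(angle)
        h = geometry[0, 0, y, x] + geometry[0, 2, y, x]
        w = geometry[0, 1, y, x] + geometry[0, 3, y, x]
        endX = int(offsetX + cos * geometry[0, 1, y, x] + sin * geometry[0, 2, y, x])
        endY = int(offsetY - sin * geometry[0, 1, y, x] + cos * geometry[0, 2, y, x])
        rects.append((int(endX - w), int(endY - h), endX, endY))
        confidences.append(float(score))

# Non-maximum suppression, then OCR and NER on each surviving region
indices = cv2.dnn.NMSBoxes([[sx, sy, ex - sx, ey - sy] for sx, sy, ex, ey in rects],
                           confidences, 0.5, 0.4)
nlp = spacy.load("en_core_web_sm")
for i in np.array(indices).flatten():
    startX, startY, endX, endY = rects[i]
    roi = orig[max(0, int(startY * rH)):int(endY * rH),
               max(0, int(startX * rW)):int(endX * rW)]
    text = pytesseract.image_to_string(roi, config="--oem 1 --psm 7")
    doc = nlp(text)
    authors = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    print(text.strip(), "| authors:", authors)
```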

Challenges faced

Recognizing scene text is a challenging problem, even more so than recognizing scanned documents. As opposed to a simple OCR task, in which the text usually sits over a plain background, this task needs to detect text in natural scene images, which contain much more noise in the background. Hence a robust text detector had to be used.

OCR is never 100% accurate and can prove challenging even for state-of-the-art OCR methods. Given the task of performing OCR on natural scene images, the output is of even lower quality. A few factors which make natural scene detection hard are:

  • Image/sensor noise
  • Viewing angles
  • Lighting conditions
  • Resolution

To help Tesseract detect text in images, detection with another robust algorithm was needed, i.e. EAST in this case. Text regions detected by EAST were then sent to Tesseract for recognition. The final output was mainly influenced by the following:

  • Performance of EAST → it sometimes fails to capture whole words → bounding boxes include parts of other words → degrading the performance of the later OCR step
  • Output of Tesseract → most of the time the OCR outputs garbage text strings, or correct strings along with random characters. Such output further complicates the NER task, ultimately resulting in poor program output

Improvements:

  • A better image processing system for the EAST detector. The current model, though robust at finding text regions, is not precise in drawing the boxes over each word.
  • Cleaning of garbage OCR output. Most of the time the OCR output consists of random special characters / numbers / repeated characters (see the cleaning sketch after this list).
  • Better NER. The current method works quite well but has some flaws, e.g. if the complete OCR text is capitalized, it fails to classify names, since it generally assumes names begin with a capital letter (the probability of 'Alex' being classified as a name is higher than that of 'ALEX' or 'alex').
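
A minimal sketch of the kind of cleanup the second and third points suggest. The regexes and the clean_ocr_text helper are illustrative only and are not part of the repository.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Illustrative cleanup of noisy OCR output before NER (not part of the repo)."""
    # Drop characters that rarely appear in book titles or author names
    text = re.sub(r"[^A-Za-z0-9.,:'\- ]+", " ", text)
    # Collapse long runs of repeated characters such as 'IIII' or '....'
    text = re.sub(r"(.)\1{3,}", r"\1", text)
    # Title-case all-caps words so spaCy sees 'Alex' rather than 'ALEX'
    words = [w.title() if w.isupper() else w for w in text.split()]
    return " ".join(words)

print(clean_ocr_text("ALEX   MICHAELIDES ///// THE SILENT PATIENT...."))
# -> "Alex Michaelides The Silent Patient."
```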

Note that with each improvement in an earlier stage, the output quality of the later stages will be greatly improved.
