OCR Methodologies

Jack edited this page Sep 18, 2020 · 9 revisions

[Wiki under construction: to be updated & improved]

The currently available training and OCR methodologies are listed below:

  • Pixel Averaging

More machine learning methodologies and processes will be added in the future.

Pixel Averaging

Pixel averaging is a simple method. It looks at the training data provided for each character and records how often each pixel contained character data versus background; this information is stored with the created model. As the images are reduced to black and white, colour information is not required, so each image is reduced to a 2D matrix storing only 1 or 0 for the pixel value at each location.
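This reduction step could be sketched as follows (the threshold value and the assumption that brighter pixels are character pixels are illustrative, not taken from the wiki):

```python
import numpy as np

def binarize(image_grey, threshold=128):
    """Reduce a greyscale image (2D array of 0-255 values) to the 2D
    matrix of 1s and 0s described above. Pixels brighter than the
    threshold are treated here as character pixels (1) and the rest as
    background (0); swap the comparison for dark-on-light images."""
    return (np.asarray(image_grey) >= threshold).astype(int)
```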

Training

Image 1 highlights, for character B, the different values pixels can take depending on how often they contained character data. These averaged values are stored in the model. Pixels which never contained character data are set to a user-defined negative value (the negative scoring value); in image 1 this is set to -1.
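A minimal sketch of this training step, assuming the binary training images mark character pixels with 1 and background with 0 (function and parameter names are illustrative):

```python
import numpy as np

def train_character(binary_images, negative_value=-1.0):
    """Average the binary training images (1 = character pixel,
    0 = background) for a single character. Pixels that contained
    character data in every image average to 1, pixels that sometimes
    did get a fractional value, and pixels that never did are set to
    the user-defined negative scoring value (-1 in image 1)."""
    stack = np.stack([np.asarray(img, dtype=float) for img in binary_images])
    model = stack.mean(axis=0)
    model[model == 0.0] = negative_value  # never contained character data
    return model
```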

[image 1] Pixel Averaging Overview

OCR

When an image is loaded for OCR, white pixels are set to 1 and black pixels are set to -1. In image 1 this is shown as image G/56.jpg is loaded as the OCR target.
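This encoding step can be sketched as follows (assuming the binary matrix marks character pixels with 1, so that character areas become +1 and background becomes -1, matching the scoring description below):

```python
import numpy as np

def encode_target(binary_image):
    """Map a binary OCR target (1 = character pixel, 0 = background)
    to the +1 / -1 encoding used for scoring."""
    return 2 * np.asarray(binary_image) - 1
```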

To determine the character in the loaded image a scoring system is used: the Frobenius inner product is taken between the OCR target and each character in the model. The highest-scoring inner product is deemed to come from the closest-matching model character.
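The scoring step above can be sketched like this (the model dictionary and function names are illustrative; the target is assumed to already be in +1/-1 form):

```python
import numpy as np

def frobenius_score(target, model):
    """Frobenius inner product: multiply elementwise and sum."""
    return float(np.sum(np.asarray(target) * np.asarray(model)))

def classify(target, models):
    """Return the character whose model scores highest against the target."""
    return max(models, key=lambda ch: frobenius_score(target, models[ch]))
```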

A larger negative scoring value in training causes a greater reduction in score for areas where the OCR target has character (+1) but the model does not (the negative scoring value). Areas where the OCR target has background (-1) and the model does too (a negative value) instead increase the score, since the two negatives multiply to a positive.
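A tiny worked example of this effect (the pixel values are invented for illustration):

```python
import numpy as np

target = np.array([+1, +1, -1])  # two character pixels, one background pixel

# A poorly matching model with no character data anywhere: every pixel
# holds the negative scoring value chosen at training time.
model_soft = np.array([-1.0, -1.0, -1.0])  # trained with scoring value -1
model_hard = np.array([-2.0, -2.0, -2.0])  # trained with scoring value -2

score_soft = float(np.sum(target * model_soft))  # -1 - 1 + 1 = -1
score_hard = float(np.sum(target * model_hard))  # -2 - 2 + 2 = -2
```

The two mismatched character pixels are penalised more heavily under the stronger scoring value, while the matching background pixel earns back more; because mismatches outnumber matches here, the overall score drops further.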

Considerations

Pixel averaging comes with some considerations for how models should be trained and used.

  • Pixel locations are an important part of how a score is calculated, so training data and target images should have their characters in the same location. This can be achieved by cropping the training data and targets as close to the character as possible, and by keeping the character alignment constant (e.g. centred in the image).

  • Unclean edges may cause a change in score in some cases, especially if many training images have unclean edges; this is shown with G/56.jpg in image 1. To overcome this, a user-defined edge suppression size (in pixels) can be applied when training a model, which suppresses the edges of the training images.

  • Character orientations. Multiple orientations of a character should not be trained as the same character; instead they should be trained as separate characters (e.g. A, A1, A2, A3). Because the model is an average of the training data, mixing orientations would produce something close to a circular smear of pixels that appeared as character data at different times.
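The edge suppression mentioned above could be sketched like this (the function name and default width are assumptions):

```python
import numpy as np

def suppress_edges(binary_image, size=2):
    """Zero out a border of `size` pixels around a binary training
    image, so unclean edges do not contribute to the averaged model."""
    out = np.asarray(binary_image).copy()
    if size > 0:
        out[:size, :] = 0
        out[-size:, :] = 0
        out[:, :size] = 0
        out[:, -size:] = 0
    return out
```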
