Add rotation preprocessing option #648

Balearica · 2022-08-23T02:45:02Z

Tesseract performs extremely poorly when text is at an angle. For example, below is a scan with ~5 degrees of rotation. The first image shows the text Tesseract recognized without any preprocessing; the second shows what it recognized after the image was rotated.

The maintainers of the main Tesseract repo frequently suggest adding image preprocessing steps (including auto-rotation) to workflows to address this. However, that approach is not ideal for web users. Given that we already include the Leptonica image processing library, we should be able to expose a rotation option without much effort. Auto-rotation would be ideal, but is likely significantly more difficult to implement.
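To make concrete what a fixed-angle rotation step does to an image before recognition, here is a minimal sketch in plain TypeScript. It is not Leptonica's implementation and not any Tesseract.js API: the `GrayImage` type and the `rotateImage` helper are hypothetical names introduced only for illustration, and the nearest-neighbor sampling is an assumption chosen for brevity.

```typescript
// Hypothetical illustration only: not part of Tesseract.js or Leptonica.
// A grayscale raster stored row by row, one number (0-255) per pixel.
type GrayImage = { width: number; height: number; data: number[] };

// Rotate an image about its center by `degrees`, keeping the canvas size.
// Because y grows downward in image coordinates, positive angles appear
// clockwise on screen. Pixels with no source are filled with `fill`.
function rotateImage(img: GrayImage, degrees: number, fill = 255): GrayImage {
  const theta = (degrees * Math.PI) / 180; // degrees -> radians
  const cos = Math.cos(theta);
  const sin = Math.sin(theta);
  const cx = (img.width - 1) / 2;
  const cy = (img.height - 1) / 2;
  const out = new Array<number>(img.width * img.height).fill(fill);

  for (let y = 0; y < img.height; y++) {
    for (let x = 0; x < img.width; x++) {
      // Inverse mapping: for each output pixel, apply the inverse rotation
      // to find the source pixel, then sample it (nearest neighbor).
      const dx = x - cx;
      const dy = y - cy;
      const sx = Math.round(cx + dx * cos + dy * sin);
      const sy = Math.round(cy - dx * sin + dy * cos);
      if (sx >= 0 && sx < img.width && sy >= 0 && sy < img.height) {
        out[y * img.width + x] = img.data[sy * img.width + sx];
      }
    }
  }
  return { width: img.width, height: img.height, data: out };
}
```

For a scan skewed by roughly 5 degrees, the correction would be a call such as `rotateImage(scan, 5)` or `rotateImage(scan, -5)`, depending on the direction of the skew. A production implementation would normally interpolate between pixels rather than use nearest-neighbor sampling.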

Possibly related to #588, which requests high-level functions that expose processed (binarized) images.

Balearica · 2022-09-17T21:25:33Z

This feature has been added in the development branch for version 4 and will be included in that release. That branch is functional at present if you would like to try it out, and is described in more detail in #662. An example has also been included to demonstrate usage.

Balearica · 2022-11-25T20:29:59Z

Closing as this was added in Version 4.

Balearica mentioned this issue Sep 17, 2022

Version 4 Development and Changes #662

Closed

Balearica closed this as completed Nov 25, 2022
