The accuracy of v4.0.2 is reduced compared to v2.1.4 #717

lmk123 · 2023-03-02T02:57:55Z

Describe the bug
The accuracy of v4.0.2 is reduced compared to v2.1.4

To Reproduce

Run both v2.1.4 and v4.0.2 on the following image:

v2.1.4: https://codesandbox.io/s/eager-jasper-9drw5o

v2.1.4 accurately recognizes the text in the image

v4.0.2: https://codesandbox.io/s/busy-blackburn-pes3yi

The content recognized by v4.0.2 is garbled

Expected behavior
v4.0.2 should accurately recognize the text in the image

Desktop (please complete the following information):

OS: macOS
Browser: Chrome 110
Version: v2.1.4 and v4.0.2

lmk123 · 2023-03-02T03:46:44Z

I found that the Tesseract CLI recognizes the text properly. Maybe tesseract.js needs to upgrade Tesseract from 5.1.0 to 5.3.0?

$ tesseract https://user-images.githubusercontent.com/5035625/222316349-c283adee-5e97-4f54-b018-7d914f7988f7.png - -l eng
Estimating resolution as 288
As these settings are reverted after the job, this allows for using different parameters for specific jobs when
working with schedulers

Balearica · 2023-03-07T21:26:02Z

This is an interesting issue. I was able to replicate it using the image provided. Notably, this image has light text on a dark background, which Tesseract handles differently (it needs to detect the inverted text and invert it). When the image is inverted ahead of time (see attached image), it is recognized properly, so the issue may be specific to this type of text.

When I have some free time I will update the version of Tesseract we're using and see if that resolves the issue. There do appear to have been some changes relating to inverted text.
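
For anyone stuck on an affected version, the pre-inversion workaround mentioned above could be sketched roughly as follows. This is only an illustration, not something from Tesseract.js itself: `invertRgba` and `isMostlyDark` are hypothetical helper names, the 128 brightness threshold is an arbitrary assumption, and the input is assumed to be a flat RGBA byte array such as the `data` field of a canvas `ImageData`.

```javascript
// Hypothetical helpers (not part of Tesseract.js).
// `pixels` is assumed to be a flat RGBA byte array, e.g. ImageData.data.

// Invert the R, G and B channels in place; alpha is left untouched.
function invertRgba(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = 255 - pixels[i];         // red
    pixels[i + 1] = 255 - pixels[i + 1]; // green
    pixels[i + 2] = 255 - pixels[i + 2]; // blue
  }
  return pixels;
}

// Rough heuristic (an assumption, not Tesseract's own detection):
// treat the image as "dark background" when the mean luma is below 128.
function isMostlyDark(pixels) {
  const count = pixels.length / 4;
  if (count === 0) return false;
  let sum = 0;
  for (let i = 0; i < pixels.length; i += 4) {
    sum += 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
  }
  return sum / count < 128;
}
```

In a browser, one way to use this would be to draw the image onto a canvas, read the pixels with `getImageData`, call `invertRgba` on `imageData.data` when `isMostlyDark` returns true, write the result back with `putImageData`, and then pass the canvas to the recognizer. Whether this helps will depend on the image.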

Balearica · 2023-03-30T02:46:14Z

Updating Tesseract to 5.3.0 appears to have resolved this, so it must have been a bug in the version of Tesseract we were using before. I've updated Tesseract.js and created a new release (v4.0.3), so updating Tesseract.js to the latest version should resolve the issue. Thank you for reporting this.

Balearica closed this as completed Mar 30, 2023
