The accuracy of v4.0.2 is reduced compared to v2.1.4 #717

Closed
lmk123 opened this issue Mar 2, 2023 · 3 comments

Comments

lmk123 commented Mar 2, 2023

Describe the bug
The accuracy of v4.0.2 is reduced compared to v2.1.4

To Reproduce

Run both v2.1.4 and v4.0.2 on the following image:

image

v2.1.4: https://codesandbox.io/s/eager-jasper-9drw5o

v2.1.4 accurately recognizes the text in the image.

v4.0.2: https://codesandbox.io/s/busy-blackburn-pes3yi

The text recognized by v4.0.2 is garbled.

Expected behavior
v4.0.2 should accurately recognize the text in the image.

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Chrome 110
  • Version: v2.1.4 and v4.0.2

lmk123 commented Mar 2, 2023

I found that the tesseract CLI recognizes the text properly. Maybe tesseract.js needs to upgrade Tesseract from 5.1.0 to 5.3.0?

$ tesseract https://user-images.githubusercontent.com/5035625/222316349-c283adee-5e97-4f54-b018-7d914f7988f7.png - -l eng
Estimating resolution as 288
As these settings are reverted after the job, this allows for using different parameters for specific jobs when
working with schedulers

@Balearica
Member

This is an interesting issue. I was able to replicate it using the image provided. Notably, this image has light text on a dark background, which Tesseract handles differently (it needs to detect the inverted text and invert it). When the image is inverted ahead of time (see attached image), the text is recognized properly. The issue may therefore be specific to this type of text.

issue-717-inverted
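
For readers who want to try the inversion workaround described above, here is a minimal sketch of inverting pixel data before recognition. It is not the maintainer's code: the helper name `invertRgba` is hypothetical, and it assumes the pixels are available in the RGBA layout used by the canvas `ImageData.data` array.

```javascript
// Hypothetical helper: invert RGBA pixel data in place so that light text
// on a dark background becomes dark text on a light background.
// Assumes `data` is laid out like ImageData.data: [r, g, b, a, r, g, b, a, ...].
function invertRgba(data) {
  for (let i = 0; i < data.length; i += 4) {
    data[i] = 255 - data[i];         // red
    data[i + 1] = 255 - data[i + 1]; // green
    data[i + 2] = 255 - data[i + 2]; // blue
    // data[i + 3] is alpha and is left unchanged
  }
  return data;
}

// Browser usage sketch (assumes `canvas` already holds the drawn image):
// const ctx = canvas.getContext('2d');
// const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
// invertRgba(imageData.data);
// ctx.putImageData(imageData, 0, 0);
// The inverted canvas can then be used as the input image for recognition.
```

This only illustrates the manual workaround. With the Tesseract upgrade described later in the thread, pre-inverting should not be needed for this image.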

When I have some free time, I will update the version of Tesseract we're using and see if that resolves the issue. There do appear to have been some changes relating to inverted text.

@Balearica
Member

Updating Tesseract to 5.3.0 appears to have resolved this, so it must have been a bug in the version of Tesseract we were using before. I've updated Tesseract.js and created a new release (v4.0.3), so updating Tesseract.js to the latest version should resolve the issue. Thank you for reporting this.
