Tesseract.js 5.0.0 loading wrong language data #834

Closed
Balearica opened this issue Oct 3, 2023 · 1 comment · Fixed by #836

Comments

@Balearica
Member

Balearica commented Oct 3, 2023

When oem is set to 1 (LSTM_ONLY), the LSTM-only data should be loaded. However, this is not happening: the LSTM + Legacy data is still being loaded. This can be verified by looking at file sizes, since the integerized LSTM-only data is only ~2MB for most languages.
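
The file-size check described above can be sketched as a small helper. This is only an illustration, not part of Tesseract.js: the name looksLikeLstmOnlyData and the 5 MB cutoff are assumptions, picked solely because the LSTM-only data is stated to be ~2MB for most languages.

```javascript
// Hypothetical helper, not part of Tesseract.js.
// Assumption: a 5 MB cutoff, chosen only because the LSTM-only data is
// reported as ~2MB for most languages. Unusually large languages may
// need a different cutoff.
const LSTM_ONLY_MAX_BYTES = 5 * 1024 * 1024;

// Returns true when the downloaded language data is small enough to
// plausibly be the LSTM-only variant.
function looksLikeLstmOnlyData(byteLength, maxBytes = LSTM_ONLY_MAX_BYTES) {
  if (!Number.isFinite(byteLength) || byteLength < 0) {
    throw new RangeError('byteLength must be a non-negative number');
  }
  return byteLength <= maxBytes;
}
```

The byte length could come from whatever is actually downloaded, for example the size reported in the browser's network panel. The check is a heuristic and does not inspect the file contents.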

In addition to fixing this issue, new unit tests should be added to confirm that the correct data is being downloaded. The existing unit tests hard-code which language data is used, which effectively prevents the language data selection from being tested. As the vast majority of users presumably use the default arguments, it is important that this path be tested.
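
One possible shape for such a test is sketched below. Every name in it (pickDataVariant, makeRecordingLoader, the variant labels) is a hypothetical stand-in rather than Tesseract.js code; the only value taken from this issue is oem 1 for LSTM_ONLY. The point is that the test records what the loader requests instead of hard-coding the language data.

```javascript
// Hypothetical stand-ins; none of these names come from Tesseract.js.
const OEM_LSTM_ONLY = 1; // value given in the issue text

// Stand-in for the code under test: decides which data variant to request.
function pickDataVariant(oem) {
  return oem === OEM_LSTM_ONLY ? 'lstm-only' : 'lstm+legacy';
}

// Loader that records each request instead of touching the network.
function makeRecordingLoader() {
  const requests = [];
  const load = (lang, oem) => {
    const variant = pickDataVariant(oem);
    requests.push({ lang, variant });
    return variant;
  };
  return { load, requests };
}

// The test passes no language data path, so the selection logic itself
// is exercised rather than bypassed.
function testLstmOnlyRequestsLstmOnlyData() {
  const { load, requests } = makeRecordingLoader();
  load('eng', OEM_LSTM_ONLY);
  return requests.length === 1 && requests[0].variant === 'lstm-only';
}
```

A real test would replace pickDataVariant with the library's own loading path and inspect the request it issues.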

@Balearica
Member Author

Resolved by #836. The fix will be included in the next npm release (5.0.1).
