Tesseract crash after glibc update on linux (when two languages selected) #1314
There's probably memory corruption happening that just happens to crash your code after this malloc tweak. It might be caused by memory getting deallocated prematurely. If you set "org.bytedeco.javacpp.nopointergc" to "true" and that stops the crashing, then that's what is happening. You'll need to figure out what is getting deallocated that shouldn't be. |
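A minimal sketch of how that property might be set for a test run. The class and method names here are hypothetical; only the property name "org.bytedeco.javacpp.nopointergc" comes from the comment above. It assumes the property has to be in place before any JavaCPP class is loaded, so it is set at the very start of main; passing -Dorg.bytedeco.javacpp.nopointergc=true on the java command line should have the same effect.

```java
public class NoPointerGcToggle {
    // Property name taken from the comment above.
    static final String NO_POINTER_GC = "org.bytedeco.javacpp.nopointergc";

    // Hypothetical helper: call it before touching any JavaCPP/Tesseract class,
    // on the assumption that the property is only read when those classes load.
    static void disablePointerGc() {
        System.setProperty(NO_POINTER_GC, "true");
    }

    public static void main(String[] args) {
        disablePointerGc();
        System.out.println(NO_POINTER_GC + "=" + System.getProperty(NO_POINTER_GC));
        // ... run the crashing OCR code here and check whether it still crashes ...
    }
}
```

If the crash disappears with this setting, that points at premature deallocation, and the property should be treated as a diagnostic aid rather than a fix.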
Thank you so much for your reply! I have tried
Reproducer code looks as follows:
public static void main(String[] args) {
    TessBaseAPI tessBaseApi = null;
    ETEXT_DESC tessMonitor = null;
    Mat imageMat = null;
    BytePointer outText = null;
    try (InputStream is = TessApi.class.getClassLoader().getResourceAsStream("sample01s.png")) {
        tessBaseApi = new TessBaseAPI();
        // Crashes here on glibc version 2.28-216.el8 and later
        int initResult = tessBaseApi.Init(System.getenv("TESSDATA_PREFIX"), "chi_tra+eng");
        if (initResult == 0) {
            LOG.info("TessAPI initialization SUCCESS with langs: " + tessBaseApi.GetInitLanguagesAsString().getString());
        } else {
            LOG.severe("TessAPI initialization FAILED, initCode=" + initResult);
        }
        tessBaseApi.SetPageSegMode(tesseract.PSM_AUTO);
        // Load the sample image and hand its raw pixel buffer to Tesseract
        BufferedImage image = ImageIO.read(is);
        imageMat = Java2DFrameUtils.toMat(image);
        tessBaseApi.SetImage(imageMat.data().asBuffer(), imageMat.size().width(), imageMat.size().height(),
                imageMat.channels(), (int) imageMat.step1());
        // Run recognition and print the result
        tessMonitor = tesseract.TessMonitorCreate();
        tessBaseApi.Recognize(tessMonitor);
        outText = tessBaseApi.GetUTF8Text();
        System.out.println("OCR output:\n" + outText.getString());
    } catch (IOException ex) {
        LOG.log(Level.SEVERE, null, ex);
    } finally {
        // Release native resources
        close(imageMat);
        close(outText);
        close(tessMonitor);
        close(tessBaseApi);
    }
}
I noticed that for some language pairs (e.g.
Starting from glibc version 2.28-216.el8, however, it simply crashes. Investigating further, it seems that this issue occurs when the following conditions are true:
So I have checked all languages traindata which contain
So it seems this is not related to the bytedeco wrapper, and I will post this issue in the Tesseract repo. |
Moreover just tried using |
I am afraid I need a bit more help here...
But using my reproducer code with
That warning message
I am afraid the Tesseract project team will not accept my bug report, as it is not reproduced using the compiled CLI version. |
The cli is also available: |
Thank you once again, yeah, I was able to use that approach. It didn't work from the code (due to linking error
So next I decided to investigate more what I have installed with this:
And found the following:
So it looks like tesseract from that repo uses outdated lib v5.0.3 instead of v5.3 (even |
Hello @saudet! May I please re-open this issue for a quick clarification? I tried to test the long-awaited javacv-platform:1.5.10 release, which contains the fixed Tesseract version for this issue, and ran into another issue: I have already confirmed that everything works on Rocky Linux 9, but at the moment I am not able to upgrade to that version due to other dependencies. So may I ask for your advice: is there any chance of returning to javacv-platform builds that use glibc 2.28, or is the only solution for me now to compile and build javacv-platform from sources on my own? Many thanks! |
Duplicate of #1379 |
Okay, understood, the answer is "compile on your own". ;) Just curious what the "enhancement" label was added for...? Are there any enhancements planned for compiling, or what? :) |
It means someone has to spend time to make those builds work and maintain them
I am not sure from which side to approach this issue, so I am starting here in the hope that someone can point me to a better way to deal with it.
I recently ran into a weird issue: tesseract started crashing when two languages are selected for recognition (in my case that was "chi_tra+eng"). When the languages are selected separately (one by one), everything works fine.
I am using org.bytedeco.tesseract-platform and this issue happens on both versions: 5.2.0-1.5.8 and 5.0.1-1.5.7.
Crash dump I am attaching separately, see: glibc issue - hs_err_pid57917.log
Further investigation revealed that it started crashing after a routine CentOS 8 Stream package update with the dnf update command, and particularly after the glibc update.
From the glibc history below I was able to identify that 2.28-214.el8 is the last properly working version; starting with version 2.28-216.el8, tesseract starts crashing:
So far the only solution I could find was to downgrade glibc back to 2.28-214.el8 and add it to the dnf exclusions, which is obviously only a temporary workaround until this is resolved properly.
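The version boundary reported above (2.28-214.el8 last working, 2.28-216.el8 first crashing) could also be checked from Java before initializing Tesseract, for instance to log a warning. This is only a sketch: the class and method names are hypothetical, it assumes the glibc version-release string has already been obtained elsewhere (for example from the package manager), and treating every el8 build from 216 onward as affected is an assumption based on this report alone.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlibcReleaseCheck {
    // Matches strings such as "2.28-214.el8" or "2.28-216.el8" and captures the build number.
    private static final Pattern EL8_RELEASE =
            Pattern.compile("^2\\.28-(\\d+)(?:\\.\\d+)*\\.el8.*$");

    // First build reported as crashing in this issue.
    private static final int FIRST_CRASHING_BUILD = 216;

    // Returns the el8 build number, or empty if the string is not a 2.28 el8 package version.
    static OptionalInt el8Build(String versionRelease) {
        Matcher m = EL8_RELEASE.matcher(versionRelease.trim());
        return m.matches() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    // True only for 2.28 el8 builds at or after the first build reported as crashing.
    static boolean atOrAfterFirstCrashingBuild(String versionRelease) {
        OptionalInt build = el8Build(versionRelease);
        return build.isPresent() && build.getAsInt() >= FIRST_CRASHING_BUILD;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.28-214.el8", "2.28-216.el8"}) {
            System.out.println(v + " -> " + atOrAfterFirstCrashingBuild(v));
        }
    }
}
```

Anything that is not a 2.28 el8 build is reported as not affected here simply because this issue says nothing about other builds.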
Additionally, I have found the glibc commit history and what seems to be its change log (but I am not a C guy, so it didn't help me much):
https://git.centos.org/rpms/glibc/commits/c8s
https://rpmfind.net/linux/RPM/centos/8-stream/baseos/x86_64/Packages/glibc-2.28-216.el8.x86_64.html
Highly appreciate any support with this issue!