OCR-D processor is leaky #66
I've seen these kinds of memory leaks happen with TF 1, but AFAICR not with TF 2. (See https://github.com/qurator-spk/sbb_column_classifier - I think just upgrading fixed it, but maybe the "TF best practices" were necessary too.)
What I describe happens on TF 2.13.1, which should be fully supported. This issue is a show-stopper for me, as with OCR-D, it's not even possible to keep the results already produced (since they are only persisted in the METS at the end of the loop). @mikegerber what do you mean by TF Best Practices – some particular document perhaps?
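One way to reduce the damage of a late OOM kill, independent of fixing the leak itself, is to persist each page's result as soon as it is produced instead of collecting everything for a single write at the end of the loop. This is a minimal generic sketch, not the actual OCR-D METS API; `process` and the per-page JSON files are hypothetical stand-ins:

```python
import json
import pathlib
import tempfile

def process(page):
    # Hypothetical stand-in for the real per-page OCR step
    return {"page": page, "text": f"ocr result {page}"}

outdir = pathlib.Path(tempfile.mkdtemp())
for page in range(3):
    result = process(page)
    # Persist each result as soon as it exists, instead of collecting
    # everything and writing once after the loop finishes: a crash on
    # page N then loses only page N, not all N-1 earlier results.
    (outdir / f"page_{page}.json").write_text(json.dumps(result))
```

With OCR-D specifically, the equivalent would be flushing the METS incrementally rather than only after the loop.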
The things I did in sbb_column_classifier to make it process ~20 million pages:

1a. Updating to TF 2

I'm not sure if I did 1b to fix any memory leaks; it may have just been for better performance.
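Before applying any TF-specific fixes, it can help to establish whether the growth is even visible to Python. A stdlib-only sketch using `tracemalloc` (the `process_page` function and its deliberate leak are made up for illustration): if RSS climbs by gigabytes but the snapshot diff stays small, the leak is in native allocations (TF/CUDA) rather than Python objects.

```python
import tracemalloc

def process_page(page, sink):
    # Stand-in for per-page work; deliberately leaks ~1 KiB per call
    sink.append(bytes(1024))

tracemalloc.start()
leaked = []
before = tracemalloc.take_snapshot()
for page in range(100):
    process_page(page, leaked)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Largest Python-side growth between the two snapshots, grouped by the
# source line that allocated it; points straight at the leaking call site
top = after.compare_to(before, "lineno")[0]
print("top growth:", top.size_diff, "bytes")
```

Taking a snapshot every N pages instead of just twice gives a growth curve per call site.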
When processing a document of 1.5k pages of medium size (1-2 MP each), I am observing a slow but steady increase in RSS from 4 GB up to 14 GB after 1.2k pages, at which point the process gets killed by the OS (`Killed`). I do not see any Python bindings accessible to the input file loop which could accumulate such data without ever being GCed.
I am on CUDA 11.8.
Has anybody seen this before?
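To track growth like this per page without extra dependencies such as psutil, the current RSS can be sampled directly from `/proc`. A Linux-only sketch (the `rss_bytes` helper name is made up; field 2 of `/proc/self/statm` is the resident page count):

```python
import os

def rss_bytes():
    # Current resident set size, Linux-only: the second field of
    # /proc/self/statm is resident pages; multiply by the page size
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

baseline = rss_bytes()
# Simulate processing that retains memory: ~50 MB of zero-filled buffers
retained = [bytearray(1 << 20) for _ in range(50)]
growth = rss_bytes() - baseline
print("RSS grew by", growth // (1 << 20), "MiB")
```

Logging `rss_bytes()` after each page makes it easy to see whether the growth is linear in pages processed (a per-page leak) or steps up at particular inputs.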