Got "Killed" running ocrd-cis-ocropy-clip #74
Comments
Thanks @stefanCCS for the report. I cannot say much without more context – either in the form of debug level log output (e.g. by running with But that workflow itself is also flawed – please see here.
Additional information with -
To get the image, please contact me via private chat on Gitter.
Thanks! I am able to reproduce this now. Unfortunately, it's not strictly a bug, just inefficient programming. Clipping necessarily has to compare all N by N pairs of regions somehow. When the coordinates suggest that a pair intersects, the algorithm ultimately needs to look at both regions' masks into the page to check for overlapping connected components. To avoid calculating the same masks over and over again, I decided to pre-calculate them (trading CPU time for RSS size). But if the images are large (yours is 4726x6883) and there are many regions (Tesseract found 315 of them), then there is a lot to store in memory. This just scales badly: memory grows with the page size times the number of regions.

Efficiency was not a primary concern of the second project phase (and this processor is a stop-gap anyway). I'm not saying "won't fix", but I am not sure whether we should really prioritize this right now. To be honest, I don't see an easy way out. (I could probably move away from page masks to pair-wise masks spanning the joint bboxes.) Perhaps the best workaround for now is to downscale your images to 300 DPI.
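For a rough sense of scale, here is a back-of-the-envelope sketch of the numbers above. It assumes one byte per mask pixel (as with a boolean array); the dtype the processor actually uses is not stated in this thread, so treat the result as an order of magnitude only. The helper names and the two example boxes are hypothetical and not taken from the processor's code. The second half illustrates the idea of masks cropped to a pair's joint bounding box instead of the full page.

```python
def page_mask_bytes(width, height, n_regions, bytes_per_pixel=1):
    """Memory for one full-page mask per region (assumed 1 byte per pixel)."""
    return width * height * n_regions * bytes_per_pixel


def boxes_intersect(a, b):
    """True if boxes (x0, y0, x1, y1) a and b overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def joint_bbox(a, b):
    """Smallest box (x0, y0, x1, y1) that covers both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pair_mask_bytes(a, b, bytes_per_pixel=1):
    """Memory for the two masks of one pair, cropped to their joint bbox."""
    x0, y0, x1, y1 = joint_bbox(a, b)
    return 2 * (x1 - x0) * (y1 - y0) * bytes_per_pixel


# Figures from this issue: a 4726x6883 page with 315 regions.
total = page_mask_bytes(4726, 6883, 315)
print(f"pre-computed page masks: {total / 2**30:.1f} GiB")

# Two hypothetical overlapping regions, for comparison.
region_a = (100, 100, 900, 400)
region_b = (800, 350, 1500, 700)
if boxes_intersect(region_a, region_b):
    print(f"one pair, joint bbox: {pair_mask_bytes(region_a, region_b) / 2**20:.1f} MiB")
```

Under that one-byte assumption, the pre-computed page masks alone come to roughly 9.5 GiB, which fits the process being killed on a machine with less free memory. Pair-wise masks would only need to exist for the pair currently being checked, at the price of recomputing them for each pair.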
Ok, understood.
Running the workflow below on an image ends with the process being "Killed", like this:
To get the image, please contact me on Gitter.
Workflow used: