
OCR-D processor is leaky #66

Open
bertsky opened this issue Mar 1, 2024 · 3 comments

Comments

@bertsky
Contributor

bertsky commented Mar 1, 2024

When processing a document of 1.5k pages of medium size (1-2 MP each), I am observing a slow but steady increase in RSS from 4 GB up to 14 GB after 1.2k pages, at which point the process gets killed by the OS (Killed).

I do not see any Python bindings accessible to the input file loop which could accumulate such data without ever being GCed.

I am on CUDA 11.8.

Has anybody seen this before?

@mikegerber
Member

I've seen these kinds of memory leaks happen with TF 1, but AFAICR not with TF 2. (See https://github.com/qurator-spk/sbb_column_classifier - I think just upgrading fixed it, but maybe the "TF best practices" were necessary too.)

@bertsky
Contributor Author

bertsky commented Apr 29, 2024

What I describe happens on TF 2.13.1, which should be fully supported.

This issue is a show-stopper for me: with OCR-D it is not even possible to keep the results already produced, since they are only persisted in the METS at the end of the loop.

@mikegerber what do you mean by TF Best Practices – some particular document perhaps?

@mikegerber
Member

> @mikegerber what do you mean by TF Best Practices – some particular document perhaps?

The things I did in sbb_column_classifier to make it process ~ 20 million pages:

1a. Updating to TF2
1b. IIRC using TF graph execution, TF functions (JIT?) – see the sketch after this comment
2. Dealing with flow problems due to the interleaved CPU processing (Would probably look into using some kind of bounded queues now, but solved it using semaphores at the time – see the queue sketch after this comment.)

I'm not sure if I did 1b to fix any memory leaks; it may have just been for better performance.
