Memory issues when long-running #74

Open
TinoDidriksen opened this issue Mar 22, 2021 · 2 comments

Comments

@TinoDidriksen
Member

CG-3's memory use when parsing huge corpora tends to grow without bound, but I have never been able to detect any leaks. I'm pretty sure it's because such a corpus has a lot of unique words and tags, which CG-3 remembers forever. We need a way to evict tags on an LRU (least recently used) basis.
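
As a rough illustration of what LRU eviction for tags could look like, here is a minimal sketch. It is not CG-3's actual data structure: `LruTagTable`, `intern()`, and the capacity parameter are hypothetical names made up for this example. It assumes tags are interned as strings mapped to numeric IDs, and that an evicted tag may simply get a fresh ID if it shows up again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU-bounded tag table; NOT CG-3's real tag storage.
class LruTagTable {
public:
    explicit LruTagTable(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns the ID for `tag`, interning it if unseen, and marks it
    // as the most recently used entry.
    std::uint32_t intern(const std::string& tag) {
        auto it = index_.find(tag);
        if (it != index_.end()) {
            // Known tag: move its node to the front without reallocating.
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (order_.size() >= capacity_) {
            // Table is full: evict the least recently used tag (list back).
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(tag, next_id_++);
        index_[tag] = order_.begin();
        return order_.front().second;
    }

    bool contains(const std::string& tag) const { return index_.count(tag) != 0; }
    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<std::string, std::uint32_t>;
    std::size_t capacity_;
    std::uint32_t next_id_ = 1;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

In a real parser, tags referenced by the grammar itself would presumably have to be pinned so they are never evicted; only tags that exist solely because of the input stream would be candidates for eviction.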

@TinoDidriksen
Member Author

If the stream has flushes, then it should be fairly easy to reset almost all memory at the next flush once X windows have been processed since the last reset.

Without flushes, it's considerably harder.
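
For the case with flushes, the idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `StreamState`, `on_window()`, `on_flush()`, and `reset_interval` are hypothetical names, and the set of strings stands in for whatever stream-derived state actually accumulates. Nothing here is CG-3's real API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for state that grows with the input stream.
class StreamState {
public:
    // reset_interval is the "X" in "every X windows".
    explicit StreamState(std::size_t reset_interval)
        : reset_interval_(reset_interval) {
        assert(reset_interval_ > 0);
    }

    // Called once per processed window with the tags seen in it.
    void on_window(const std::vector<std::string>& tags) {
        seen_tags_.insert(tags.begin(), tags.end());
        ++windows_since_reset_;
    }

    // Called when the stream flushes. At a flush no window is in flight,
    // so stream-derived state can be dropped safely. Returns true if a
    // reset actually happened.
    bool on_flush() {
        if (windows_since_reset_ < reset_interval_) {
            return false;
        }
        // Swap with an empty set so the memory is released, not just cleared.
        std::unordered_set<std::string>().swap(seen_tags_);
        windows_since_reset_ = 0;
        return true;
    }

    std::size_t cached_tags() const { return seen_tags_.size(); }

private:
    std::size_t reset_interval_;
    std::size_t windows_since_reset_ = 0;
    std::unordered_set<std::string> seen_tags_;
};
```

The reset is tied to the flush rather than to the window count alone because a flush is the one point where no earlier window can still be referenced; without flushes there is no such obviously safe point.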

@TinoDidriksen
Member Author

The C API's cg3_run_grammar_on_text() is basically flushing after every call, so that's another good spot to reset.
