Reduce memory consumption of summarizer #2298
I need #2263 to add some tests for the graph.
#2263 is merged, feel free to continue.
Thanks for the PR, @horpto. Could you also update your first comment in the PR with:
- the reason for these changes (links to complaints about RAM)
- a small benchmark (time & RAM) of 3.6.0 vs summarization-refactoring
so we can better understand how much this improves things?
scores = []
for index in range(self.corpus_size):
    score = self.get_score(document, index)
    if score > 0:
In that case len(scores) <= self.corpus_size, why?
Because it's actually quite a sparse array, isn't it?
In summarizer._set_graph_edge_weights, documents with small weights are dropped anyway, so there is no reason to waste extra memory. What's more, if we need a dense array, we can expand (uncompactify) this bow back into one.
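A minimal sketch of the idea, not the PR's actual code: it assumes the elided loop body appends `(index, score)` pairs, and the names `sparse_scores`, `densify`, and `score_fn` are hypothetical stand-ins for illustration.

```python
def sparse_scores(score_fn, corpus_size):
    """Collect (doc_index, score) pairs, keeping only positive scores."""
    scores = []
    for index in range(corpus_size):
        score = score_fn(index)  # hypothetical stand-in for self.get_score(document, index)
        if score > 0:
            scores.append((index, score))
    return scores


def densify(sparse, corpus_size):
    """Expand the sparse pairs back into a dense list; missing ids get 0.0."""
    dense = [0.0] * corpus_size
    for index, score in sparse:
        dense[index] = score
    return dense


toy = [0.0, 1.5, 0.0, 0.2]  # made-up scores for a 4-document corpus
sparse = sparse_scores(lambda i: toy[i], len(toy))
dense = densify(sparse, len(toy))
```

The sparse list can be shorter than the corpus, which is why `len(scores) <= self.corpus_size`, and the dense form is still recoverable when it is needed.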
I don't get it: how do we know the ids of the documents that have a score of 0 in that case?
Easy. It's like words with 0 weight in a bow: we store the ids of the docs with non-zero weight. They are saved in a bag-of-docs (I should rename that part of the function name from bow, bag-of-weights, to bod, bag-of-docs). If a doc id isn't in the bag-of-docs, the weight of that doc is 0.
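To illustrate the lookup rule described here, a small sketch under the assumption that the bag-of-docs is a list of `(doc_id, weight)` pairs; the helper name `doc_weight` is hypothetical and not part of the PR.

```python
def doc_weight(bag_of_docs, doc_id):
    """Return the stored weight for doc_id, or 0.0 if the id is absent."""
    weights = dict(bag_of_docs)  # assumed layout: (doc_id, weight) pairs
    return weights.get(doc_id, 0.0)


bod = [(1, 1.5), (3, 0.2)]  # made-up bag-of-docs: only non-zero weights stored
present = doc_weight(bod, 3)
absent = doc_weight(bod, 2)  # id 2 is not stored, so its weight is 0
```

For repeated lookups it would make sense to build the dictionary once and reuse it rather than converting on every call.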
Awesome result, nice improvement for the new release, thanks @horpto 👍
At the request of @menshikh-iv: