SPARQL Aggregation functions shouldn't build up memory for each row #678
TL;DR: I'd like to discuss:
Feel free to comment on anything you like.
Some context:
I have been experimenting with a few SPARQL queries that do a lot of counting.
The idea was to find correlations among different predicates in order to identify the most promising queries on a data set. One query yielded a few million combinations, and even without the DISTINCT keyword rdflib used enough memory for the machine to start swapping.
The cause seemed to be evalGroup(), which appended each incoming row to a list. I considered two approaches to avoid this: an OO-style approach, and Python's generators with their send() method. Both seemed to have similar memory requirements, so I chose the OO-style approach for better readability.
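To illustrate the OO-style idea, here is a minimal sketch, not the code in this PR. Each group keeps one small accumulator object per aggregate and updates it as rows stream in, so memory grows with the number of groups rather than the number of rows. The names `CountAccumulator`, `SumAccumulator` and `aggregate_streaming` are hypothetical, and rows are assumed to be plain dicts instead of rdflib's solution mappings.

```python
class CountAccumulator:
    """Counts rows; holds a single integer regardless of input size."""

    def __init__(self):
        self.count = 0

    def update(self, row):
        self.count += 1

    def result(self):
        return self.count


class SumAccumulator:
    """Sums one variable; rows where the variable is unbound are skipped."""

    def __init__(self, var):
        self.var = var
        self.total = 0

    def update(self, row):
        value = row.get(self.var)
        if value is not None:
            self.total += value

    def result(self):
        return self.total


def aggregate_streaming(rows, group_vars, make_accumulators):
    """Fold each row into its group's accumulators without storing the row.

    make_accumulators is a hypothetical factory returning a fresh
    {output_name: accumulator} dict for every new group.
    """
    groups = {}
    for row in rows:
        key = tuple(row.get(v) for v in group_vars)
        accumulators = groups.get(key)
        if accumulators is None:
            accumulators = groups[key] = make_accumulators()
        for acc in accumulators.values():
            acc.update(row)
    # Only after the input is exhausted can the final values be emitted.
    for key, accumulators in groups.items():
        out = dict(zip(group_vars, key))
        out.update({name: acc.result() for name, acc in accumulators.items()})
        yield out


# Usage: the input is a generator, so no row list is ever materialised.
rows = ({"p": "a" if i % 2 else "b", "n": i} for i in range(6))
results = list(aggregate_streaming(
    rows,
    ["p"],
    lambda: {"count": CountAccumulator(), "sum": SumAccumulator("n")},
))
```

Aggregates that need to see distinct values (for example COUNT(DISTINCT ?x)) would still have to remember the values seen per group, so this sketch only bounds memory for the non-DISTINCT case.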
I tried asking on the IRC channel about style guides and the like, but there was little activity during the festive season, so I used the style guide I am used to. Please point out what I should change to conform to yours.
After the refactoring, I fixed the code so that all the tests pass again. Are there additional requirements that are not covered by the test suite?
I also tried to implement a generic algorithm for the REDUCED keyword, although it is not directly related to my original problem.
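For readers unfamiliar with REDUCED: SPARQL allows, but does not require, duplicate solutions to be removed, so an implementation can drop duplicates cheaply without the full bookkeeping DISTINCT needs. The following is one possible bounded-memory sketch, not necessarily the algorithm in this PR. The function name `reduced`, the `window` parameter, and the use of plain dicts with hashable values as rows are all assumptions made for illustration.

```python
from collections import OrderedDict


def reduced(rows, window=1000):
    """Drop duplicates among the most recently seen `window` distinct rows.

    Memory is bounded by `window`; a duplicate whose earlier copy has
    already been evicted is passed through, which REDUCED permits.
    """
    seen = OrderedDict()
    for row in rows:
        key = frozenset(row.items())  # assumes hashable values
        if key in seen:
            seen.move_to_end(key)  # mark as recently used
            continue
        seen[key] = None
        if len(seen) > window:
            seen.popitem(last=False)  # evict the least recently used key
        yield row


data = [{"x": 1}, {"x": 1}, {"x": 2}, {"x": 1}, {"x": 3}, {"x": 1}]
wide = [r["x"] for r in reduced(data, window=2)]
narrow = [r["x"] for r in reduced(data, window=1)]
```

With a large enough window this behaves like DISTINCT; with a small one, some repeats slip through but memory use stays fixed.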