Speedup __add_triple_context. #1271
(cherry picked from commit 85e1cfe)
Passing all tests and seems to make sense to me
rdflib/plugins/stores/memory.py
if self.__tripleContexts[triple] == self.__defaultContexts:
    del self.__tripleContexts[triple]
if triple_context == self.__defaultContexts:
    del triple_context
Shouldn't this delete the context from self.__tripleContexts? It appears to me that del triple_context just does nothing.
Yes, it strongly seems you are right.
>>> x={"a","b"}
>>> y=x
>>> del y
>>> x
set(['a', 'b'])
>>> y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined
It should probably be:
del self.__tripleContexts[triple]
The nasty thing is that it does not appear in the results.
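A minimal sketch of the difference, assuming plain dictionaries as stand-ins for the store's private attributes (the helper names and sample data are hypothetical, not rdflib code). Deleting the local name leaves the dictionary entry in place, while deleting by key removes it:

```python
# Hypothetical stand-ins for self.__defaultContexts and self.__tripleContexts.
default_contexts = {None: False}
triple = ("s", "p", "o")
triple_contexts = {triple: {None: False}}


def drop_if_default_buggy(store, key, default):
    triple_context = store[key]
    if triple_context == default:
        # Only unbinds the local name; the entry stays in `store`.
        del triple_context


def drop_if_default_fixed(store, key, default):
    triple_context = store[key]
    if triple_context == default:
        # Removes the entry from the dictionary itself.
        del store[key]


drop_if_default_buggy(triple_contexts, triple, default_contexts)
still_present = triple in triple_contexts

drop_if_default_fixed(triple_contexts, triple, default_contexts)
removed = triple not in triple_contexts
```

Because the buggy variant raises no error and only keeps a redundant entry around, it is easy to miss unless a test inspects the dictionary directly.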
I did the fix in eb5e025
Yes you're right, thanks @white-gecko
For the wordnet file (cf. #1261) I could see an improvement of 2% in total time, and for a file of mine with just 807682 triples it was 4% faster (the lower table). My machine was not dedicated to the test; everyday tasks such as video conferences ran alongside it, which explains the high variance. The measurements were run alternately (once improved, then master, then improved again, …) to give both versions equal chances.
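The alternating scheme can be sketched roughly as follows. This is not the script used for the numbers above; the function `time_alternating` and the two dummy workloads are hypothetical and only illustrate interleaving the runs so that background load affects both variants similarly:

```python
import statistics
import time


def time_alternating(candidates, rounds=5):
    """Time each callable once per round, interleaved, and collect the durations."""
    timings = {name: [] for name in candidates}
    for _ in range(rounds):
        for name, func in candidates.items():
            start = time.perf_counter()
            func()
            timings[name].append(time.perf_counter() - start)
    return timings


# Dummy workloads standing in for "parse the file on branch X".
def improved():
    return sum(range(20_000))


def master():
    return sum(range(20_000))


timings = time_alternating({"improved": improved, "master": master}, rounds=5)
medians = {name: statistics.median(values) for name, values in timings.items()}
```

Comparing medians rather than single runs helps when the variance is high, as it was here.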
So once the code is corrected (this is still missing: #1271 (comment)), it can be merged.
The tests fail now, and I can reproduce this locally: tests pass for 8086068 but fail for eb5e025. Looking at the code, eb5e025 should be correct, but … @ashleysommer, can you explain to us what is happening?
It looks promising performance-wise, but if you do not mind, I will restart from scratch to understand where these failures come from. What do you think? Could we put this PR in draft mode until it is properly tested and understood?
I've converted it to draft mode, as requested. But it seems the tests pass now. Raising KeyError instead of IndexError might be part of understanding the situation. So you agree, @rchateauneu?
Yes. This could not work.
Hi, I'm looking into this now. I did have in mind to try this kind of optimization when I wrote the new MemoryStore implementation, but I didn't get around to it.
Lately I've been using
I have done more tests to check the memory behaviour using tracemalloc. The test script, using english-wordnet-2020.ttl, is attached: The results are identical, with about three runs each. Surprisingly, the same code might give slightly different numbers if the machine is under a different load.
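The attached script is not reproduced here. As a rough illustration of this kind of check, a minimal tracemalloc sketch could look like the following; the helper `peak_memory_of` and the toy workload `build_index` are hypothetical and stand in for parsing the wordnet file into a graph:

```python
import tracemalloc


def peak_memory_of(func, *args, **kwargs):
    """Run func under tracemalloc and return (result, current_bytes, peak_bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def build_index(n):
    # Toy workload: a dict keyed by triples, loosely resembling a triple-context map.
    return {(i, i + 1, i + 2): {None: False} for i in range(n)}


index, current_bytes, peak_bytes = peak_memory_of(build_index, 10_000)
```

Running the same measurement on both branches and comparing the peak values is one way to confirm that a speedup does not come at the cost of extra memory.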
Now we are done :-) Thank you.
Related to #1261.
This eliminates between two and five lookups in the container self.__tripleContexts (depending on the execution) by storing its reference in the variable triple_context = self.__tripleContexts[triple], and reusing it.
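The pattern can be sketched as follows. This is a heavily simplified, hypothetical stand-in (the class `TinyStore` and its methods are not the rdflib implementation); it only shows how keeping the looked-up value in a local variable avoids repeated lookups in the container, while the final deletion still goes through the container itself:

```python
class TinyStore:
    """Hypothetical, simplified stand-in for the memory store's context bookkeeping."""

    def __init__(self):
        self.__defaultContexts = {None: False}
        self.__tripleContexts = {}

    def add_context(self, triple, context, quoted=False):
        # Single lookup; the result is reused through the local name below.
        try:
            triple_context = self.__tripleContexts[triple]
        except KeyError:
            # Both the local name and the dict entry refer to the same new object.
            triple_context = self.__tripleContexts[triple] = self.__defaultContexts.copy()

        triple_context[context] = quoted

        if triple_context == self.__defaultContexts:
            # Must delete the dict entry, not the local name.
            del self.__tripleContexts[triple]

    def contexts_of(self, triple):
        return self.__tripleContexts.get(triple, self.__defaultContexts)

    def is_stored(self, triple):
        return triple in self.__tripleContexts


store = TinyStore()
t1 = ("s1", "p", "o")
t2 = ("s2", "p", "o")
store.add_context(t1, "graph1")  # differs from the default, so it is kept
store.add_context(t2, None)      # equals the default, so the entry is dropped
```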