High impact on cpu and memory when running deep taxonomy benchmark #1053
For the moment we have a workaround: we use 'etc:isTypeOf' as an 'owl:SymmetricProperty' of 'rdf:type'. So now we get
which still has a high impact on memory, but a very good impact on cpu thanks to first-argument indexing.
It's been a while, and I was recently testing our latest deep taxonomy benchmark, which uses 300000 classes. The good news is that it works, but the performance needs to improve before we can use it practically:
For Trealla we get
I guess this is because there is no JIT indexing yet.
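For context, the deep taxonomy data has a very regular shape: one individual that is an instance of class N0, plus a long chain of subclass facts N0 → N1 → … → N\<depth\>. The sketch below is a hypothetical Python rendering of what such a generator produces; the real generator in the benchmark is gendt.pl (a Prolog program), and the names here are illustrative.

```python
# Hypothetical sketch of the deep-taxonomy data: one rdf:type fact
# plus a chain of subClassOf facts. Illustrative only -- the real
# generator is gendt.pl, written in Prolog.

def gen_dt(depth):
    """Return deep-taxonomy facts as Prolog-style clause strings."""
    facts = ["'rdf:type'('etc:z', 'etc:N0')."]
    for i in range(depth):
        facts.append(f"'rdfs:subClassOf'('etc:N{i}', 'etc:N{i + 1}').")
    return facts

for fact in gen_dt(3):
    print(fact)
```

With depth 300000 this yields one type fact and 300000 subclass facts, which is the clause database the indexing discussion below is about.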
That is definitely a bug. Only the second argument of each RDF clause is instantiated, right? First instantiated argument indexing should activate there.
Yes @mthom, the second argument of rdf:type is an rdfs:Class which is always instantiated with one of the 300000 classes.
Maybe related to the high impact on cpu is the high impact on memory:
versus
which is
@josd: Trealla is structure sharing; that might change a lot here. Rather compare it to a structure-copying system like GNU, SICStus or SWI.
@mthom: I think the first argument is instantiated in the very first clause: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'('http://example.org/etc#z','http://example.org/etc#N0'). So, first instantiated argument indexing will select the first argument for indexing. If indexing is the cause of this slowdown, then it may be worth refining the indexing strategy a bit. But a more robust solution would probably be #192.
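The selection rule being discussed can be sketched roughly as follows: look at the clause heads and pick the first argument position that is instantiated (non-variable). This Python sketch is a simplification for illustration, not Scryer's actual implementation; the variable convention and data representation are assumptions.

```python
# Rough sketch of "first instantiated argument" index selection
# (a simplification of the idea discussed, NOT Scryer's actual code).
# Clause heads are tuples of argument strings; an argument starting
# with an uppercase letter or '_' is treated as a Prolog variable.

def is_var(arg):
    return arg[0].isupper() or arg[0] == '_'

def index_arg(clause_heads):
    """Pick the first argument position instantiated in the first clause."""
    for pos, arg in enumerate(clause_heads[0]):
        if not is_var(arg):
            return pos
    return None  # no instantiated argument: nothing to index on

# In the benchmark every rdf:type clause has BOTH arguments bound,
# so position 0 (the first argument) is chosen -- even though only
# the second argument varies usefully across clauses.
heads = [("'etc:z'", f"'etc:N{i}'") for i in range(5)]
print(index_arg(heads))  # 0
```

This illustrates the failure mode @UWN describes: the rule fires on argument 1 because the first clause instantiates it, even though indexing on it discriminates nothing here.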
@UWN, SWI actually uses even less memory?!
i.e. the ERRORs are due to my stubborn insistence on using
@josd: If you want to try a different strategy, you could start, for example, here: Line 1131 in 9ded144
For instance, does it make a difference for your program if you, as an experimental modification, hard-code the second argument for indexing (for predicates that have at least 2 arguments)? You could also consider implementing a heuristic that analyzes the clauses more thoroughly, especially for cases like yours where it is rather clear which argument should be chosen for indexing.
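One possible "more thorough" heuristic along the lines @UWN suggests, sketched purely for illustration (not code from Scryer, and the representation is an assumption): among the argument positions that are instantiated in every clause, pick the one with the most distinct constants, since it discriminates best between clauses.

```python
# Illustrative heuristic: choose the argument position that is
# instantiated in every clause head and has the most distinct
# constants. NOT Scryer's code -- a sketch of one possible strategy.

def is_var(arg):
    return arg[0].isupper() or arg[0] == '_'

def best_index_arg(clause_heads):
    arity = len(clause_heads[0])
    best_pos, best_card = None, 0
    for pos in range(arity):
        column = [head[pos] for head in clause_heads]
        if any(is_var(a) for a in column):
            continue  # only consider positions instantiated everywhere
        card = len(set(column))  # how many distinct constants appear
        if card > best_card:
            best_pos, best_card = pos, card
    return best_pos

# All first arguments are the same constant while the second differs
# per clause, so position 1 (the second argument) wins.
heads = [("'etc:z'", f"'etc:N{i}'") for i in range(5)]
print(best_index_arg(heads))  # 1
```

For the benchmark's rdf:type predicate this picks the second argument, which is exactly the hard-coded choice suggested above as an experiment.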
Thanks for the suggestions. I was trying a few tweaks during the past hours, but all I got was several
My current reference is the N3 implementation https://github.com/josd/eye/tree/master/reasoning/dt using swipl, which takes about 1 sec
If I omit the first clause of the RDF predicate and leave the rest as-is, I get an out-of-memory error! That triggers it to build a 299,999-item index, which does not happen if just the first argument is instantiated in the first clause. Ideally, this code should be generated:
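The generated code itself is elided above, but the effect of indexing on the second argument can be sketched with a plain hash map (illustrative Python, not generated WAM code): clause selection becomes one constant-time lookup instead of a scan over 300000 clauses.

```python
from collections import defaultdict

# Illustrative only: indexing the rdf:type clauses on their SECOND
# argument turns clause selection into a single hash-table lookup.

facts = [("'etc:z'", f"'etc:N{i}'") for i in range(300_000)]

# Build the second-argument index once, up front.
by_second_arg = defaultdict(list)
for subj, cls in facts:
    by_second_arg[cls].append((subj, cls))

# A query like rdf:type(X, 'etc:N299999') now touches one bucket
# instead of scanning all 300000 clauses.
matches = by_second_arg["'etc:N299999'"]
print(len(matches))  # 1
```

Each bucket here holds exactly one clause, which is why second-argument indexing makes the lookup effectively constant time for this data set.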
Exactly, and similarly for instance with
or
instead of
For the time being in Heisseneye I am working around
The deep taxonomy benchmark should work properly on rebis-dev now.
Thank you very much @mthom! For the deep taxonomy benchmark with 30000 classes we now get
and for 300000 classes it is now nice linear performance
Looking at some historical results, I saw that they were done via forward chaining, so I changed dt.pl using gendt.pl by adapting line 15 and line 25.
Also, the memory usage for depth 100000 is quite huge. For Scryer and depth 100000 we currently have
and we compare it with what we currently have in eye/reasoning/dt
We are now moving back to backward chaining and the performance is fine.
I'm highly certain the previous blow-up was caused by the lack of deep indexing.
Me too, and in SWI it is indeed like n*log(n).
I hope so! There's quite a lot to do. I wonder if writing developer documents on the Rust side of Scryer would encourage, e.g., work on that front. I have an initial document I'm now hammering out/procrastinating on, explaining how
That sounds like the best news in years @pbonte and @RubenVerborgh |
While testing https://github.com/josd/retina/tree/cf0358f59cad43a8380ed828bf0c77c48bec547f with Scryer Prolog, all examples and test cases in etc seem to work fine,
but there is a practical concern: the high impact on cpu and memory when running the deep taxonomy benchmark
https://github.com/josd/retina/blob/cf0358f59cad43a8380ed828bf0c77c48bec547f/etc/dt.pl
It takes almost 2 minutes and 1GB of memory.
Compare this for instance with
This benchmark was designed by the late Harold Boley (http://www.cs.unb.ca/~boley/)
and is useful for evaluating the subsumption capabilities of reasoners (using RDF and OWL),
e.g.
[[
SNOMED CT currently contains more than 300,000 medical concepts, divided into
hierarchies as diverse as body structure, clinical findings, geographic location and
pharmaceutical/biological product
]] -- https://searchhealthit.techtarget.com/definition/SNOMED-CT