#xpath performance #760
Comments
I thought that this would be a good idea, so I tried implementing your suggestion to see how it would work. I'm seeing failures where nodes aren't being found as a result, so it seems that it's not good enough to simply cache the |
consider the following piece of code:
These results are rather strange. As soon as you look for attributes, the context somehow shifts: after the third xpath we find foo instead of baz. I think this is a bug in libxml, or some well-designed behaviour I don't understand :-). We could maybe work around it by making the context node settable not only in the constructor for XPathContext but also afterwards (libxml supports this too). According to my tests this is still MUCH faster. E.g.:
Juergen |
I don't have the code in front of me to check, but it does seem probable to me that this is by design. (Otherwise it seems like an obvious optimisation that Nokogiri would have possibly already taken up.) By the way, your example was difficult to follow; I think some XML was stripped. If you encase the code examples in triple-backticks it'll render it literally. Next time I get a chance I'll see if I can find what's going on with this. |
@etm - 1/3 of the time spent ... how much time is that? Can you show how you're benchmarking and tell us what Ruby version (output from |
@flavorjones: I see much less than 1/3rd—about 1% is spent allocating I'm using ruby-prof to profile. It's difficult to demonstrate (proprietary codebase, woo), but here's some stuff anyway: 10,095 calls to Note that very little time is spent in I worked around this by only registering namespaces I need (i.e. by collecting the ones I want beforehand and passing them into every
@flavorjones, @unnali: yes register_namespaces is also slow, but something else is going on - please try the following example yourself:
The results (obviously no difference between Ruby 1.9.3 and Ruby 1.8.7; the numbers are almost the same, as most of the work is done by Nokogiri): fast1: 2.12183022499084. That's a massive speedup. Also note that NO namespace prefixes are used here; with lots of namespace prefixes the results are even better. Sorry for using old-fashioned monkey-patching and time measurement ;-) p.s.:
p.p.s. The problem that it is not reliable still remains (see my comment from 3 days ago). BUT, unless I'm doing something wrong, I think the numbers make this a worthwhile pursuit ;-). I'm looking into the reliability issue by adding #node for resetting in C, but I need some more time here.
Okay, so in your case the slowdown is because 50,000 new
Whether this is totally ridiculous or not, I can't really comment. |
Yes, I'm talking about MRI. And the (perceived) problem still persists. It's not primarily about the speedup (which could be reaped after solving this); it's about the behaviour of the following code:
This code should output 2201220, but it outputs 2201201. I still think these results point to a bug.
Urgh, OK, reopening. @etm - I'm curious why no objections were raised when this ticket was closed. Honestly, I thought this had been solved, and apologize for the confusion. |
Sorry, entirely my fault; I must have missed the mail notification that it was closed :-) It doesn't bug me enough that I check back regularly. After all, it only occurs in a code path that is currently not used that way in Nokogiri, so this is understandably not high priority.
@etm There is definitely some weirdness going on here. Part of the issue is that libxml2 stores state in XPathContext, meaning that you shouldn't re-use it unless you know what you're doing. Calling This is also probably the behavior that's causing slowness. That aside, there's still something else going on in the JRuby port that I don't fully understand yet; I need to isolate the behavior with test cases. Wish I could be more specific at this time. |
For whatever it's worth the test case provided above now works properly on modern libxml2/nokogiri:
As for improving the performance of XPath queries, yes, we could re-use If someone wants to take a stab at writing something like an xpath-context-pool I'd happily review a PR. |
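To make the xpath-context-pool idea a bit more concrete, here is a minimal pure-Ruby sketch of the pattern. Everything in it is hypothetical: `FakeContext` is a stand-in for an expensive context object and is not Nokogiri's `XPathContext`, and `ContextPool`, `with_context` and `reset` are names invented for illustration. The one point it tries to capture from this thread is that a re-used context must have its state (context node, registered namespaces) reset before each use.

```ruby
# Hypothetical stand-in for an object that is costly to build and that
# carries state between uses. NOT Nokogiri's XPathContext.
class FakeContext
  attr_reader :node, :namespaces

  def initialize
    @node = nil
    @namespaces = {}
  end

  # Point the context at a new node and drop any state left over
  # from the previous user.
  def reset(node)
    @node = node
    @namespaces.clear
    self
  end

  def register_ns(prefix, uri)
    @namespaces[prefix] = uri
  end
end

# Hypothetical pool: hands out an idle context if one exists,
# otherwise builds a new one. Not thread-safe as written.
class ContextPool
  attr_reader :created

  def initialize
    @idle = []
    @created = 0
  end

  # Yields a freshly reset context and returns the block's value.
  # The context goes back into the pool even if the block raises.
  def with_context(node)
    ctx = @idle.pop || build
    ctx.reset(node)
    yield ctx
  ensure
    @idle.push(ctx) if ctx
  end

  private

  def build
    @created += 1
    FakeContext.new
  end
end

pool = ContextPool.new
pool.with_context(:first_node)  { |ctx| ctx.register_ns("x", "urn:x") }
pool.with_context(:second_node) { |ctx| ctx.namespaces.empty? } # state was reset
```

Whether a reset like this is sufficient for a real libxml2 context is exactly the open question in this thread, so treat the sketch as a shape for a PR rather than a working design.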
I've opened a new issue to drive exploring XPath performance with respect to re-using context objects: #3266 Closing this one in preference to that one. |
What a blast from the past! |
About 1/3 of the runtime of #xpath is spent allocating XPathContext. I'm not sure, though: could caching the context lead to bad situations? Caching would improve performance a lot.
Example (can be inserted into #xpath)
Note: adding and initializing @ctx and @oldns in the constructor is bad, as nodes that never need a ctx are created quite often, and thus this would degrade performance again.
cheers,
eTM
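The note about not setting up @ctx in the constructor amounts to lazy initialization. A minimal sketch of that idea follows, in plain Ruby with hypothetical names (`LazyNode`, `contexts_built`, and a hash standing in for the expensive context object); it is not the code that was proposed for #xpath.

```ruby
# Hypothetical node class: the expensive context is built on first
# query, not in the constructor, so nodes that never run a query
# never pay for one.
class LazyNode
  @contexts_built = 0

  class << self
    # Counts how many contexts were actually built (illustration only).
    attr_accessor :contexts_built
  end

  def initialize(name)
    @name = name # deliberately no @ctx here
  end

  def query(expr)
    "#{context[:node]}:#{expr}"
  end

  private

  # Built once per node, on first use, then cached in @ctx.
  def context
    @ctx ||= begin
      self.class.contexts_built += 1
      { node: @name } # stand-in for an expensive context object
    end
  end
end

nodes = Array.new(1000) { |i| LazyNode.new("n#{i}") }
nodes.first.query("//a")
nodes.first.query("//b")
puts LazyNode.contexts_built # only the queried node built a context
```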