JRuby XML::Reader memory performance is poor #2224
Comments
Hi @akimd, thanks for asking this question. I want to spend a little bit of time understanding the memory performance of this example in JRuby -- based on your description, it sounds like perhaps there's a memory leak in the JRuby implementation that we might be able to fix. The Reader class is based on libxml2's xmlreader module. Although libxml2 uses a SAX-ish parser at the heart of its implementation, the API is specialized, and it is optimized for the memory pattern of exposing only a "cursor" as it encounters each node. The JRuby implementation does not have a low-level parser abstraction like libxml2's. I have some ideas on where the issue might be, and it's probably in the JRuby Reader wrapper. I will dig in and see if I can figure it out. In the meantime, if you are willing to take on the additional complexity of writing SAX parser handlers, you should find the memory performance of the SAX parser acceptable.
Hi Mike, Thanks again!
That's correct, to the best of my knowledge! If it's not doing that then we should fix it; or else I need to understand the low-level implementation of libxml2 better.
I would love some help with this from any of the folks who are familiar with the JRuby implementation.
Hi guys,
For posterity: this isn't the first issue filed about the memory utilization of Reader in JRuby -- see also #1066.
See #831 for another instance when we did work to try to improve memory usage.
Hi,
In the context of a Rails application, I have to process huge XML documents that are "flat". I mean, they could just have been CSV documents instead of XML, but the source provides only XML.
While it appears to work well in MRI, with JRuby the memory consumption is very high, and at some point the process gets stuck (out of memory).
The following stupid script mimics the problem I face:
The documentation is somewhat ambiguous on how XML::Reader works. It is easy to understand "The Reader parser is good for when you need the speed of a SAX parser, but do not want to write a Document handler." as meaning "this is a SAX parser with a thin interface on top to make it easier than dealing with SAX yourself".
However, the first node returned by XML::Reader has the whole document as inner_xml, so I am wondering whether XML::Reader is really SAX.
What we need in a document that looks like
is to iterate just on the entries. What is the recommendation in such a case?
Thanks a lot for Nokogiri