-
-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jruby nokogiri 1.9.0+, rehomed elements don't keep namespaces, plus regression from 1.8 #1875
Comments
AFAIU the JRuby implementation is conforming to the standard. See my explanation inlined below.
I am not sure what is going on here. In particular, I don't remember what
This is expected.
I don't understand where
It isn't magical. The namespaces that are in scope of
|
Thanks for response! Are you suggesting that MRI and Jruby are both correct, even though they don't match? I don't understand how this can be, I thought MRI and JRuby nokogiri implementations could be expected to match. To me, the MRI implementation is the most useful, as well as seeming correct. The JRuby implementation has changed between nokogiri 1.8 and 1.9, and neither of them match MRI. (Are you suggesting that MRI ought to match... JRuby 1.9, even though it never has?) What I need to do is remove a sub-tree (a node and it's children) from a document, and make it into it's own document -- with all namespaces semantically intact. In MRI nokogiri, I can do this. In Java nokogiri 1.8, I could do it via a workaround (I considered it a workaround to a bug). In Java nokogiri 1.9+, I can find no way to do this at all. (Short of serializing to a string and re-parsing). That doesn't seem right to me.
It was, copy-pasted from the wrong example, as I was messing with different examples to try and isolate the error. (Not sure if it's more or less confusing for me to go edit my original text at this point?) |
Yes. They are equivalent, AFAIK.
Not necessarily, specially if the output XML is equivalent, such as this case.
I don't think I understand how the MRI output is more useful. I think this is worth clarifying.
Isn't that exactly what the current version of Nokogiri does ? JRuby is adding the namespace declaration to the node itself, instead of adding the namesapce declaration on the parent node. Both implementations are correct 1. [...]
Can you provide an example code to demonstrate how the JRuby's output is problematic. Is the consumer of the output assuming that |
Basically I want to write code that uses namespaces that will work under MRI or JRuby, and write tests that will pass under either. I want equivalent behavior. Which I thought was the contract of nokogiri. This includes But I see your point about the XML trees being semantically equivalent in both. (Note: semantically equivalent to a namespace-aware processor; probably not to a non-namespace-aware-processor that thinks "a:something" is just a tag name and "xmlns:a" just an attribute name.) I will spend some more with it and satisfy myself that this is true; it still wouldn't totally satisfy me, but I see your point. It seems likely to me that there are cases where the JRuby noko 1.9 version would produce a serialization that is a lot more bytes than the MRI version. If there are a lot of Which suggests another point: Let's say I have an in-memory representation that matches:
If I want to "hoist" that I don't think nokogiri gives me API to do that. The nokogiri API doesn't let you alter namespace declarations much, if at all. At least in Jruby nokogiri, not totally sure about MRI nokogiri (if they differ, that goes back to the point about MRI/nokogiri divergence). |
…ent elements than MRI, when we re-home elements into new documents for `each_record_xpath`. Test accordingly so tests pass. Documents, whether in-memory or once serialized, should still be semantically identical to a namespace-aware processor, and that's what we try to test for. See: sparklemotion/nokogiri#1875
…ent elements than MRI, when we re-home elements into new documents for `each_record_xpath`. Test accordingly so tests pass. Documents, whether in-memory or once serialized, should still be semantically identical to a namespace-aware processor, and that's what we try to test for. See: sparklemotion/nokogiri#1875 This commits passes tests in MRI nokogiri, as well as JRuby nokogiri 1.8.6 through 1.10.x
…ent elements than MRI, when we re-home elements into new documents for `each_record_xpath`. Test accordingly so tests pass. Documents, whether in-memory or once serialized, should still be semantically identical to a namespace-aware processor, and that's what we try to test for. See: sparklemotion/nokogiri#1875 This commits passes tests in MRI nokogiri, as well as JRuby nokogiri 1.8.5 through 1.10.x
Thanks for your analysis @jvshahid ! I see how the documents are semantically identical. It could be (and was!) a lot worse. I still think ideally:
However, in the current implementation, I've changed my code to test for "namespace-aware semantic equivalence", where it was failing because serializations were not identical. (In particular, if you take an existing document, and do Here is the pretty hacky method I wrote to test "namespace-aware semantic equivalence" of two different nokogiri elements, in case it's useful to anyone. def ns_semantic_equivalent_xml?(noko_a, noko_b)
noko_a = noko_a.root if noko_a.kind_of?(Nokogiri::XML::Document)
noko_b = noko_b.root if noko_b.kind_of?(Nokogiri::XML::Document)
noko_a.name == noko_b.name &&
noko_a.namespace&.prefix == noko_b.namespace&.prefix &&
noko_a.namespace&.href == noko_b.namespace&.href &&
noko_a.attributes == noko_b.attributes &&
noko_a.children.length == noko_b.children.length &&
noko_a.children.each_with_index.all? do |a_child, index|
ns_semantic_equivalent_xml?(a_child, noko_b.children[index])
end
end |
Hi, sorry for not responding before now on this thread.
Maybe in an ideal world; but the JRuby implementation is a completely separate implementation from the CRuby backend and the benefit in this case is low relative to the cost. The "correct" implementation could be argued either way, and I'm unaware of any XML spec that covers reparenting nodes into new documents. I would welcome a pull request that cleaned this up, but otherwise I will not personally prioritize it. I'm sorry your experience isn't seamless across these implementations. |
For posterity, the CRuby behavior is to create a doc that looks like: <?xml version="1.0"?>
<record xmlns="http://example.org/top" xmlns:a="http://example.org/a">
<a:something>a:something</a:something>
</record> and the JRuby behavior is: <?xml version="1.0"?>
<record xmlns="http://example.org/top">
<a:something xmlns:a="http://example.org/a">a:something</a:something>
</record> |
Thanks. If anyone can figure out any invocation to get the JRuby serialized doc to look like the CRuby one, like any methods I can call to make that happen, please do share! I would love to be able to create a doc by mutating existing documents, in Jruby, that would serialize the way the Cruby one does -- I actually used to have a workaround to do it prior to 1.9.0, but now can't figure out any way to do it at all, which is a problem. :( |
This is a followup to #1774. I'm sorry it took me a while to get back around to it.
While a fix to #1774 was provided in #1778, the fix only works for a default namespace, not one with a prefix. (It hadn't occured to me in my repro for #1774 that I should provide both, as they may be have different codepaths in java nokogiri I guess? )
What is worse, there is a regression in 1.9.0+, possibly introduced by #1778, but I haven't verified this.
I used to be able to work around #1774, by explicitly calling
add_namespace
to add the namespaces I wanted on the rehomed element.However, my workaround that worked in JRuby 1.8.5 stopped working in JRuby 1.9.0+, so I'm kind of stymied and maybe even worse off now. :(
Reproductions are all in MRI 2.5 and JRuby 9.2.0.0, but I do not believe specific MRI/JRuby versions matter.
The text was updated successfully, but these errors were encountered: