Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JRuby cache clearing issue since 1.6.2 affects Blather (Adhearsion) #1708

Closed
chewi opened this issue Jan 2, 2018 · 10 comments · Fixed by #1716
Closed

JRuby cache clearing issue since 1.6.2 affects Blather (Adhearsion) #1708

chewi opened this issue Jan 2, 2018 · 10 comments · Fixed by #1716

Comments

@chewi
Copy link

chewi commented Jan 2, 2018

The Blather project, an important part of Adhearsion, has been severely affected by a cache clearing issue in the JRuby implementation since 1.6.2, and as such has had to lock the Nokogiri version back to 1.6.1. This is obviously a security concern and we are missing out on other fixes made since.

Blather (indirectly) subclasses Nokogiri::XML::Node as Blather::XMPPNode. It's hard to explain exactly what happens as I'm not the author and it's not the simplest of libraries but somewhere along the line, node.parent is called and rather than returning the original Blather::XMPPNode instance that was created earlier, a new Nokogiri::XML::Element instance is returned instead. It may not matter that the instance is different but the change in class certainly does matter. It should go without saying that this does not happen under MRI.

I bisected the problem back to e51bb1d, made by @mbklein. I obviously had to raise an eyebrow when I saw that the following comment had since been added above the first clearCachedNote statement by @jvshahid.

// TODO: this feels kind of weird, why are we clearing the XmlNode
// cache here !!!
clearCachedNode(node);

Indeed, if I comment out this line, the problem goes away and all the Nokogiri tests still pass. Unhelpfully, if I comment out the other clearCachedNode line below, all the Nokogiri tests still pass, so perhaps there is insufficient testing here.

My limited understanding tells me that because the cache is being cleared, the original Ruby instance is lost and a new one is being created in NokogiriHelpers#constructNode with the hardcoded class of Nokogiri::XML::Element. Given that we only have the org.w3c.dom.Node instance to recreate from, Ruby-specific details such as which subclass was used originally, are lost.

It is not clear to me exactly when the cache is supposed to be cleared. If it can happen quite easily then perhaps the approach that Blather takes is no longer viable. However, given the interesting comment noted above, I suspect it is currently being cleared more than it should.

@jvshahid
Copy link
Member

jvshahid commented Jan 5, 2018

It looks like the clearCachedNode came from this pr #1054 which was trying to address #1034. I verified that this is the only change required to make the test pass. This has been the case since the pr was merged in, not a side effect of another change that has been made since then. may be @mbklein could remember why it was necessary to call clearCachedNode

@mbklein
Copy link
Contributor

mbklein commented Jan 10, 2018

Clearing cached nodes was necessary under certain circumstances because the cached instance of the node was bound to the wrong namespace. I believe I also added tests to guard against regression, so if the tests still pass without the clearCachedNode call, I won't argue for keeping it.

That said, I haven't done any work on nokogiri's Java internals since late 2014, so basing anything on my current memory and knowledge is exactly as risky as you might think. 😄

@chewi
Copy link
Author

chewi commented Jan 10, 2018

Thanks for letting us know. I can go back to that commit and check whether removing that line breaks your test when I get a minute.

@jvshahid
Copy link
Member

jvshahid commented Jan 10, 2018 via email

@chewi
Copy link
Author

chewi commented Jan 15, 2018

I can confirm that going back to e51bb1d and removing both clearCachedNode calls has no effects on the test. I also tried b6a34d9, which is the last change involving @mbklein. Only removing the nsUri = null line makes any difference. I suggest we remove the first clearCachedNode call. It seems out of place and I wonder if it was a test that was left in by accident. The second clearCachedNode call does at least look like it has a purpose.

@flavorjones
Copy link
Member

I'm not sure I understand what's being suggested for next steps. I'd be happy to merge a PR ...

chewi added a commit to yakara-ltd/nokogiri that referenced this issue Jan 29, 2018
It was previously described at "weird" and even the original author
cannot remember why it is there. Its presence causes Blather's
Blather::XMPPNode subclass instances to be recreated where they become
generic Nokogiri::XML::Element instances instead.

Closes sparklemotion#1708.
jvshahid added a commit that referenced this issue Jan 29, 2018
@jvshahid
Copy link
Member

jvshahid commented Jan 29, 2018 via email

@chewi
Copy link
Author

chewi commented Jan 29, 2018

Hmm, I already opened #1715. I only removed the first call though so you'll need to decide which one to go with.

@jvshahid
Copy link
Member

jvshahid commented Jan 29, 2018 via email

@chewi
Copy link
Author

chewi commented Jan 29, 2018

Okay, closed mine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants