Segfault when adding fragment in HTML::Builder and document has an xmlns attribute #3112
Comments
@bringel Thanks for reporting this! I've reproduced with this script:
#! /usr/bin/env ruby
require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
end
doc = Nokogiri::HTML4::Builder.new do |doc|
  doc.html(xmlns: "http://www.w3.org/1999/xhtml") {
    doc.body {
      doc << "asdf"
    }
  }
end
I'll certainly fix this. Although, I'm curious why you're adding a namespace to an HTML document. ePUB is an XHTML/XML standard. Can you say more about what you're doing here? Why not use the
Stack walkback is
Looks like a libxml2 bug. Doesn't reproduce with libxml 2.9.13, but does reproduce with the packaged 2.12.4. Will bisect.
git bisect shows the first problematic commit is https://gitlab.gnome.org/GNOME/libxml2/-/commit/e0dd330b8fc299b26f7f7f1a3c853daee56a9987
This commit is first in libxml 2.12.0, which was first packaged in nokogiri v1.16.0. You should be able to use an older version of nokogiri to work around this for now. I'll file a bug report upstream and attempt to provide a patch.
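As a sketch of the workaround above: since libxml 2.12.0 was first packaged in nokogiri v1.16.0, a constraint of `< 1.16.0` should keep Bundler on an unaffected release. The snippet below uses RubyGems' `Gem::Requirement` to show which versions such a constraint admits. The constraint string, the helper name `workaround_version?`, and the sample version numbers are illustrative assumptions, not a maintainer recommendation.

```ruby
require "rubygems"

# Assumed constraint: anything older than v1.16.0, the first nokogiri
# release that packaged libxml 2.12.0 (per the bisect result above).
NOKOGIRI_WORKAROUND = Gem::Requirement.new("< 1.16.0")

# Hypothetical helper: true when the given nokogiri version string
# satisfies the workaround constraint.
def workaround_version?(version)
  NOKOGIRI_WORKAROUND.satisfied_by?(Gem::Version.new(version))
end

puts workaround_version?("1.15.5") # older than 1.16.0, allowed
puts workaround_version?("1.16.0") # first release with libxml 2.12.0, rejected
```

In a Gemfile the same constraint would read `gem "nokogiri", "< 1.16.0"`.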
Hi @flavorjones, thanks for the quick work! I was adding the namespace mostly because a lot of the example content and documentation I had seen for ePub made me think that I needed a stricter XHTML document than is really necessary. It certainly seems to be okay in my testing at the moment without the namespace. I will probably wind up going to
Fun story: I already reported this bug, for XML documents, back in Sept 2023: https://gitlab.gnome.org/GNOME/libxml2/-/issues/597
The fix is at https://gitlab.gnome.org/GNOME/libxml2/-/commit/eb69c1d39d9175779844d4460e9a6afb74a14a2d but seems like it may not be complete (since it doesn't address HTML documents).
Upstream bug report is https://gitlab.gnome.org/GNOME/libxml2/-/issues/672
While we're here, a simpler reproduction that doesn't require Builder:
#! /usr/bin/env ruby
require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
end
doc = Nokogiri::HTML5::Document.parse("<html><body><math>")
math = doc.at_css("math")
math.parse("mrow")
The key bit is that we have an HTML node with a namespace, and try to parse some new markup "in the context of" that node. (In HTML5, the MathML foreign context is represented with a
**What problem is this PR intended to solve?**
Apply upstream fix for #3112
- upstream bug report https://gitlab.gnome.org/GNOME/libxml2/-/issues/672
- upstream fix https://gitlab.gnome.org/GNOME/libxml2/-/commit/95f2a17440568694a6df6a326c5b411e77597be2
Fixes #3112
**Have you included adequate test coverage?**
Yes
**Does this change affect the behavior of either the C or the Java implementations?**
Only affects the C implementation.
- fix is in libxml 2.12.5
- also update test to not rely on the presence of the patch
- fix is in libxml 2.12.5
- also update test to not rely on the presence of the patch
(cherry picked from commit f33a25f)
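Putting the version numbers from this thread together (regression first present in libxml 2.12.0, fix in libxml 2.12.5), a script could check whether a given libxml2 version falls in the affected range. This is only a minimal sketch: `affected_libxml?` and the two constants are hypothetical names, and the check assumes every release from 2.12.0 up to but not including 2.12.5 is affected, so it cannot detect a build that carries the upstream fix as a backported patch.

```ruby
require "rubygems"

# Version boundaries taken from the discussion above.
FIRST_AFFECTED = Gem::Version.new("2.12.0") # first release with the regression
FIRST_FIXED    = Gem::Version.new("2.12.5") # release that contains the fix

# Hypothetical helper: true when the libxml2 version string lies in the
# half-open range [2.12.0, 2.12.5). Patched builds are not accounted for.
def affected_libxml?(version_string)
  version = Gem::Version.new(version_string)
  version >= FIRST_AFFECTED && version < FIRST_FIXED
end

%w[2.9.13 2.12.4 2.12.5].each do |v|
  status = affected_libxml?(v) ? "affected" : "not affected"
  puts "#{v}: #{status}"
end
```

With Nokogiri loaded, the version string to pass in should be available as `Nokogiri::VERSION_INFO["libxml"]["loaded"]`.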
Please describe the bug
I'm trying to build up a document to be added to an ePub file using `Nokogiri::HTML::Builder`. When I try to add the string of HTML inside the body by doing `doc << article[:text]`, the program crashes if I've added an `xmlns` attribute to the root html node, but not if I leave that off. It's not super important to me that the namespace be there, but it is very odd.
Help us reproduce what you're seeing
I'm adding the article text from this page. If I create my document like this:
then the program crashes; if I leave off the `xmlns` attribute, it works fine.
Expected behavior
Environment
Additional context
ruby-2024-01-28-125803.txt