Skip to content

Commit

Permalink
handle upstream libxml2 changes (#3056)
Browse files Browse the repository at this point in the history
**What problem is this PR intended to solve?**

Upstream libxml2 test failures.

- work around a bug/merge-request filed upstream [(on master) fix:
support ASCII encoding on input buffers (!231) · Merge requests · GNOME
/ libxml2 ·
GitLab](https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/231)
- looks like in-context parser recovery now works? updating
`XML::Node#parse` behavior to accommodate it in a backwards-compatible
fashion. See #2092 which might be fixed in future versions of libxml2.


**Have you included adequate test coverage?**

Making existing tests pass (or not segfault while we wait for an
upstream fix).


**Does this change affect the behavior of either the C or the Java
implementations?**

No.
  • Loading branch information
flavorjones authored Dec 14, 2023
2 parents e710e79 + c50d60b commit 93b7b3d
Show file tree
Hide file tree
Showing 2 changed files with 29 additions and 19 deletions.
4 changes: 4 additions & 0 deletions ext/nokogiri/xml_sax_parser_context.c
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,10 @@ parse_io(VALUE klass, VALUE io, VALUE encoding)
(xmlInputReadCallback)noko_io_read,
(xmlInputCloseCallback)noko_io_close,
(void *)io, enc);
if (!ctxt) {
rb_raise(rb_eRuntimeError, "failed to create xml sax parser context");
}

if (ctxt->sax) {
xmlFree(ctxt->sax);
ctxt->sax = NULL;
Expand Down
44 changes: 25 additions & 19 deletions lib/nokogiri/xml/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1049,29 +1049,35 @@ def parse(string_or_io, options = nil)

return Nokogiri::XML::NodeSet.new(document) if contents.empty?

# libxml2 does not obey the +recover+ option after encountering errors during +in_context+
# parsing, and so this horrible hack is here to try to emulate recovery behavior.
#
# Unfortunately, this means we're no longer parsing "in context" and so namespaces that
# would have been inherited from the context node won't be handled correctly. This hack was
# written in 2010, and I regret it, because it's silently degrading functionality in a way
# that's not easily prevented (or even detected).
#
# I think preferable behavior would be to either:
#
# a. add an error noting that we "fell back" and pointing the user to turning off the +recover+ option
# b. don't recover, but raise a sensible exception
#
# For context and background: https://github.com/sparklemotion/nokogiri/issues/313
# FIXME bug report: https://github.com/sparklemotion/nokogiri/issues/2092
error_count = document.errors.length
node_set = in_context(contents, options.to_i)
if node_set.empty? && (document.errors.length > error_count)
if options.recover?
if document.errors.length > error_count
raise document.errors[error_count] unless options.recover?

if node_set.empty?
# libxml2 < 2.13 does not obey the +recover+ option after encountering errors during
# +in_context+ parsing, and so this horrible hack is here to try to emulate recovery
# behavior.
#
# (Note that HTML4 fragment parsing seems to have been fixed in abd74186, and XML
# fragment parsing is fixed in 1c106edf. Both are in 2.13.)
#
# Unfortunately, this means we're no longer parsing "in context" and so namespaces that
# would have been inherited from the context node won't be handled correctly. This hack
# was written in 2010, and I regret it, because it's silently degrading functionality in
# a way that's not easily prevented (or even detected).
#
# I think preferable behavior would be to either:
#
# a. add an error noting that we "fell back" and pointing the user to turning off the
# +recover+ option
# b. don't recover, but raise a sensible exception
#
# For context and background:
# - https://github.com/sparklemotion/nokogiri/issues/313
# - https://github.com/sparklemotion/nokogiri/issues/2092
fragment = document.related_class("DocumentFragment").parse(contents)
node_set = fragment.children
else
raise document.errors[error_count]
end
end
node_set
Expand Down

0 comments on commit 93b7b3d

Please sign in to comment.