[bug] "in context" fragment parsing silently degrades functionality when XML errors are encountered #2092
Comments
and link to the github issue I just opened, #2092. [skip ci]
Side note: on JRuby, in-context parsing seems to not work:

```ruby
#! /usr/bin/env ruby
require 'nokogiri'

context_xml = "<root xmlns:n='https://example.com/foo'></root>"
context_doc = Nokogiri::XML::Document.parse(context_xml)

valid_xml_fragment = "<n:a><b/></n:a>"
invalid_xml_fragment = "<n:a><b></n:a>" # note missing closing tag for `b`

# valid fragment parses fine
context_doc.root.parse(valid_xml_fragment).tap do |fragment|
  fragment.to_xml # => "<n:a><b/></n:a>"
  fragment.first.name # => "a"
  fragment.first.namespace # => nil
end

# invalid fragment parses with errors, cannot recover, and is silently parsed out of context leading
# to namespaces not being properly referenced
context_doc.root.parse(invalid_xml_fragment).tap do |fragment|
  fragment.to_xml # => "<n:a><b/></n:a>"
  fragment.first.name # => "a"
  fragment.first.namespace # => nil
end
```

run with
This recovery behavior seems to be fixed for HTML4 documents on libxml2 master, but isn't fixed for XML documents. I've asked about it upstream in "(on master) in-context parsing recovery: XML vs HTML" (GNOME / libxml2 issue #645 on GitLab).
**What problem is this PR intended to solve?**

Upstream libxml2 test failures.

- work around a bug/merge-request filed upstream: [(on master) fix: support ASCII encoding on input buffers (!231) · Merge requests · GNOME / libxml2 · GitLab](https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/231)
- looks like in-context parser recovery now works? updating `XML::Node#parse` behavior to accommodate it in a backwards-compatible fashion. See #2092 which might be fixed in future versions of libxml2.

**Have you included adequate test coverage?**

Making existing tests pass (or not segfault while we wait for an upstream fix).

**Does this change affect the behavior of either the C or the Java implementations?**

No.
The underlying behavior being worked around will be fixed in libxml 2.13 for both HTML4 and XML, so I'm not going to fix this in Nokogiri.
Tagged for v1.17.0 since that seems to be when we'll ship libxml 2.13.x.
This hack is no longer necessary since upstream recovery behavior has improved in v2.13. Closes #2092
This hack is not necessary with libxml 2.13 which improves fragment recovery behavior. Closes #2092
…2.13 (#3256)

**What problem is this PR intended to solve?**

This hack is not necessary with libxml 2.13 which improves fragment recovery behavior.

- add a TODO to remind me to remove the hack once we no longer support libxml < 2.13 (system libs)
- add a test that asserts the correct behavior when using libxml >= 2.13

Closes #2092

**Have you included adequate test coverage?**

Yes.

**Does this change affect the behavior of either the C or the Java implementations?**

Sadly, the Java implementation still does not handle in-context fragment parsing correctly, but that's out of scope for this improvement.
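The `>= 2.13` condition mentioned above can be written as a simple version gate. This is a minimal sketch using only `Gem::Version` from RubyGems. The helper name `in_context_recovery_expected?` and the constant are hypothetical, and the version string is assumed to come from wherever the loaded libxml2 version is reported (for example, Nokogiri's version info).

```ruby
require "rubygems" # provides Gem::Version; already loaded in most Ruby setups

# Assumption taken from this thread: in-context fragment recovery improved in libxml2 2.13.
RECOVERY_FIXED_IN = Gem::Version.new("2.13.0")

# Hypothetical helper (not part of Nokogiri): true when the given libxml2
# version string is 2.13 or newer.
def in_context_recovery_expected?(libxml_version)
  Gem::Version.new(libxml_version) >= RECOVERY_FIXED_IN
end

in_context_recovery_expected?("2.12.9") # => false
in_context_recovery_expected?("2.13.0") # => true
```

A test suite could use a gate like this to assert the new behavior only on builds where it is expected, and keep asserting the old fallback elsewhere.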
Please describe the bug
A fragment that is parsed "in context" and contains recoverable errors is silently parsed "out of context".
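Because the fallback is silent, a caller has to detect it after the fact. The sketch below shows one heuristic way to do that: it flags nodes whose tag was written with a prefix in the fragment source but which came back without a namespace. `unbound_elements` is an illustrative name, not part of Nokogiri. It only assumes node-like objects that respond to `name` and `namespace`, and it inspects only the nodes passed in, not their descendants.

```ruby
# Hypothetical helper, not part of Nokogiri.
#   fragment_source: the XML string that was handed to Node#parse
#   nodes:           node-like objects responding to #name and #namespace
# Returns the nodes whose tag carried a prefix in the source but which
# have no namespace after parsing.
def unbound_elements(fragment_source, nodes)
  # Local names of start tags written with a prefix, e.g. "<n:a>" yields "a".
  prefixed = fragment_source.scan(/<[A-Za-z_][\w.-]*:([A-Za-z_][\w.-]*)/).flatten
  nodes.select do |node|
    node.namespace.nil? && prefixed.include?(node.name.split(":").last)
  end
end

# Assumed usage with a parsed fragment:
#   nodes = context_doc.root.parse(fragment)
#   warn "fragment was parsed out of context" unless unbound_elements(fragment, nodes).empty?
```

This is a regex-based heuristic rather than a real XML scan, so it is only meant to illustrate the symptom: prefixed elements that lose their namespace.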
The relevant code is in nokogiri/lib/nokogiri/xml/node.rb, lines 824 to 829 in d852d97.
The root cause is that libxml2 does not pay attention to the "recover" option when parsing fragments in context via `xmlParseInNodeContext`.
Help us reproduce what you're seeing
run with:
Expected behavior
I think preferable behavior would be to choose one of:
recover option
Additional context
The behavior described here was introduced in #313.