Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Calling to_html on Nokogiri::XML::Attr raises Unsupported document node (2) exception #3125

Closed
marcoroth opened this issue Feb 6, 2024 · 5 comments · Fixed by #3127
Labels
state/needs-triage Inbox for non-installation-related bug reports or help requests

Comments

@marcoroth
Copy link

marcoroth commented Feb 6, 2024

Please describe the bug

Calling to_html on a Nokogiri::XML::Attr instance, parsed via HTML5::DocumentFragment is raising an Unsupported docu ment node (2); this is a bug in Nokogiri (RuntimeError) exception here.

Help us reproduce what you're seeing

#!/usr/bin/env ruby

require 'nokogiri'
require 'minitest/autorun'

class Test < Minitest::Spec
  describe "Nokogiri::XML::Attr#to_html" do
    it "should convert boolean attribute to HTML" do
      html = %(<input required>)
      fragment = Nokogiri::HTML5::DocumentFragment.parse(html)
       
      input = fragment.at_css("input")
      attribute = input.attribute("required")

      assert_equal " required", attribute.to_html
    end

    it "should convert string attribute to HTML" do
      html = %(<input name="name">)
      fragment = Nokogiri::HTML5::DocumentFragment.parse(html)
       
      input = fragment.at_css("input")
      attribute = input.attribute("name")

      assert_equal %( name="name"), attribute.to_html
    end
  end
end

Expected behavior

I expect the attribute to be converted to HTML and not raise the following error:

/Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/lib/nokogiri/html5/node.rb:64:in `html_standard_serialize': Unsupported document node (2); this is a bug in Nokogiri (RuntimeError)
        from /Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/lib/nokogiri/html5/node.rb:64:in `write_to'
        from /Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/lib/nokogiri/xml/node.rb:1302:in `serialize'
        from /Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/lib/nokogiri/xml/node.rb:1552:in `to_format'
        from /Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/lib/nokogiri/xml/node.rb:1314:in `to_html'

It seems like the same call is working fine in HTML4:

irb(main)> Nokogiri::HTML4::DocumentFragment.parse(%(<input required>)).at_css("input").attribute("required").to_html
=> " required"
irb(main)> Nokogiri::HTML4::DocumentFragment.parse(%(<input name="name">)).at_css("input").attribute("name").to_html
=> " name=\"name\""

Environment

bundle exec nokogiri -v

# Nokogiri (1.16.2)
    ---
    warnings: []
    nokogiri:
      version: 1.16.2
      cppflags:
      - "-I/Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/ext/nokogiri"
      - "-I/Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/ext/nokogiri/include"
      - "-I/Users/marcoroth/.anyenv/envs/rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.2-x86_64-darwin/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 3.3.0
      platform: x86_64-darwin23
      gem_platform: x86_64-darwin-23
      description: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin23]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - '0009-allow-wildcard-namespaces.patch'
      - 0010-update-config.guess-and-config.sub-for-libxml2.patch
      - 0011-rip-out-libxml2-s-libc_single_threaded-support.patch
      memory_management: ruby
      iconv_enabled: true
      compiled: 2.12.5
      loaded: 2.12.5
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-config.guess-and-config.sub-for-libxslt.patch
      datetime_enabled: true
      compiled: 1.1.39
      loaded: 1.1.39
    other_libraries:
      zlib: '1.3'
      libiconv: '1.17'
      libgumbo: 1.0.0-nokogiri

Additional context

I found this while trying to update my parse call form HTML4 to HTML5 here with these tests:

https://github.com/marcoroth/phlexing/blob/28f7d6970b69b22c425d2ccea064ecffd8d189ba/gem/test/phlexing/converter/tags_test.rb#L219-L237

I got the failing tests in this PR: marcoroth/phlexing#288

@marcoroth marcoroth added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Feb 6, 2024
@stevecheckoway
Copy link
Contributor

The HTML4 serialization is a little strange with its leading space. That seems kind of like an implementation detail of libxml2. Is that behavior we want to replicate?

@flavorjones
Copy link
Member

@stevecheckoway I don't think we want to replicate that space, at least not as part of the attribute's serialization.

@flavorjones
Copy link
Member

I've verified that serializing attributes like this has not worked since Nokogumbo was merged into Nokogiri in v1.12.0.

Will open a PR shortly.

flavorjones added a commit that referenced this issue Feb 6, 2024
Previously an exception would be raised, either "Unexpected node" or
"Unsupported document node (2)" depending on the version of Nokogiri.

Fixes #3125
flavorjones added a commit that referenced this issue Feb 6, 2024
Previously an exception would be raised, either "Unexpected node" or
"Unsupported document node (2)" depending on the version of Nokogiri.

Fixes #3125
flavorjones added a commit that referenced this issue Feb 6, 2024
Previously an exception would be raised, either "Unexpected node" or
"Unsupported document node (2)" depending on the version of Nokogiri.

Fixes #3125
flavorjones added a commit that referenced this issue Feb 6, 2024
Previously an exception would be raised, either "Unexpected node" or
"Unsupported document node (2)" depending on the version of Nokogiri.

Fixes #3125
flavorjones added a commit that referenced this issue Feb 8, 2024
**What problem is this PR intended to solve?**

Previously an exception would be raised, either "Unexpected node" or
"Unsupported document node (2)" depending on the version of Nokogiri.

Fixes #3125

**Have you included adequate test coverage?**

Yes.

**Does this change affect the behavior of either the C or the Java
implementations?**

HTML5 is only available in CRuby.
@marcoroth
Copy link
Author

Thank you for the quick fix! 🙏🏼

@flavorjones
Copy link
Member

👍 Hoping to get a release out in the next few days. Ping me if it becomes urgent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state/needs-triage Inbox for non-installation-related bug reports or help requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants