JRuby nokogiri incorrectly includes xml declaration for html transformation #1430

Open
jvshahid opened this issue Feb 17, 2016 · 4 comments

@jvshahid
Member

Using the following code:

input_xml = <<-EOS
<?xml version="1.0" encoding="utf-8"?>
<report>
  <title>My Report</title>
</report>
EOS

input_xsl = <<-EOS
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="report/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="report/title"/></h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
EOS

require 'nokogiri'

xml = ::Nokogiri::XML(input_xml)
xsl = ::Nokogiri::XSLT(input_xsl)

puts xsl.apply_to(xml)

expected behavior:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My Report</title>
</head>
<body><h1>My Report</h1></body>
</html>

actual behavior:

<?xml version="1.0" encoding="UTF-8"?><html><head><title>My Report</title></head><body><h1>My Report</h1></body></html>
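A minimal sketch of a check that could guard against this in a regression test. The helper name `xml_declaration?` is hypothetical, not part of Nokogiri; it only inspects the serialized string and assumes the declaration, if present, is the first non-whitespace content.

```ruby
# Hypothetical helper (not a Nokogiri API): true when the serialized output
# begins with an XML declaration such as <?xml version="1.0" ...?>.
def xml_declaration?(serialized)
  # Require whitespace after "<?xml" so that other processing instructions
  # like <?xml-stylesheet ...?> are not mistaken for the declaration.
  serialized.match?(/\A\s*<\?xml\s/)
end

actual   = '<?xml version="1.0" encoding="UTF-8"?><html><head><title>My Report</title></head></html>'
expected = "<html>\n<head>\n<title>My Report</title>\n</head>\n</html>"

puts xml_declaration?(actual)
puts xml_declaration?(expected)
```

In a test against the real transformation, the string passed in would be `xsl.apply_to(xml)` from the script above.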

@cbasguti
Contributor

Hey everyone! Just wanted to give you a heads up that I'm actively working on this issue and putting in my best effort to find a solution.

@stevecheckoway
Contributor

I'm not an expert on XSLT, but shouldn't the output method be set by an xsl:output element?

@stevecheckoway
Contributor

To follow up on this, I think our JRuby XSLT processor is incorrectly determining the default output method. Here's what the standard has to say:

The default for the method attribute is chosen as follows. If

  • the root node of the result tree has an element child,

  • the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and

  • any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters,

then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute.
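To make the quoted rule concrete, here is a rough sketch of it in plain Ruby. It is not how Xalan or Nokogiri implement it; `TextNode`, `ElementNode`, and `default_output_method` are hypothetical names, and the result tree is modeled simply as the ordered list of the root node's children.

```ruby
# Hypothetical, minimal model of the children of a result tree's root node.
TextNode    = Struct.new(:content)
ElementNode = Struct.new(:local_name, :namespace_uri)

# Returns "html" or "xml" according to the default-method rule quoted above.
def default_output_method(root_children)
  index = root_children.index { |node| node.is_a?(ElementNode) }
  return "xml" if index.nil? # the root node has no element child

  first = root_children[index]
  # Local part must be "html" in any combination of upper and lower case.
  return "xml" unless first.local_name.casecmp?("html")
  # Namespace URI must be null (an empty string is treated as null here).
  return "xml" unless first.namespace_uri.to_s.empty?

  # Text nodes before the first element child may contain only whitespace.
  preceding_text = root_children[0...index].grep(TextNode)
  return "xml" unless preceding_text.all? { |t| t.content.match?(/\A[ \t\r\n]*\z/) }

  "html"
end

# The stylesheet in this issue produces a root whose only child is <html>.
puts default_output_method([ElementNode.new("html", nil)])
# An <html> element in the XHTML namespace does not qualify.
puts default_output_method([ElementNode.new("html", "http://www.w3.org/1999/xhtml")])
```

Under this reading, the result tree from the original report satisfies all three conditions, which is why html serialization would be expected without an xsl:output element.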

It's not clear to me why this is failing. The UNKNOWN method's description doesn't match what the standard says, but it seems like it should still be serializing this as html.

To get the output you want, I think you just need to use an xsl:output element. I've added a line to your example:

input_xml = <<-EOS
<?xml version="1.0" encoding="utf-8"?>
<report>
  <title>My Report</title>
</report>
EOS

input_xsl = <<-EOS
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8" />
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="report/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="report/title"/></h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
EOS

require 'nokogiri'

xml = ::Nokogiri::XML(input_xml)
xsl = ::Nokogiri::XSLT(input_xsl)

puts xsl.apply_to(xml)

The output I get is

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>My Report</title>
</head>
<body>
<h1>My Report</h1>
</body>
</html>

root@e350b53df4bf:/usr/src/myapp# jruby --version
jruby 9.4.3.0 (3.1.4) 2023-06-07 3086960792 OpenJDK 64-Bit Server VM 25.372-b07 on 1.8.0_372-b07 +jit [aarch64-linux]
root@e350b53df4bf:/usr/src/myapp# nokogiri --version
/usr/local/bundle/gems/nokogiri-1.15.2-java/lib/nokogiri/xml/node.rb:1007: warning: method redefined; discarding old attr
# Nokogiri (1.15.2)
    ---
    warnings: []
    nokogiri:
      version: 1.15.2
    ruby:
      version: 3.1.4
      platform: java
      gem_platform: universal-java-1.8
      description: jruby 9.4.3.0 (3.1.4) 2023-06-07 3086960792 OpenJDK 64-Bit Server VM
        25.372-b07 on 1.8.0_372-b07 +jit [aarch64-linux]
      engine: jruby
      jruby: 9.4.3.0
    other_libraries:
      isorelax:isorelax: '20030108'
      net.sf.saxon:Saxon-HE: 9.6.0-4
      net.sourceforge.htmlunit:neko-htmlunit: 2.63.0
      nu.validator:jing: 20200702VNU
      org.nokogiri:nekodtd: 0.1.11.noko2
      xalan:serializer: 2.7.3
      xalan:xalan: 2.7.3
      xerces:xercesImpl: 2.12.2
      xml-apis:xml-apis: 1.4.01

@flavorjones
Member

I agree with @stevecheckoway's take that Xalan should be handling this case correctly, looking at the code for ToUnknownStream.java.

If someone could throw this into a Java debugger and tell us what's going on in Xalan, that would be extremely helpful. I just spent an hour trying to get jdb to work on my system and couldn't figure it out.
