JOOX.$() doesn't support whitespace between comment and root element #190

Closed
andreasevers opened this issue May 6, 2024 · 7 comments

Comments


andreasevers commented May 6, 2024

Expected behavior and actual behavior:

The following code:

import org.joox.JOOX;
import org.junit.jupiter.api.Test;

class JOOXTest {
	@Test
	void shouldParseComments() {
		String xmlData = """
		<!-- comment -->
		<root></root>
		""";
		JOOX.$(xmlData).document();
	}
}

should parse correctly. However, it fails with the following exception:

org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted.

	at java.xml/com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.insertBefore(CoreDocumentImpl.java:439)
	at java.xml/com.sun.org.apache.xerces.internal.dom.ParentNode.internalInsertBefore(ParentNode.java:330)
	at java.xml/com.sun.org.apache.xerces.internal.dom.ParentNode.insertBefore(ParentNode.java:286)
	at java.xml/com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.insertBefore(CoreDocumentImpl.java:447)
	at java.xml/com.sun.org.apache.xerces.internal.dom.NodeImpl.appendChild(NodeImpl.java:230)
	at org.joox.JOOX.$(JOOX.java:99)
	at JOOXTest.shouldParseComments(JOOXTest.java:18)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

This is especially relevant for copyright notices at the start of a file.

Steps to reproduce the problem:

Run the test.

Versions:

  • jOOX: 2.0.0
  • Java: 17

@lukaseder
Member

Thanks for your report. I agree this should probably work, assuming there are no forbidden content elements in the prolog (e.g. unsupported whitespace: https://www.w3.org/TR/xml/#NT-prolog).

@lukaseder
Member

The problem is indeed whitespace. This works:

String xmlData = "<!-- comment --><root></root>";

This doesn't:

String xmlData = "<!-- comment --> <root></root>";

But this does, again:

String xmlData = "<?xml version=\"1.0\"?> <!-- comment --> <root></root>";

It's weird that the XMLDecl seems to be mandatory here. It isn't according to the specs:

prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

Let's see if jOOX can work around this.

@lukaseder
Member

Related: #128

@lukaseder
Member

Well, it seems that the Document node simply doesn't allow for any Text node to be appended as a child, irrespective of its content. It should check if the content matches this:

S ::= (#x20 | #x9 | #xD | #xA)+

But alas, it doesn't. I guess the only safe solution here is to remove the whitespace for now.
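
The behaviour described above can be reproduced with the JDK's DOM API alone, without jOOX. The following is a minimal sketch, not jOOX's actual fix: the class and method names (DocumentTextChildDemo, appendTextToDocument, isXmlWhitespace) are made up for illustration. It appends a whitespace-only Text node to an empty Document and reports the resulting DOMException code, then shows the kind of check against the S production that a workaround could use to decide which Text nodes are safe to drop.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class DocumentTextChildDemo {

    /**
     * Tries to append a Text node directly to an empty Document.
     * Returns the DOMException code if the append is rejected, or -1 if it succeeds.
     */
    static short appendTextToDocument(String text) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            document.appendChild(document.createTextNode(text));
            return -1;
        } catch (DOMException e) {
            return e.code;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the text is non-empty and contains only characters allowed by the S production. */
    static boolean isXmlWhitespace(String text) {
        return !text.isEmpty()
            && text.chars().allMatch(c -> c == 0x20 || c == 0x9 || c == 0xD || c == 0xA);
    }

    public static void main(String[] args) {
        // Compare the returned code with DOMException.HIERARCHY_REQUEST_ERR
        short code = appendTextToDocument(" ");
        System.out.println("rejected: " + (code == DOMException.HIERARCHY_REQUEST_ERR));
        System.out.println("whitespace only: " + isXmlWhitespace(" \n\t"));
    }
}
```

A workaround along these lines would skip any Text child for which isXmlWhitespace returns true before appending the remaining nodes to the Document.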

@lukaseder
Member

The alternative would be to parse the Document directly, rather than parsing a DocumentFragment, when wrapping things with $(...)
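
To illustrate that alternative, here is a minimal sketch using only the JDK's DocumentBuilder, with no jOOX involved. The class and method names (ParseDocumentDirectlyDemo, parse) are hypothetical, and this is not how jOOX implements $(...). It parses the failing input from this issue as a whole document, so the parser handles the prolog itself instead of a Text node being appended to the Document afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseDocumentDirectlyDemo {

    /** Parses the given XML string as a complete document rather than as a fragment. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The input that fails when wrapped with JOOX.$() in jOOX 2.0.0
        Document document = parse("<!-- comment --> <root></root>");
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```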

lukaseder changed the title from "Leading XML comments not supported" to "JOOX.$() doesn't support whitespace between comment and root element" on May 7, 2024
@lukaseder
Member

Fixed for version 2.0.1

@andreasevers
Author

Much appreciated Lukas!
