Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FactoryAdapter loadDocument() modifies XMLReader and breaks it #689

Closed
stevedlawrence opened this issue Jul 21, 2023 · 4 comments
Closed

Comments

@stevedlawrence
Copy link

We have an XMLReader (via Xerces used for validation) that breaks if setEntityResolver() is called. Instead, the EntityResolver must be set by calling setProperty(..., ...) with the resolver object as the value. If setEntityResolver is called, this XMLReader sees the loaded XML as invalid and fails to parse. Note that in this XMLReader, getEntityResolver() returns null when the entity resolver is set via setProperty`.

Unfortunately, the loadDocument function in the FactoryAdapter sets an entity resolver if getEnitytResolver returns null:

https://github.com/scala/scala-xml/blob/main/shared/src/main/scala/scala/xml/parsing/FactoryAdapter.scala#L120

This causes our XMLReader to fail when used with a FactoryAdapter. It looks like this was added in commit 1cc9ea8 and moved to the FactoryAdapter in subsequent changes, so wasn't an issue prior to v2.2.0 as best I can tell.

Can setting the entity resolver (and maybe other XMLReader changes made in loadDocument) be made configurable so we can disable it if needed?

Or maybe alternatively, change the def reader in XMLLoader.scala so that it makes these changes only to the XMLReader it creates? For example

def reader: XMLReader = {
  val r = parser.getXMLReader
  r.setContentHandler(adapter)
  r.setDTDHandler(adapter)
  r.setEnityResolver(adapter)
  r.setProperty("http://xml.org/sax/properties/lexical-handler", adapter)
  ...
  r
}

This way if we override the reader to provide a custom one we avoid potentially breaking changes, and have full control over the reader, with the understanding that we may need to call some or all of the other functions ourselves to work with a FactoryAdapter.

@dubinsky
Copy link
Contributor

@stevedlawrence

We have an XMLReader (via Xerces used for validation) that breaks if setEntityResolver() is called.
Can setting the entity resolver (and maybe other XMLReader changes made in loadDocument) be made configurable so we can disable it if needed?
Or maybe alternatively, change the def reader in XMLLoader.scala so that it makes these changes only to the XMLReader it creates?

I do not think that making setting various SAX handlers on the XMLReader configurable or optional, let alone disabling it altogether and relying on the caller to figure out what methods need to be called and actually calling them is a good idea: Scala XML should be able to install the SAX handlers it needs (or, in the case of the EntityResolver, might need in the future :)).

Now that the def XMLLoader.reader: XMLReader extension point is available, it can be used to supply an XMLReader that uses the one you need but does not break when yours does: wrap org.xml.sax.helpers.XMLFilterImpl around yours and override set/getEntityResolver() as appropriate...

@stevedlawrence
Copy link
Author

It looks like XMLFilterImpl calls setEntityResolver right before parser under the hood and still causes issues:

https://github.com/openjdk/jdk/blob/master/src/java.xml/share/classes/org/xml/sax/helpers/XMLFilterImpl.java#L704

However, I was able to do as you suggest but with a custom XMLReader wraper. That seems to have solved the problem and allows things to work with v2.2.0. Thanks!

@dubinsky
Copy link
Contributor

(XMLFilterImpl.parse() can be overwritten ;))

things ... work with v2.2.0. Thanks!

Glad it worked out!

@stevedlawrence
Copy link
Author

I had originally tried that and it didn't seem to work and I didn't look too much into it. But I now realize that if parse() is overridden to remove the setupParse call, then I also need to override the set/get functions for ErrorHandler/ContentHandler/DTDHandler to pass through to the parent as well. With those overrides, the XMLFilterImpl approach also works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants