Configuring Woodstox II ‐ Stax2 Properties
See also:
- Configuring Woodstox I ‐ Basic Stax Properties
- Configuring Woodstox III ‐ Woodstox‐Specific Properties
As mentioned earlier, the standard Stax way of configuring anything is through factories, using the setProperty(name, value) method. This applies to Stax2 as well.
But there is also another mechanism for applying “profiles”: groups of settings that establish configuration defaults meant to optimize a specific aspect. These methods are named configureFor[Goal], for example “configureForSpeed”.
XMLInputFactory2 has the following profile-configuration methods:
- configureForConvenience: enable features that should simplify handling: enable coalescing, report all text segments as CHARACTERS (and not CDATA), enable P_PRESERVE_LOCATION
- configureForLowMemUsage: try to reduce the amount of memory retained during processing: disable coalescing (allows the parser to report smaller segments), disable P_PRESERVE_LOCATION
- configureForRoundTripping: try to preserve event information as much as possible, so that direct writes do not alter physical aspects of the XML: disable coalescing, preserve the distinction between CHARACTERS and CDATA, disable automatic entity expansion (so entities may be written out)
- configureForSpeed: try to minimize the performance overhead of options: disable coalescing, disable P_PRESERVE_LOCATION; enable intern()ing of both element/attribute names and namespace URIs
- configureForXmlConformance: enable features required to conform to the XML 1.x specification: namespaces, DTD processing
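As a rough illustration of the input profiles listed above, the following sketch applies two of them and reads back a few of the properties they are described as changing. It assumes Woodstox (woodstox-core together with stax2-api) is on the classpath and instantiates WstxInputFactory directly; the class and method names (InputProfiles, convenient, lowMem, describe) are made up for this example.

```java
import javax.xml.stream.XMLInputFactory;

import org.codehaus.stax2.XMLInputFactory2;

import com.ctc.wstx.stax.WstxInputFactory;

// Hypothetical helper class; only the factory calls are Stax2/Woodstox API.
final class InputProfiles {

    // Factory tuned for ease of use: coalesced text, CDATA reported as CHARACTERS.
    static XMLInputFactory2 convenient() {
        XMLInputFactory2 f = new WstxInputFactory();
        f.configureForConvenience();
        return f;
    }

    // Factory tuned to retain as little memory as possible while parsing.
    static XMLInputFactory2 lowMem() {
        XMLInputFactory2 f = new WstxInputFactory();
        f.configureForLowMemUsage();
        return f;
    }

    // Reads back some of the settings a profile is expected to touch.
    static String describe(XMLInputFactory2 f) {
        return "coalescing=" + f.getProperty(XMLInputFactory.IS_COALESCING)
                + ", reportCData=" + f.getProperty(XMLInputFactory2.P_REPORT_CDATA)
                + ", preserveLocation=" + f.getProperty(XMLInputFactory2.P_PRESERVE_LOCATION);
    }

    public static void main(String[] args) {
        System.out.println("convenience: " + describe(convenient()));
        System.out.println("low memory:  " + describe(lowMem()));
    }
}
```

Reading the properties back like this is a handy way to check what a given profile actually changed in the Woodstox version you are using.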
XMLOutputFactory2 has the following profile-configuration methods:
- configureForRobustness: enable both validation and repairing options to try to ensure that output is valid, even if changes are needed (for example, in rare cases comment contents may need to be split if the caller tries to output a sequence of two hyphens; or, for CDATA, two ] characters)
- configureForXmlConformance: enable all validation options to try to prevent any potential well-formedness problems (for example, with respect to namespace bindings), but not all repairing options
- configureForSpeed: optimize for output performance: disables validation operations that require scanning over contents; in a way the opposite of the conformance/robustness profiles.
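The robustness profile can be tried out with a small sketch like the one below, which writes a comment containing two consecutive hyphens. It assumes Woodstox is on the classpath and uses WstxOutputFactory directly; the class and method names are invented for illustration. The exact form of the repaired comment is up to Woodstox, so the code only returns the output without assuming what it looks like.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

import org.codehaus.stax2.XMLOutputFactory2;

import com.ctc.wstx.stax.WstxOutputFactory;

// Hypothetical example class; only the factory and writer calls are real API.
final class RobustOutput {

    // Writes <root> with a comment whose content ("a--b") is not legal as-is in XML.
    static String writeWithAwkwardComment() {
        XMLOutputFactory2 f = new WstxOutputFactory();
        f.configureForRobustness(); // validate content and repair it where possible

        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = f.createXMLStreamWriter(out);
            w.writeStartElement("root");
            w.writeComment("a--b"); // the writer is expected to repair this rather than fail
            w.writeEndElement();
            w.close(); // flushes buffered output into the StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Unexpected write failure", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeWithAwkwardComment());
    }
}
```

Swapping configureForRobustness() for configureForXmlConformance() in this sketch should be a quick way to see the difference between validating and repairing.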
Using a profile sets values for multiple properties (sometimes both plain Stax and Stax2 properties), but it is always possible to also set individual properties directly. Let’s have a look at what Stax2-extension properties exist and are supported by Woodstox. Note: most are Boolean-valued; I only mention the type if it is something other than Boolean.
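Combining the two mechanisms might look like the sketch below: a profile is applied first, and one of the properties it changed is then set back individually. Woodstox on the classpath is assumed, and the class and method names are hypothetical.

```java
import org.codehaus.stax2.XMLInputFactory2;

import com.ctc.wstx.stax.WstxInputFactory;

// Hypothetical example class showing "profile first, individual overrides second".
final class ProfileThenOverride {

    static XMLInputFactory2 fastWithLocations() {
        XMLInputFactory2 f = new WstxInputFactory();
        // The speed profile disables coalescing and P_PRESERVE_LOCATION, among other things.
        f.configureForSpeed();
        // Re-enable the one setting this (imaginary) application still needs.
        f.setProperty(XMLInputFactory2.P_PRESERVE_LOCATION, Boolean.TRUE);
        return f;
    }
}
```

The order matters here: calling the profile method after setProperty would overwrite the individual setting again.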
XMLInputFactory2 specifies the following Stax2 properties (along with the default values Woodstox uses):
- P_AUTO_CLOSE_INPUT (default: false): if enabled, XMLStreamReader will automatically close the underlying input source when the reader is closed; if disabled, it will not. The Stax 1.0 specification mandates that the default behavior is “disabled”, which often leads to unintended “dangling” input streams.
- P_DTD_OVERRIDE (default: null, value type DTDValidationSchema): property that may be set if a specific DTD instance is to be used instead of what the DOCTYPE declaration specifies (if anything).
NOTE: reading a DTDValidationSchema is worth its own article, but basically the entry point is XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_DTD)
- P_INTERN_NAMES (default: true): whether element and attribute names (the “local name” part) are String.intern()‘ed before being returned. Doing so usually saves memory and helps speed, but occasionally it may be necessary to disable this feature if the number of distinct names is unbounded: for example, if names are randomly generated (like UUIDs)
- P_INTERN_NS_URIS (default: true): similar to the above, but applies to namespace URIs.
- P_LAZY_PARSING (default: true): controls whether parsing is “lazy” or “eager”. “Eager” means that each event is completely parsed when XMLStreamReader.next() is called; “lazy” means that only a small part is parsed at that point, and the rest is parsed only if and as needed. Benefits of lazy parsing include much faster skipping of unneeded content (especially textual content, comments and processing instructions); a possible downside is that error reporting may sometimes occur later than expected (during actual content access or skipping, that is, when calling next() for the following event).
- P_PRESERVE_LOCATION (default: true): controls whether XMLStreamLocation information is included in XMLEvent instances or not. Disabling this feature reduces memory usage and improves processing speed modestly, but only when using the “Event API” (XMLEventReader).
- P_REPORT_CDATA (default: true): whether XML CDATA sections are reported as the CDATA Stax event (true) or as general CHARACTERS (false)
- P_REPORT_PROLOG_WHITESPACE (default: false): when disabled (false), white space outside the XML root element is skipped and not reported; only possible COMMENTs and PROCESSING_INSTRUCTIONs are reported. If enabled, additional SPACE events are reported. This is mostly (only) useful when trying to fully replicate document indentation outside of the root element
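To see one of these properties in action, the sketch below parses the same small document with P_REPORT_CDATA enabled and disabled and collects the event types the stream reader reports. It assumes Woodstox is on the classpath; the class name, method name and sample document are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

import org.codehaus.stax2.XMLInputFactory2;

import com.ctc.wstx.stax.WstxInputFactory;

// Hypothetical example class; the property constants come from XMLInputFactory2.
final class CDataReporting {

    // Returns the event type codes produced by next() for a tiny document with one CDATA section.
    static List<Integer> eventTypes(boolean reportCData) {
        XMLInputFactory2 f = new WstxInputFactory();
        f.setProperty(XMLInputFactory2.P_REPORT_CDATA, reportCData);
        // Let the stream reader close the StringReader when it is closed itself.
        f.setProperty(XMLInputFactory2.P_AUTO_CLOSE_INPUT, Boolean.TRUE);

        List<Integer> types = new ArrayList<>();
        try {
            XMLStreamReader r = f.createXMLStreamReader(
                    new StringReader("<root><![CDATA[a < b]]></root>"));
            try {
                while (r.hasNext()) {
                    types.add(r.next());
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Unexpected parse failure", e);
        }
        return types;
    }
}
```

Comparing eventTypes(true) with eventTypes(false) against the constants in XMLStreamConstants (CDATA, CHARACTERS) shows whether the section was reported as its own event type.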
XMLOutputFactory2 specifies the following Stax2 properties:
- P_ATTR_VALUE_ESCAPER (default: null, value type EscapingWriterFactory): by default, minimal escaping rules are used for attribute values. It is possible to fully customize escaping details, however. The value to assign has to be of type EscapingWriterFactory, which contains 2 methods for constructing the Writer used for output. Typically used to extend the set of characters that are to be escaped, although it may also be used for advanced cases such as filtering or even replacing specific contents of attribute values: for example, it could be used to obfuscate certain types of ids (credit-card numbers, SSN).
- P_TEXT_ESCAPER (default: null, value type EscapingWriterFactory): similar to P_ATTR_VALUE_ESCAPER but used for textual segments (“character data”, NOT including CDATA segments as they do not allow escaping). Similarly used either for changing escaping details, or for more advanced filtering/modifying of textual content to output.
- P_AUTO_CLOSE_OUTPUT (default: false): similar to P_AUTO_CLOSE_INPUT, determines whether the underlying OutputStream or Writer is automatically closed when the XMLStreamWriter is closed. The default is false because the Stax 1.0 specification mandates this behavior.
- P_AUTOMATIC_EMPTY_ELEMENTS (default: true): when a sequence of START_ELEMENT and END_ELEMENT is output (with possible attributes in between, but no child elements or textual content), it is possible to output either a so-called empty element (like <element />) or a fully written-out pair (<element></element>). If set to true, an empty element is written; if false, separate start/end tags are written.
- P_AUTOMATIC_NS_PREFIX (default: "wstxns"): when using “repairing” writer mode, in which namespace URIs are automatically bound, namespace prefixes are generated using this String as the beginning, followed by a sequence number to keep prefixes unique.
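The effect of P_AUTOMATIC_EMPTY_ELEMENTS can be checked with a short sketch like this one, which writes the same start/end pair with the property on and off. It assumes Woodstox is on the classpath; the class and method names are made up, and details such as whitespace before the closing slash are left to the writer.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

import org.codehaus.stax2.XMLOutputFactory2;

import com.ctc.wstx.stax.WstxOutputFactory;

// Hypothetical example class; the property constant comes from XMLOutputFactory2.
final class EmptyElements {

    // Writes one element with an attribute and no content, then returns the serialized result.
    static String write(boolean automaticEmpty) {
        XMLOutputFactory2 f = new WstxOutputFactory();
        f.setProperty(XMLOutputFactory2.P_AUTOMATIC_EMPTY_ELEMENTS, automaticEmpty);

        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = f.createXMLStreamWriter(out);
            w.writeStartElement("element");
            w.writeAttribute("id", "1");
            w.writeEndElement(); // no children or text between start and end
            w.close(); // flushes buffered output into the StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Unexpected write failure", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("automatic empty elements on:  " + write(true));
        System.out.println("automatic empty elements off: " + write(false));
    }
}
```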
Now that we have covered 2 out of 3 property sets, we are almost ready to have a look at the largest set of properties, the ones specific (for now) to Woodstox itself: Configuring Woodstox III ‐ Woodstox‐Specific Properties.