Add mechanism for processing invalid XML names (transforming to valid ones) #531
First of all, thank you for contributing this PR. But as to the proposed changes... Hmmh. I really don't like the approach of magic names like this. While I understand it could be used to retain information, it results in a kind of special-purpose XML only processable by Jackson. I don't think I want to consider this approach; instead, what I could consider is brute-force replacement of "bad" characters when going from the Java property name to the XML name used. This would result in ugly XML names but retain the structure.
Retaining information is an absolute must-have in my use case (serializing and deserializing data).
So, basically something like my base32 proposal from #523? Or do you mean something that isn't reversible like replacing those chars with |
If you also don't like something like base32 tags, would it be OK to have the default being replacing invalid chars with |
I have refactored the PR a bit, and now there is support for multiple strategies for escaping tags:
Each strategy has its pros and cons. So instead of us choosing what tradeoffs to make, why not let the user decide? |
Is anything else still needed? Is it OK to include multiple strategies for escaping tag names, or do you only want one that can be toggled with a flag?
I have nothing against configuration, but I am not going to accept structure-changing modifications (adding a wrapping element, additional attributes and so on). These will be unlikely to work reliably and will end up as maintenance nightmares. Having said all of that, I am not against one-way transformations that also happen to be reversible. This could include base64 (and similar) encoding with a prefix. Since this is not a structural transformation and can work with existing handlers, it is acceptable to me.
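To make the "base64 with prefix" idea concrete, here is a minimal sketch of a reversible name transformation. It is only an illustration: the class name `PrefixedBase64Names` and the `base64_tag_` prefix are assumptions made for this example, not something taken from the PR. It always encodes (rather than encoding only invalid names), which keeps the round trip unambiguous.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of jackson-dataformat-xml.
final class PrefixedBase64Names {
    // Assumed marker prefix; it starts with a letter, so the result is a legal XML name.
    static final String PREFIX = "base64_tag_";

    // The URL-safe alphabet without padding only produces [A-Za-z0-9_-],
    // all of which are allowed inside an XML name once the prefix leads.
    private static final Base64.Encoder ENCODER = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DECODER = Base64.getUrlDecoder();

    // Logical (possibly invalid) name -> valid XML name.
    static String encode(String logicalName) {
        byte[] raw = logicalName.getBytes(StandardCharsets.UTF_8);
        return PREFIX + ENCODER.encodeToString(raw);
    }

    // XML name -> logical name; names without the prefix pass through unchanged.
    static String decode(String xmlName) {
        if (!xmlName.startsWith(PREFIX)) {
            return xmlName;
        }
        byte[] raw = DECODER.decode(xmlName.substring(PREFIX.length()));
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Because only the element name changes, the document structure stays the same. A real implementation would still have to decide what to do with a valid name that already starts with the prefix.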
So, for this PR specifically, do you want me to just drop the |
src/main/java/com/fasterxml/jackson/dataformat/xml/XmlMapper.java
Sigh. I am not sure I will want to go with this approach at all. My main concern is complexity it adds to already fragile processing. I guess backtracking a bit, the way I would see transformations would not operate at streaming level at all, but at property handling (databind). That's where it is possible (and necessary) to use one-way transform from logical property name to physical to match. This leaves streaming parser/generator as-is without knowing anything about changes. Translating things at lower level would have some benefits but my worry is that handling is already quite convoluted. I see why reversible transformation would be necessary there. So: I don't think I will accept approach as a whole as defined here. But. I would accept extension points that allow user to do this if changes for default case are as non-intrusive as possible. This may even include out-of-the-box implementations; but also needs to allow custom implementation of converter(s). So instead of ... but not sure if this can be done without adding overhead of keeping |
src/main/java/com/fasterxml/jackson/dataformat/xml/ser/ToXmlGenerator.java
Fine by me, but (I know we have different priorities here) in the end I need something where I can put in an arbitrary map, convert it to XML, and get the same map back out. Ideally, I would also like to avoid having to maintain a proprietary fork :)
Would that be something dataformat XML specific or would be a generic data-bind solution preferred?
I can refactor this PR into a generic extension point (I am also fine with dropping support for extra magic attributes), but could you then specify more clearly what would be acceptable and what not? Do you just want me to just map on Strings or do you want me to convert Strings to |
@mensinda So: extension I was thinking of would be XML-specific, and registered similar to what PR does but instead of pre-defined enum, with actual (stateless) handler. One tricky part is that I do not want additional conversions to/from Does this make more sense? As to databind-level approach: that could be pursued separately and would probably just allow replacing all invalid-in-QName characters with underscore (or maybe some other configurable character). I think that is not something that would work for your use case. |
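As a rough sketch of what a stateless handler for the underscore replacement could look like: the interface `TagNameHandler` and the class `UnderscoreHandler` below are hypothetical names invented for this example, and the validity check is deliberately limited to a conservative ASCII subset rather than the full XML name rules.

```java
// Hypothetical extension point; names and method signatures are illustrative only.
interface TagNameHandler {
    String encodeName(String logicalName);
    String decodeName(String xmlName);
}

// Stateless and lossy: every character outside a conservative ASCII subset becomes '_'.
final class UnderscoreHandler implements TagNameHandler {
    @Override
    public String encodeName(String logicalName) {
        if (logicalName.isEmpty()) {
            return "_"; // an XML name must have at least one character
        }
        StringBuilder sb = new StringBuilder(logicalName.length());
        for (int i = 0; i < logicalName.length(); i++) {
            char c = logicalName.charAt(i);
            // Allowed in any position (ASCII subset only).
            boolean startOk = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            // Allowed only after the first character.
            boolean laterOk = (c >= '0' && c <= '9') || c == '-' || c == '.';
            sb.append(startOk || (i > 0 && laterOk) ? c : '_');
        }
        return sb.toString();
    }

    @Override
    public String decodeName(String xmlName) {
        return xmlName; // one-way transformation: the original name cannot be recovered
    }
}
```

Since the replacement discards information, two different keys such as `a b` and `a/b` map to the same element name, which is why this variant does not help with round-tripping an arbitrary map.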
This commit adds an extendable `XmlTagProcessor` that is used for escaping invalid characters in XML tag names. fixes FasterXML#523 fixes FasterXML#524
@cowtowncoder I have updated the PR to use a new extension point ( |
ping :) |
Hi there! Sorry, haven't had any time to look into this. It's on my list, hoping to get back to it in near future. |
Hi, because of the timeline "The plan is to get the first Release Candidate (2.14.0-rc1) out during August 2022" from https://cowtowncoder.medium.com/jackson-2-14-sneak-peek-79859babaa4, I was wondering how likely it is that this PR could go into the 2.14 release?
@mensinda I have my long (but shortening slowly) list of things to work through prior to release; this is an entry. So timing of RC1 may well move but I will have a look here before that, or at least final release (possible to have multiple RCs, even with new features). |
@mensinda Hi there! Apologies for this taking so long but I FINALLY had a chance to go back and read the PR. I will be adding some smaller notes as comments but I have only one bigger thing I'd like to change: avoiding construction of
I guess my question is whether modification of namespace URI is needed or not; if not could simply pass I know this may sound like over-optimizing but I hope this can be done since most use cases will probably not configure mutation so allocations are unnecessary overhead. |
    return _tagProcessor;
}

public void setXmlTagProcessor(XmlTagProcessor _tagProcessor) {
Unless something in setup requires this, let's leave out this mutator: being a new feature should be possible to just use Builder approach. 3.0 (master) will have to remove it otherwise.
/**
 * @since 2.14
 */
public void setXmlTagProcessor(XmlTagProcessor tagProcessor) {
Let's also remove this mutator; should be passed via XmlFactory
(and then avoids requiring the other setter in factory)
src/main/java/com/fasterxml/jackson/dataformat/xml/XmlTagProcessors.java
@mensinda I think I could just merge this and make change I want (wrt mutability of tag info to avoid construction of instances). But I realized there is one practical thing to do first if we haven't done it yet (apologies if we did and I just forgot): I need the CLA. It's here: https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf (there is also alternate Corporate CLA if individual one linked above doesn't work) and it's a one-time thing (good for all future contributions). I would really like to get this in the first 2.14.0-rc1 if possible! |
Thanks for the review, but I am currently on vacation. I can send you the CLA this Friday if that is enough...
On 6 September 2022 02:04:31 CEST, Tatu Saloranta wrote:
The easiest way is usually to print it, fill, sign & scan/photo, then email to `info` at fasterxml dot com.
There are other possibilities (if you can't scan, modifying PDF with info + name as signature) too.
@mensinda No problem, that's fine & thank you for the quick reply. It's OSS so availability expected to be variable. |
I guess my question is whether modification of namespace URI is needed or not; if not could simply pass String.
But if it is (apologies I did not read processor implementations which might answer this question), then making value class mutable and passing reused instance would avoid allocation.
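The "mutable value class, reused instance" suggestion could look roughly like the sketch below. All names here (`MutableTagName`, `InPlaceTagProcessor`, `ReusingWriter`) are hypothetical and exist only for this example; the point is that the writer owns one scratch object and the processor mutates it in place, so no new object is allocated per tag.

```java
// Hypothetical mutable holder for a tag's namespace URI and local name.
final class MutableTagName {
    String namespace;
    String localPart;

    // Overwrites both fields and returns this instance so it can be passed along directly.
    MutableTagName set(String namespace, String localPart) {
        this.namespace = namespace;
        this.localPart = localPart;
        return this;
    }
}

// Hypothetical processor contract: modify the holder in place, return nothing.
interface InPlaceTagProcessor {
    void encode(MutableTagName name);
}

// Stand-in for a generator that writes start tags; not thread-safe because of the shared scratch object.
final class ReusingWriter {
    private final MutableTagName scratch = new MutableTagName(); // allocated once per writer
    private final InPlaceTagProcessor processor;

    ReusingWriter(InPlaceTagProcessor processor) {
        this.processor = processor;
    }

    String startTag(String namespace, String localPart) {
        processor.encode(scratch.set(namespace, localPart)); // reuse, no per-call allocation
        return "<" + scratch.localPart + ">";
    }
}
```

The trade-off is that the holder must not be retained by the processor or shared across threads, which is the usual cost of this kind of reuse.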
I don't see any reason for or against processing the URI. I personally don't have a use case for this (and can't think of one, to be honest), but it was easy to include and someone might have a use case for it, so I included it.
Would you be OK with just dropping URI processing for now?
src/main/java/com/fasterxml/jackson/dataformat/xml/XmlTagProcessors.java
@mensinda Ah. If only single copy created per-mapper yeah no need to optimize, ignore that suggestion. As to other suggestions I guess my thinking was that if new methods were to be added, it'd be easier to have a "base" implementation (empty). But with just 2 methods maybe that's overthinking things. It does not look like this interface was likely to need expansion; and if it does, can use default method implementations for backwards compatibility. |
I think that actually my preferred choice is to make |
@mensinda I think that I can easily make the minor change wrt mutability after merging. So given that I think you sent CLA (will check that), I think we are good now. Thank you for this contribution! |
Merged, will change I also realized that |
... and I don't think this PR handles attribute names either (should have spent bit more time reading details). |
This commit introduces the `PROCESS_ESCAPED_MALFORMED_TAGS` and `ESCAPE_MALFORMED_TAGS` features that control whether invalid tag names will be escaped with an attribute.
fixes #523
fixes #524