-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: encoding/xml: round-trip safe mode [freeze exception] #43168
Comments
Change https://golang.org/cl/277893 mentions this issue: |
Change https://golang.org/cl/277892 mentions this issue: |
If the goal is extracting a subset of the larger message, that could be obtained with complete confidence by exposing a way to get the byte range corresponding to an element. Then you’d parse, discover the slice that contains the part you need, and slice the original doc for the new content. |
At the very least, Go needs to preserve round-trip stability while processing namespaces correctly. Mattermost’s blog post agrees with me on this: SAML requires correct namespace processing. Furthermore, at least some of the vulnerabilities are due to accepting invalid XML, which We can guarantee round-tripping of arbitrarily complex XML by having |
Re "invalid XML" -- Please be careful about terminology. XML validity is defined against schemas, and there are many legitimate use cases for processing XML which has not been validated. If the library is accepting ill-formed XML without complaint, or not detecting invalid XML when validation has been requested and a schema provided, that's a problem. I would recommend trying to avoid adding cost to the process except when actually needed. Offering the option of byte-for-byte roundtripping may make sense; making every user pay the overhead for that probably does not. Note that XML-DSig 2.0 explicitly references XML Canonicalization as a way to resolve some of the fragilities. I believe that even when XML-DSig 1.0 was announced, the community was very aware that it was stable only for strictly canonicalized documents. |
“invalid” was the terminology used in the CL that changed handling of incorrectly placed colons. I believe that XML is in fact ill-formed.
This needs to be clear in the documentation, with explicit examples that can be used for test-cases. I believe we should guarantee that if XML is serialized and then parsed, it produces the same token stream.
Can |
In SAX, at least, same token string is explicitly not guaranteed. Outputting text content, for example, can occur in chunks of whatever size is convenient for the program; reading them back in will occur in chunks of whatever sizes happen to naturally fit the parser's buffer size and the data's alignment with that. If you want token-level standardization from SAX, you need a canonicalizer layer as part of the parse operation. Similarly, a DOM parser may or may not retain the distinction between text nodes and CDATA nodes. That's a syntactic feature, not a semantic one. And while a parser typically does deliver contiguous text as a single text node, the DOM model is perfectly happy with consecutive text nodes and that detail is not retained during serialization. XML is defined in terms of the XML Data Model, not a token stream. A difference which makes no semantic difference should make no semantic difference. |
In that case, we should include canonicalization in |
What we are trying to figure out is if a subset of the encoding/xml functionality (for example, just a namespace-naive tokenizer) would be useful for SAML implementations (and other applications that need more security properties than we provide), so we can upgrade the security guarantees for that subset. If SAML and XML-DSig need a superset of not only the security properties, but also of the features of encoding/xml, then it sounds like they definitely can't and shouldn't be implemented on top of encoding/xml. The feedback I am getting from other discussion venues is that XML-DSig might be complex enough to deserve a dedicated XML tokenizer implementation. Of the three affected libraries, one is discussing deprecating SAML support (dexidp/dex#1884), and the maintainer of the other is saying that they don't have confidence in the protocol safety. If the conclusion is that a safe SAML implementation needs a new library that is either a substantial extension (as opposed to a subset) or a breaking change of encoding/xml, that should be (or at least start as) a third-party module. Implementing canonicalization and offering strict security guarantees for it is a major project, not a security fix. /cc the maintainers of some major involved downstreams, although we want to hear from anyone who has code that relies on security properties that encoding/xml does not provide: @ericchiang @russellhaering @crewjam @beevik |
Hi, I've written a successful XML parser (in Java) and am also the idiot who someone convinced (20+ years ago) that Namespaces were a good idea and edited that spec. I'm also somewhat Go-competent. I'm pretty sure that something that took a blob of bytes that claimed to be XML, puked on anything not well-formed, and got the namespace semantics right, would make SAML & DSIG happy - although I don't know their Go implementations & thus their API requirements. It seems plausible that fixing this would, per Go culture, require new APIs. BTW the same thing happened in Ruby, there was a native implementation that was severely broken and then someone wrapped one of the C parsers in a new set of APIs which is what everyone uses now. Anyhow, purpose of this note is to volunteer to help. Not sure I'm up for doing another XML parser from scratch unless someone's willing to pay for it, but I think I understand the issues. |
Just to chime in, as another Golang SAML implementor and assessor: I'm alarmed at the idea of trying to bend encoding/xml into part of a SAML/DSIG stack. It was quickly apparent to me when I implemented my SAML IdP that encoding/xml's namespacing support wasn't suitable for DSIG, which is heavily dependent on namespacing, and that its output didn't give me enough control to do canonicalized output. I've reached the conclusion that for SAML, you really want a defensively-written purpose-built minimal (as minimal as you can get with DSIG) XML; as Advent of Code is proving, I'm not an especially excellent programmer, but the XML parsing required to do SAML was just a few hundred lines of code, and it's intimately woven into DSIG c14n, which is code you have to write anyways to do DSIG. DSIG is quite probably the scariest cryptography standard in common use, scarier even than X.509, and if the Go standard library provides tools for DSIG like c14n (a source of security bugs in other systems!), the standard library maintainers own the security implications for that. This makes much more sense as a third-party package, one that doesn't have to make any compromises with the other uses XML is put to. |
@tqbf (a) love your username, (b) is your code open source? |
Sounds great to me. I'm more than somewhat Go-incompetent, but if I can find a brain cell or two to contribute...(To further support Tim's authority in this area: he's also the author of the Annotated XML Specification -- https://www.xml.com/axml/axml.html -- which is excellent documentation of exactly what the XML 1.0 spec actually means, why those decisions were made, and where there be tygres and dragonnes.)
|
Thanks for jumping in, @timbray. Can you say a little more about:
What does "got the namespace semantics right" mean to you? The current encoding/xml tokenizer tried to do this, making it so that readers would see the "actual namespace semantics" and not the raw syntax details like which shortening prefixes were being used. To take the W3C example, given I made two mistakes in the tokenizer, both fixable without new API:
The IgnoreNamespaces proposal in this issue goes much further, passing up But again, I'm interested to hear what the experts think about what "right" means. Thanks. I am also fully on board with the comments above that putting encoding/xml inside your security perimeter is a mistake, so I'm a bit reluctant to make changes that are only useful for SAML libraries. But again it seems like even for SAML, IgnoreNamespaces is a mistake. |
FWIW a few years ago I fixed a bunch of namespace issues in the encoding/xml package. There are indeed many broken things - at least there were when I last looked. The fixes were merged into the standard library but broke some tests inside Google, so the changes were reverted pending a proper proposal. If I'd ever had cause to use XML since then, I'd probably have made the effort to go through the whole thing again, but I haven't, I'm afraid :) A fork of the fixed package still lives at github.com/juju/xml if anyone is interested to dig it out again. The main motivation I originally had for the fixes was to be able to correctly round-trip XML. |
@rsc - Your description of the semantics with "bk" and urn:loc.gov:books sounds right to me. IgnoreNamespaces sounds completely crazy to me. You might not like the namespace design (lots of people don't) but you can't ignore them if an XML doc has them. I'd think that a namespace-sensitive API being used to process an XML doc should return every element & attribute name as a (namespace, name) pair, with the namespace either being the URI or nil. But I actually haven't yet understood what the bug is, they just say "instability". |
Depending on how you're definining round-tripping, the API may also want to return the prefix.And should also make sure to return namespace declaration attributes, so they appear exactly where they did in the input rather than being generated at the point where they are needed.I absolutely agree that "ignore namespaces" is a Bad Idea. The only place where it made sense was in code specifically written for backward compatibility with documents and apps that predated namespaces. (Which is also the only reason we didn't immediately deprecate the non-namespace-aware DOM APIs.) But there should be nothing which still needs that archaicism; if there is, fix it or replace it.______________________________________"Your data is important to us. Please stay on the port, and your connection will be accepted by the next available thread."
|
For the record, here are some of the CLs that fixed some of the problems that are in encoding/xml (they were all reverted later): https://go-review.googlesource.com/c/go/+/2660/ With those changes applied, namespace prefixes were not preserved on round-tripping (I think that would be very hard to do while preserving backward compatibility), but namespace semantics were. |
Anything claiming to be an XML parser should reject both. The core XML syntax says you can't have multiple attributes on an element with the same name, and syntactically XML namespace declarations are attributes. This would be an ill-formed document, and the author should be made to fix it. (https://www.w3.org/TR/xml/#uniqattspec) XML reserved the colon character in attribute and element names for use by namespaces. A qname may not start with a colon, since colon can't appear unless there is a prefix, the prefix is a NCName, and an NCName can't be empty (it must at least have the NameStartChar). (https://www.w3.org/TR/xml-names/#ns-qualnames and drill down from there.) |
We agree with you on that. We can try that for Go 1.17. (But that's not this issue.) For this issue, it sounds like the general consensus is that we should not add IgnoreNamespaces. That is, this seems like a likely decline. |
Based on the discussion above, this proposal seems like a likely decline. |
No change in consensus, so declined. |
Before this change, <:name> would parse as <name>, which could cause issues in applications that rely on the parse-encode cycle to round-trip. Similarly, <x name:=""> would parse as expected but then have the attribute dropped when serializing because its name was empty. Finally, <a:b:c> would parse and get serialized incorrectly. All these values are invalid XML, but to minimize the impact of this change, we parse them whole into Name.Local. This issue was reported by Juho Nurminen of Mattermost as it leads to round-trip mismatches. See #43168. It's not being fixed in a security release because round-trip stability is not a currently supported security property of encoding/xml, and we don't believe these fixes would be sufficient to reliably guarantee it in the future. Fixes CVE-2020-29509 Fixes CVE-2020-29511 Updates #43168 Change-Id: I68321c4d867305046f664347192948a889af3c7f Reviewed-on: https://go-review.googlesource.com/c/go/+/277892 Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Filippo Valsorda <filippo@golang.org> Reviewed-by: Katie Hockman <katie@golang.org>
A Directive (like <!ENTITY xxx []>) can't have other nodes nested inside it (in our data structure representation), so there is no way to preserve comments. The previous behavior was to just elide them, which however might change the semantic meaning of the surrounding markup. Instead, replace them with a space which hopefully has the same semantic effect of the comment. Directives are not actually a node type in the XML spec, which instead specifies each of them separately (<!ENTITY, <!DOCTYPE, etc.), each with its own grammar. The rules for where and when the comments are allowed are not straightforward, and can't be implemented without implementing custom logic for each of the directives. Simply preserving the comments in the body of the directive would be problematic, as there can be unmatched quotes inside the comment. Whether those quotes are considered meaningful semantically or not, other parsers might disagree and interpret the output differently. This issue was reported by Juho Nurminen of Mattermost as it leads to round-trip mismatches. See #43168. It's not being fixed in a security release because round-trip stability is not a currently supported security property of encoding/xml, and we don't believe these fixes would be sufficient to reliably guarantee it in the future. Fixes CVE-2020-29510 Updates #43168 Change-Id: Icd86c75beff3e1e0689543efebdad10ed5178ce3 Reviewed-on: https://go-review.googlesource.com/c/go/+/277893 Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Filippo Valsorda <filippo@golang.org> Reviewed-by: Katie Hockman <katie@golang.org>
I’m not sure what the right fix is for this. Malformed namespaces (leading or trailing colons, more than 1 colon) should result in an error.
I’m not sure what the right fix is for this. Malformed namespaces (leading or trailing colons, more than 1 colon) should result in an error.
All issues of golang#13400 which are not new functionalities have fixes. There are minor incompatibilities between them due to the handling of prefixes. Duplicating a prefix or an namespace is invalid XML. This is now avoided. XML produced is always valid. Tests have been added for each fix and example and previous tests fixed as output is already more compact is some cases. encoding/xml: fix duplication of namespace tags by encoder (golang#7535) A tag prefix identifies the name space of the tag and not the default name space like xmlns="...". Writing the prefix is incorrect when it is bound to a name space using the standard xmlns:prefix="..." attribute. This fix skips this print. Consequences are - duplication is avoided in line with name space standard in reference. - the _xmlns declaration does not appear anymore. To keep the previous behaviour, the prefix is printed in all other cases. Token now always produces well-formed XML except when the explicit name space collides with the prefix. Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has be removed in all wants of tests. In some cases, useless declarations like xmlns:x="x" are still added in line with previous behavior. encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431) MarshalIndent and Marshal share code. When prefix and indent are empty, the behavior is like Marshal when it should have a minimal indent, i.e. new line as in documentation. A boolean is added to the local printer struct which defaults to false. It is set to true only when MarshalIndent is used and prefix and indent are empty. encoding/xml: fix overriding by empty namespace (golang#7113) The namespace defined by xmlns="value" can be overridden in every included tag including by the empty namespace xmlns="". Empty namespace is not authorized with a prefix (xmlns:ns=""). Method to calculate indent of XML handles depth of tag and its associated namespace. The fix leaves the method active even when no indent is required. An XMLName field in a struct means that namespace must be enforced even if empty. This occurs only on an inner tag as an overwrite of any non-empty namespace of its outer tag. To obtain the xmlns="" required, an attribute is added. encoding/xml: fix panic on embedded unexported XMLName (golang#10538) By laws of reflection, unexported fields are unreachable by .Value. XMLName are allowed at any level of an inner struct but the struct may not be reachable. If XMLName field is found without name, it cannot be exported and the value is discarded like other fields. Some comments have been to underline where the issue arises. Another XMLName test was incorrectly set up and is fixed. Various cases added in a specific test. encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497) Encoding/xml is using type "typinfo" which provides its own "value" method using the reflection package. It fails to check for the documented possible panics of the reflection methods. It is impossible to fix the method as the parameter is also using reflection. The fix is to discard anonymous structs which have no value anyway. Encoder/xml documentation already mentions that fields are accessed using the reflection package. A relevant test is added. encoding/xml: fix closing tag failure (golang#20685) Push/pop of elements name must be done using the eventual prefix together with the tag name. The operation is moved before the translation of prefix into its URI. One end element of a test is fixed as expecting the last used namespace is incorrect. After closing a tag using a namespace, the valid namespace must be taken from the opening tag. encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724) Comparing namespaces of fields was missing and false conflicts were detected. encoding/xml: fix invalid empty namespace without prefix (golang#8068) Empty namespace is allowed only if it has no prefix. An error message is now returned if a prefix exists. A similar case when no prefix is provided, thus with syntax xmlns:="..." is also rejected. encoding/xml: fix normalization of attributes values (golang#20614) The attribute value was read as text. The existing attribute reader logic is fixed as an attribute may have a namespace or only a prefix. Other possibilities have been removed. To keep the behavior of raw token which allows many faults in attributes list, error handling is heavily using the Strict parameter of the decoder. Specific tests have been added including list of attributes. To keep the behavior or unmarshal, escaped characters handling has been added but it is not symmetrical to Marshal for quotes but follows XML specification. encoding/xml: fix absence of detection of another : in qualified names (golang#20396) The occurrence of second : in space and tag name is rejected. Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685 Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165 Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com> encoding/xml: fix printing of namespace prefix in tag names Prefix displays in XML when defined in tag names. The URL of the name space is also returned for an End Token as documentation requires to improve idempotency of Marshal/Unmarshal. Translating the prefix is popping the NS which was unavailable for translation. Translate for an End Token has been moved inside the pop element part. Fixes golang#9519 Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60 encoding/xml: restore changes lost in merge See golang#43168. encoding/xml: gofmt encoding/xml: minimalIndent -> minIndent to shrink diff encoding/xml: gofmt encoding/xml: revert changes to (*Decoder).attrval from master encoding/xml: restore (*printer).writeIndent to master encoding/xml: remove comments that don’t change functionality of tests
encoding/xml: disable 3/4 tests for golang#43168 I’m not sure what the right fix is for this. Malformed namespaces (leading or trailing colons, more than 1 colon) should result in an error.
Change https://golang.org/cl/355353 mentions this issue: |
All issues of golang#13400 which are not new functionalities have fixes. There are minor incompatibilities between them due to the handling of prefixes. Duplicating a prefix or an namespace is invalid XML. This is now avoided. XML produced is always valid. Tests have been added for each fix and example and previous tests fixed as output is already more compact is some cases. encoding/xml: fix duplication of namespace tags by encoder (golang#7535) A tag prefix identifies the name space of the tag and not the default name space like xmlns="...". Writing the prefix is incorrect when it is bound to a name space using the standard xmlns:prefix="..." attribute. This fix skips this print. Consequences are - duplication is avoided in line with name space standard in reference. - the _xmlns declaration does not appear anymore. To keep the previous behaviour, the prefix is printed in all other cases. Token now always produces well-formed XML except when the explicit name space collides with the prefix. Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has be removed in all wants of tests. In some cases, useless declarations like xmlns:x="x" are still added in line with previous behavior. encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431) MarshalIndent and Marshal share code. When prefix and indent are empty, the behavior is like Marshal when it should have a minimal indent, i.e. new line as in documentation. A boolean is added to the local printer struct which defaults to false. It is set to true only when MarshalIndent is used and prefix and indent are empty. encoding/xml: fix overriding by empty namespace (golang#7113) The namespace defined by xmlns="value" can be overridden in every included tag including by the empty namespace xmlns="". Empty namespace is not authorized with a prefix (xmlns:ns=""). Method to calculate indent of XML handles depth of tag and its associated namespace. The fix leaves the method active even when no indent is required. An XMLName field in a struct means that namespace must be enforced even if empty. This occurs only on an inner tag as an overwrite of any non-empty namespace of its outer tag. To obtain the xmlns="" required, an attribute is added. encoding/xml: fix panic on embedded unexported XMLName (golang#10538) By laws of reflection, unexported fields are unreachable by .Value. XMLName are allowed at any level of an inner struct but the struct may not be reachable. If XMLName field is found without name, it cannot be exported and the value is discarded like other fields. Some comments have been to underline where the issue arises. Another XMLName test was incorrectly set up and is fixed. Various cases added in a specific test. encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497) Encoding/xml is using type "typinfo" which provides its own "value" method using the reflection package. It fails to check for the documented possible panics of the reflection methods. It is impossible to fix the method as the parameter is also using reflection. The fix is to discard anonymous structs which have no value anyway. Encoder/xml documentation already mentions that fields are accessed using the reflection package. A relevant test is added. encoding/xml: fix closing tag failure (golang#20685) Push/pop of elements name must be done using the eventual prefix together with the tag name. The operation is moved before the translation of prefix into its URI. One end element of a test is fixed as expecting the last used namespace is incorrect. After closing a tag using a namespace, the valid namespace must be taken from the opening tag. encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724) Comparing namespaces of fields was missing and false conflicts were detected. encoding/xml: fix invalid empty namespace without prefix (golang#8068) Empty namespace is allowed only if it has no prefix. An error message is now returned if a prefix exists. A similar case when no prefix is provided, thus with syntax xmlns:="..." is also rejected. encoding/xml: fix normalization of attributes values (golang#20614) The attribute value was read as text. The existing attribute reader logic is fixed as an attribute may have a namespace or only a prefix. Other possibilities have been removed. To keep the behavior of raw token which allows many faults in attributes list, error handling is heavily using the Strict parameter of the decoder. Specific tests have been added including list of attributes. To keep the behavior or unmarshal, escaped characters handling has been added but it is not symmetrical to Marshal for quotes but follows XML specification. encoding/xml: fix absence of detection of another : in qualified names (golang#20396) The occurrence of second : in space and tag name is rejected. Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685 Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165 Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com> encoding/xml: fix printing of namespace prefix in tag names Prefix displays in XML when defined in tag names. The URL of the name space is also returned for an End Token as documentation requires to improve idempotency of Marshal/Unmarshal. Translating the prefix is popping the NS which was unavailable for translation. Translate for an End Token has been moved inside the pop element part. Fixes golang#9519 Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60 encoding/xml: restore changes lost in merge See golang#43168. encoding/xml: gofmt encoding/xml: minimalIndent -> minIndent to shrink diff encoding/xml: gofmt encoding/xml: revert changes to (*Decoder).attrval from master encoding/xml: restore (*printer).writeIndent to master encoding/xml: remove comments that don’t change functionality of tests encoding/xml: use t.Errorf instead of fmt.Errorf encoding/xml: fix test case for golang#11496
encoding/xml: disable 3/4 tests for golang#43168 I’m not sure what the right fix is for this. Malformed namespaces (leading or trailing colons, more than 1 colon) should result in an error.
All issues of golang#13400 which are not new functionalities have fixes. There are minor incompatibilities between them due to the handling of prefixes. Duplicating a prefix or an namespace is invalid XML. This is now avoided. XML produced is always valid. Tests have been added for each fix and example and previous tests fixed as output is already more compact is some cases. encoding/xml: fix duplication of namespace tags by encoder (golang#7535) A tag prefix identifies the name space of the tag and not the default name space like xmlns="...". Writing the prefix is incorrect when it is bound to a name space using the standard xmlns:prefix="..." attribute. This fix skips this print. Consequences are - duplication is avoided in line with name space standard in reference. - the _xmlns declaration does not appear anymore. To keep the previous behaviour, the prefix is printed in all other cases. Token now always produces well-formed XML except when the explicit name space collides with the prefix. Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has be removed in all wants of tests. In some cases, useless declarations like xmlns:x="x" are still added in line with previous behavior. encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431) MarshalIndent and Marshal share code. When prefix and indent are empty, the behavior is like Marshal when it should have a minimal indent, i.e. new line as in documentation. A boolean is added to the local printer struct which defaults to false. It is set to true only when MarshalIndent is used and prefix and indent are empty. encoding/xml: fix overriding by empty namespace (golang#7113) The namespace defined by xmlns="value" can be overridden in every included tag including by the empty namespace xmlns="". Empty namespace is not authorized with a prefix (xmlns:ns=""). Method to calculate indent of XML handles depth of tag and its associated namespace. The fix leaves the method active even when no indent is required. An XMLName field in a struct means that namespace must be enforced even if empty. This occurs only on an inner tag as an overwrite of any non-empty namespace of its outer tag. To obtain the xmlns="" required, an attribute is added. encoding/xml: fix panic on embedded unexported XMLName (golang#10538) By laws of reflection, unexported fields are unreachable by .Value. XMLName are allowed at any level of an inner struct but the struct may not be reachable. If XMLName field is found without name, it cannot be exported and the value is discarded like other fields. Some comments have been to underline where the issue arises. Another XMLName test was incorrectly set up and is fixed. Various cases added in a specific test. encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497) Encoding/xml is using type "typinfo" which provides its own "value" method using the reflection package. It fails to check for the documented possible panics of the reflection methods. It is impossible to fix the method as the parameter is also using reflection. The fix is to discard anonymous structs which have no value anyway. Encoder/xml documentation already mentions that fields are accessed using the reflection package. A relevant test is added. encoding/xml: fix closing tag failure (golang#20685) Push/pop of elements name must be done using the eventual prefix together with the tag name. The operation is moved before the translation of prefix into its URI. One end element of a test is fixed as expecting the last used namespace is incorrect. After closing a tag using a namespace, the valid namespace must be taken from the opening tag. encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724) Comparing namespaces of fields was missing and false conflicts were detected. encoding/xml: fix invalid empty namespace without prefix (golang#8068) Empty namespace is allowed only if it has no prefix. An error message is now returned if a prefix exists. A similar case when no prefix is provided, thus with syntax xmlns:="..." is also rejected. encoding/xml: fix normalization of attributes values (golang#20614) The attribute value was read as text. The existing attribute reader logic is fixed as an attribute may have a namespace or only a prefix. Other possibilities have been removed. To keep the behavior of raw token which allows many faults in attributes list, error handling is heavily using the Strict parameter of the decoder. Specific tests have been added including list of attributes. To keep the behavior or unmarshal, escaped characters handling has been added but it is not symmetrical to Marshal for quotes but follows XML specification. encoding/xml: fix absence of detection of another : in qualified names (golang#20396) The occurrence of second : in space and tag name is rejected. Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685 Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165 Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com> encoding/xml: fix printing of namespace prefix in tag names Prefix displays in XML when defined in tag names. The URL of the name space is also returned for an End Token as documentation requires to improve idempotency of Marshal/Unmarshal. Translating the prefix is popping the NS which was unavailable for translation. Translate for an End Token has been moved inside the pop element part. Fixes golang#9519 Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60 encoding/xml: restore changes lost in merge See golang#43168. encoding/xml: gofmt encoding/xml: minimalIndent -> minIndent to shrink diff encoding/xml: gofmt encoding/xml: revert changes to (*Decoder).attrval from master encoding/xml: restore (*printer).writeIndent to master encoding/xml: remove comments that don’t change functionality of tests encoding/xml: use t.Errorf instead of fmt.Errorf encoding/xml: fix test case for golang#11496 encoding/xml: revert unrelated change in (*Decoder).readName encoding/xml: remove comments
encoding/xml: disable 3/4 tests for golang#43168 I’m not sure what the right fix is for this. Malformed namespaces (leading or trailing colons, more than 1 colon) should result in an error.
All issues of golang#13400 which are not new functionalities have fixes. There are minor incompatibilities between them due to the handling of prefixes. Duplicating a prefix or an namespace is invalid XML. This is now avoided. XML produced is always valid. Tests have been added for each fix and example and previous tests fixed as output is already more compact is some cases. encoding/xml: fix duplication of namespace tags by encoder (golang#7535) A tag prefix identifies the name space of the tag and not the default name space like xmlns="...". Writing the prefix is incorrect when it is bound to a name space using the standard xmlns:prefix="..." attribute. This fix skips this print. Consequences are - duplication is avoided in line with name space standard in reference. - the _xmlns declaration does not appear anymore. To keep the previous behaviour, the prefix is printed in all other cases. Token now always produces well-formed XML except when the explicit name space collides with the prefix. Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has be removed in all wants of tests. In some cases, useless declarations like xmlns:x="x" are still added in line with previous behavior. encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431) MarshalIndent and Marshal share code. When prefix and indent are empty, the behavior is like Marshal when it should have a minimal indent, i.e. new line as in documentation. A boolean is added to the local printer struct which defaults to false. It is set to true only when MarshalIndent is used and prefix and indent are empty. encoding/xml: fix overriding by empty namespace (golang#7113) The namespace defined by xmlns="value" can be overridden in every included tag including by the empty namespace xmlns="". Empty namespace is not authorized with a prefix (xmlns:ns=""). Method to calculate indent of XML handles depth of tag and its associated namespace. The fix leaves the method active even when no indent is required. An XMLName field in a struct means that namespace must be enforced even if empty. This occurs only on an inner tag as an overwrite of any non-empty namespace of its outer tag. To obtain the xmlns="" required, an attribute is added. encoding/xml: fix panic on embedded unexported XMLName (golang#10538) By laws of reflection, unexported fields are unreachable by .Value. XMLName are allowed at any level of an inner struct but the struct may not be reachable. If XMLName field is found without name, it cannot be exported and the value is discarded like other fields. Some comments have been to underline where the issue arises. Another XMLName test was incorrectly set up and is fixed. Various cases added in a specific test. encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497) Encoding/xml is using type "typinfo" which provides its own "value" method using the reflection package. It fails to check for the documented possible panics of the reflection methods. It is impossible to fix the method as the parameter is also using reflection. The fix is to discard anonymous structs which have no value anyway. Encoder/xml documentation already mentions that fields are accessed using the reflection package. A relevant test is added. encoding/xml: fix closing tag failure (golang#20685) Push/pop of elements name must be done using the eventual prefix together with the tag name. The operation is moved before the translation of prefix into its URI. One end element of a test is fixed as expecting the last used namespace is incorrect. After closing a tag using a namespace, the valid namespace must be taken from the opening tag. encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724) Comparing namespaces of fields was missing and false conflicts were detected. encoding/xml: fix invalid empty namespace without prefix (golang#8068) Empty namespace is allowed only if it has no prefix. An error message is now returned if a prefix exists. A similar case when no prefix is provided, thus with syntax xmlns:="..." is also rejected. encoding/xml: fix normalization of attributes values (golang#20614) The attribute value was read as text. The existing attribute reader logic is fixed as an attribute may have a namespace or only a prefix. Other possibilities have been removed. To keep the behavior of raw token which allows many faults in attributes list, error handling is heavily using the Strict parameter of the decoder. Specific tests have been added including list of attributes. To keep the behavior or unmarshal, escaped characters handling has been added but it is not symmetrical to Marshal for quotes but follows XML specification. encoding/xml: fix absence of detection of another : in qualified names (golang#20396) The occurrence of second : in space and tag name is rejected. Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685 Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165 Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com> encoding/xml: fix printing of namespace prefix in tag names Prefix displays in XML when defined in tag names. The URL of the name space is also returned for an End Token as documentation requires to improve idempotency of Marshal/Unmarshal. Translating the prefix is popping the NS which was unavailable for translation. Translate for an End Token has been moved inside the pop element part. Fixes golang#9519 Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60 encoding/xml: restore changes lost in merge See golang#43168. encoding/xml: gofmt encoding/xml: minimalIndent -> minIndent to shrink diff encoding/xml: gofmt encoding/xml: revert changes to (*Decoder).attrval from master encoding/xml: restore (*printer).writeIndent to master encoding/xml: remove comments that don’t change functionality of tests encoding/xml: use t.Errorf instead of fmt.Errorf encoding/xml: fix test case for golang#11496 encoding/xml: revert unrelated change in (*Decoder).readName encoding/xml: remove comments encoding/xml: remove comments encoding/xml: edit comments
encoding/xml: disable 3/4 tests for golang#43168 I’m not sure what the right fix is for this. Malformed namespaces (leading or trailing colons, more than 1 colon) should result in an error.
Background
Juho Nurminen of Mattermost reported multiple parsing issues in encoding/xml that lead to round-trip mismatches: parsing the output of encoding a list of tokens does not produce the same list of tokens.
Some of those issues are fixed by CLs linked to this issue, but encoding/xml does not provide such a security property, was not designed to, and we don’t believe that fixing the issues we know about is sufficient to guarantee it (nor that the round-trip guarantee is necessarily sufficient for all applications).
Indeed, the investigation kept surfacing different issues impacting round-trip stability, and we are aware of at least one variant affecting the Token API that is neither fixed by an open CL nor detected by the researchers’ validator (which we are told only aims to protect RawToken uses).
Unfortunately, XML-DSig and SAML implementations came to rely on round-trip stability, if not on perfect spec alignment, for their security. This is not entirely surprising, because XML-DSig and SAML are fundamentally fragile designs, but they are also critical enough in the ecosystem that we should try to find a way to let them operate safely.
Regardless of the outcome of this proposal, applications should avoid relying on complex parsers agreeing on how to interpret inputs for their security when possible, and should minimize transfers and modifications between the time of data validation and the time of use.
We’d like to thank Mattermost for reporting these issues, for working with us through the investigation, and for coordinating the disclosure of the issues to the downstream libraries so they could mitigate in advance.
If you use an affected SAML library, please refer to the following security advisories:
Proposal
Hopefully, applications need this kind of security property only when doing limited modifications of a document (for example, extracting a part to forward). If that’s the case, we can try to find a limited subset of the encoding/xml functionality for which we can effectively test that the outputs parse unambiguously.
(We tried fuzzing the full featureset, but the fuzzer found a lot of round-trip changes which are semantically correct, but hard to validate automatically, and changing those behaviors would probably be a breaking change.)
As a first suggestion, I propose adding an
IgnoreNamespaces
field toDecoder
, which when set disables all namespace processing. Names with colons are not split and are copied whole intoName.Local
.We have a prototype of this and it survived extensive fuzzing (with the only caveat that CDATA sections end up merged with surrounding text, as the encoding/xml data structures don’t preserve escaping information).
Open questions
Is this API actually usable by the applications that need it?
Is round-trip stability sufficient, or do all those applications actually need perfect alignment with the spec on how the output is parsed? In other words, if the output is generated by Go but parsed by libxml and they disagree on the list of tokens, is that an issue? If yes, this is a much harder property to provide. HTTP/1 parsers have been struggling with it for decades, as shown by the constant stream of request smuggling vulnerabilities across the ecosystem, and HTTP/1 is arguably a much simpler protocol.
The text was updated successfully, but these errors were encountered: