-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MapItem is unsafe, and MapSlice does not have a well-defined order #10
Comments
indexCounter appears to be used to populate an index field for MapItem values, and that index field appears to be used to establish an order of items in a MapSlice value. That order is stable only within a specific instance of a program; a MapSlice produced by one PID has an order that is not necessarily equal to the order of the same MapSlice produced (or received) by a different PID. So MapSlice order is not deterministic, and does not provide "a given order for maps" beyond a process boundary. |
Thank you for opening an issue. We're taking a look. |
Sorry for taking a bit to get useful communication back to you. I've had some MapSlice exists because The ultimate goal is to use JSONv2, which solves field order, other issues, and has other best practices. As currently written MapSlice is not safe for concurrent use, meaning Go Coze itself is not safe for concurrent use. Another problem is the singleton pattern of We could make it safe fairly easily by exposing something useful for I will add writing a concurrent safe MapSlice to my TODO list, as I'd like to fix this even in the interim, but I don't have time to make a concurrency safe at the moment. For now, I think these are the two actionable items that should be done now:
|
JSON maps (as well as Go maps) don't have an order of keys which can be preserved. If you want order in JSON, you gotta use an array! 😅 |
Sorry, I stole the verbiage from the "map keys" issue. MapSlice exists because JSON field order is not preserved by Go, because internally it uses a map which does not preserve order. Order is not set by JSON objects, but order is relevant for the canon aspect of Coze. You got me thinking, perhaps MapSlice is overkill for even what is needed now. MapSlice is currently only used once, in Canon(), in all of Go Coze. At the moment, I can't think of a good way replacing MapSlice in Canon() without writing a custom unmarshaler, which at that point it may be better to use JSONv2 even if it means vendoring the existing available library. |
If Coze should provide "valid and idiomatic JSON" then by definition it cannot rely on any specific ordering of map keys in the JSON encoded forms of its values. JSON does not require any specific key order, or really any deterministic encoded form at all. If order is relevant to Coze, then it must transform (unordered) JSON encoded values to (ordered) spec-defined encoded values, and operate against those transformed values. It's fine if you want to define that second thing as JSON with additional constraints, including (but not only) lexicographical order of object keys. But that transform has to be explicit. edit: One example of this issue becoming a problem is the |
Base64 isn't defined by JSON. Using base64 doesn't make Coze not JSON, just A Coze message is valid JSON. Coze needs bit perfect messages for verification, |
Coze employing (relying upon) canonicalization of encoded data definitely makes Coze not-JSON. A Coze message is valid JSON, but JSON is not necessarily valid Coze. Bit perfect messages are not, and can not, be provided by JSON. If some code accepts encoded JSON then it necessarily can not assume any specific properties of that encoded form, while remaining JSON-compatible. It must treat |
A canonicalized coze is JSON. JSON is not necessarily valid Coze, but Coze is valid JSON. |
Agreed. So if Coze claims it accepts JSON, then it cannot assume any property of canonicalized Coze which is not also a property guaranteed by JSON. That includes JSON object key ordering. If the JSON encoded bytes |
The numbers 000640 and 640 are equivalent to decimal because decimal does not It doesn't matter how a JSON library marshal a message before employing Coze to |
On the topic of MapSlice, this library actually does look like a suitable replacement for MapSlice: https://github.com/iancoleman/orderedmap I don't think it would take me too much time to implement. I would vender that repo so Coze doesn't have external dependencies. |
What is the type of a message which Coze signs? Is it |
JSON messages get "compactified" (unneeded spaces are removed) and then serialized into []byte before signing. |
OK, so then Coze does not sign objects, it signs specific serialized bytes, abiding specific rules. Those rules include "is valid JSON" but also additional rules like "has compacted whitespace" which are not guaranteed by JSON. Where are those additional rules specified? |
The Coze object details how signing works, e.g. signing is over the bytes of The Canon section covers how Coze serializes JSON. When serializing For example, the message After signing a coze, if you hit "advanced" on the verifier the bottom will show |
There is no "given order" of fields in a JSON object. If you encode a JSON object as the bytes edit: But I suspect I've not managed to convince you this is the case 😉 so I'll disengage at this point as well. |
On branch orderedmap, we've migrated from mapSlice to orderedMap, which is concurrency safe. https://github.com/Cyphrme/OrderedMap I'll likely merge it into master on Monday and tag it. |
Lines 108 to 114 in a3fbc8a
o.values is a map type, and iteration over a map is non-deterministic, so the order of the values in the v slice is also non-deterministic. But it seems like you expect a specific order in the tests: Lines 93 to 99 in a3fbc8a
If values should be returned in key order, then you probably want something like func (o *orderedMap) Values() []any {
var values []any
for _, k := range o.keys {
if v, ok := o.values[k]; ok {
values = append(values, v)
}
}
return values
} edit: Note that this example implementation does not provide stable order of values recursively, for example if a value were itself a JSON object (a map of key to value) then the order of the keys and/or values in that value would not be deterministic. |
Thank you Peter for pointing that out! We fixed that in the latest commit and tagged it: 76f019b |
Also, we just published https://github.com/Cyphrme/OrderedMap as a stand alone package. We added a thank you to you in the README. |
@peterbourgon, I don't think I mentioned before Normal: https://github.com/Cyphrme/CozeGoX/blob/master/normal/normal.go Normals, such as Canon, are useful for decoratively describing ordering of fields and other normalization for objects. That package defines 5 normals, Canon, Only, Need, Option, and Extra. In the context of Coze, Coze is only aware of a single Normal, Canon. In Coze, a Related, I understand the concern about JSON not ordering fields, so there's other more way to view this:
|
Exactly! Equivalently: the MarshalJSON method of a Coze type should produce output bytes that are valid JSON, but the UnmarshalJSON method of a Coze type can't assume the input bytes it receives are valid Coze.
Following from above, this works for encoding, but not for decoding: a Coze type can't be decoded from a JSON payload. |
Theory aside, and only considering the practical, I can't imagine a circumstance where order information is not present. Can you envision a circumstance where order information is not present? If there was the practical need for order information, transporting "Coze is JSON, JSON is not necessarily Coze" is equivalent to saying, "JSON is UTF-8. UTF-8 isn't necessarily JSON." The signature of Coze.UnmarshalJSON is []byte, which is UTF-8, which is JSON, which is Coze. The signature of UnmarshalJSON is not type UTF-8 or json.RawMessage. []byte -> UTF-8 -> JSON -> Coze |
On the practical side, the Coze specification could say something along the lines of "since JSON specifies that objects are unordered, the order for canonicalizing I believe that would satisfy the concern. |
By order information, do you mean the order of the e.g. keys in an encoded JSON payload, or something else? If it's the former, then yes, anything which handles a JSON payload is totally free to parse and re-encode that payload, as long as the change is semantically equivalent according to the JSON spec. This could be done by an HTTP middleware in the program that produces the JSON payload, a middleware in the program that receives the JSON payload, or an HTTP middlebox anywhere between those two nodes. |
Perhaps that's the root of the differing viewpoints:
I think either viewpoint has the same end result. As long as the serialized Even if potentially excessive, I think adding something like the following to |
I don't understand how UTF-8 establishes "order information" in the sense that you mean. A JSON payload may be UTF-8, but that doesn't prevent the bytes |
UTF-8 orders bytes. Out of order bytes breaks UTF-8. Coze is order aware as well as UTF-8. Coze knows what the order is from |
UTF-8 orders bytes in individual runes (characters), not entire strings. But I think that's a red herring. Coze can't assert anything about order of map keys from a received JSON payload. Receiving |
Coze isn't asserting anything about a JSON payload, but it is asserting order in a Coze payload. |
If anything downstream of JSON was strictly not allowed to be order aware, and Coze requires a UTF-8 payload in order, that is the worse case scenario that meets Coze's requirements. Since JSON isn't cryptography aware, and the downstream Coze is, this doesn't seem like a reach. Moreover, on the practical side, we've not only not run into any sort of issue with this design, but we can't think of a circumstance where this would be an issue. If there is an issue, it's only regarding design theory, not the implementation. I guess I'd also like to see a practical Go test/example where coze.pay being order aware is unacceptable. I cannot envision a circumstance where this would be the case. If there is a case, I suspect it is something resolvable. Moreover, I'd like to hear solutions to any design deficiency. Of all the available options, (which are diverse: shuffling permutations, serialization, passing along |
If you put a Coze payload into an HTTP request body, would it be Content-Type: application/json, or Content-Type: application/coze? |
That's a great thought. Most specifically, it would be Less specifically, Our API endpoints at the moment are sending application/json, but the ingesting applications know that the JSON is Coze. |
Not only the ingesting application itself, but any intermediary with access to the response. For example, if cyphr.me used a TLS-terminating CDN, then that CDN is free to transform the response body served by your cyphr.me origin before forwarding it to the requesting client, as long as the transform satisfied the JSON spec. Compression is one common way this can happen: the response body can be parsed and re-encoded in a more compact form, or gzipped, or etc. As another example, if the client application happened to use any kind of general-purpose decorator on its HTTP client, those decorators could transform the response body arbitrarily, as long as the transform satisfied the JSON spec. Logging is one common way this can happen: the response body could be parsed, individual keys are logged, and then the parsed form re-encoded and forwarded on. There are many other possibilities: WAF proxies, etc. |
The Coze signing/verification step is not JSON aware, it's just UTF-8. Non-JSON, Touching on any potential implementation concerns, It's just a few lines of To summarize previous posts and to come back to address the origin of order
If order was a practical problem, I'd expect to see issues caused by |
Coze may not be JSON aware, but since you shuttle the Coze payload thru HTTP with a Content-Type of application/json, it is by definition a JSON payload as it traverses the network. Any HTTP middle-box, including any HTTP middleware in any intermediating application, is free to parse and re-encode that JSON payload in any way which preserves JSON semantics. That means that sending This is not just a theoretical possibility, it is a common and practical reality. Tons of systems -- CDNs, in-process decorators, etc. -- do stuff like this.
AFAICT, those payloads are represented as fields in a JSON object, and not as the JSON object itself, which means the issues we're discussing here aren't applicable. If you wanted to lean on the same guarantees that JOSE/JWS/JWT/etc. rely upon, then you would need to encode the Coze UTF-8 bytes as a base64-encoded string field in a JSON object, e.g. {"coze": "<base64>"} which would be faithfully preserved over the network. |
If middleware risks re-encoding, that stack should use
I'm not sure if I understand how Coze and JOSE are different here. This is the aspect where they are the most alike. JWS has JSON payloads. https://datatracker.ietf.org/doc/html/rfc7797#section-4.2 |
Any HTTP request body that is content-type:application/json "risks re-encoding" by definition, from every intermediating node: the sending application's JSON library, any sending application middleware(s), any middle-box, any CDN proxy, any other proxy, any middleware(s) in the receiving application, the receiving application's JSON libary, etc. etc. If the specific order of bytes is significant, then the content-type cannot be application/json, it must be application/coze.
Read that RFC carefully. The relevant data is not the entire JSON object as transmit over the network, it is a specific field within the JSON object, which has a well-defined and deterministic representation. |
I don't think middleware manipulation of a coze is a concern that's significant If middleware manipulation is a concern, there are plenty of employable tools
Beyond what I've previously highlighted, although not strictly required, Coze
It is exactly the same for Coze.
I'm all ears if you can point me to this deterministic representation, but this Beyond the RFC's, this is also how the industry is using it as well. The I am unaware of a JOSE equivalent to Coze's canon functionality, although since
That is not the case for HTTPS, as an application must explicitly permit |
I'm not sure this is true. One example among many is Key.SignPay, which signs the bytes produced from calling Marshal on the Pay struct. But Marshal just JSON encodes the provided value with the standard library's encoding/json package, which isn't stable. But maybe I'm misunderstanding this whole thing, I dunno. |
It doesn't need to be. Only for signing and verification is the exact UTF-8 form is required. Coze can sign a JSON struct that's in any order, but once it's signed, the exact UTF-8 form is required. However, although not relevant here, Pay is stable as Go marshals in order of the struct if I'm not mistaken. But when verifying, order is not derived from the struct, rather the UTF-8 form. |
You keep using this term, but I'm still not sure what you mean by it. Can you explain? UTF-8 guarantees valid byte order within a single rune, but it doesn't guarantee anything about the order of runes within a string... |
UTF-8 is just a series of bytes.
How could it not? Even more primitive than UTF-8, cryptographic signatures sign byte strings. Strings are in order. If strings were not in order, the letters of the sentence would be out of order. |
UTF-8 defines a variable-length encoding for individual characters (runes). A valid UTF-8 string is guaranteed to be sequence of one or more valid UTF-8 runes, but UTF-8 doesn't guarantee anything about the order of those runes in the string. Example in Go.
Opaque byte sequences, sure, but JSON byte sequences (or payloads) aren't opaque, and aren't guaranteed to be preserved without modification between sender and receiver. Code that receives bytes designated as JSON can modify those bytes in any way that doesn't violate the JSON spec. For example the prettify middleware here is perfectly valid. |
Strings are an ordered series of bytes. Firstly, I appreciate your discourse using code. Thank you! That example highlights that
Shuffling order doesn't mean that order doesn't exist. On the contrary, the ability to shuffle is predicated on order itself. The point being that Coze is inheriting the order from the byte string which is ordered. Coze's order design decision: "just use what you're given"This was the design choice Coze faced: 1. Use strings to convey order 2. use a deterministic order (UTF-8 ordered). Coze was originally UTF-8 ordered, but we found it to be anti-erognomic since keys would be ordered rigidly. In JSON, some fields are much more important than other fields, and it's nice to have those first. Deterministic ordering is a legitimate design decision, but it's one we found to be not as minimal as simply saying "you're already sending a string, just use the given order". The more minimal design choice is typically better. Further, we wanted the ability to order keys as desired. For example, if Another legitimate option was to 3. order by Coze fields first and then UTF-8 order. Overall, we found "any order" was more aligned with JSON and more ergonomic. The coffin nail is (On a side note, are you peterGo? That account chimed in on a UTF-8/ASCII issue from a while back see issue 52062 in the Go project.) Again, thank you for demonstrating using code. I understand in that example why I think the following is better for the sake of demonstration since the code is executed: https://go.dev/play/p/oguhQ5xU8NW And it works. ;-) Yes, even when re-encoding using Here's the code: package main
import (
"bytes"
"encoding/json"
"fmt"
"log"
"github.com/cyphrme/coze"
)
func main() {
goldenCoze := `{"pay":{"msg":"Coze Rocks","alg":"ES256","iat":1623132000,"tmb":"cLj8vsYtMBwYkzoFVZHBZo6SNL8wSdCIjCKAwXNuhOk","typ":"cyphr.me/msg"},"sig":"Jl8Kt4nznAf0LGgO5yn_9HkGdY3ulvjg-NyRGzlmJzhncbTkFFn9jrwIwGoRAQYhjc88wmwFNH5u_rO56USo_w"}`
var obj map[string]json.RawMessage
if err := json.NewDecoder(bytes.NewReader([]byte(goldenCoze))).Decode(&obj); err != nil {
panic(fmt.Sprintf("decode JSON: %v", err))
}
pretty, err := json.MarshalIndent(obj, "", " ")
if err != nil {
panic(fmt.Sprintf("prettify JSON: %v", err))
}
log.Printf("pretty:\n %s", string(pretty))
var GoldenKey = coze.Key{
Alg: coze.SEAlg(coze.ES256),
Kid: "Zami's Majuscule Key.",
Iat: 1623132000,
X: coze.MustDecode("2nTOaFVm2QLxmUO_SjgyscVHBtvHEfo2rq65MvgNRjORojq39Haq9rXNxvXxwba_Xj0F5vZibJR3isBdOWbo5g"),
D: coze.MustDecode("bNstg4_H3m3SlROufwRSEgibLrBuRq9114OvdapcpVA"),
Tmb: coze.MustDecode("cLj8vsYtMBwYkzoFVZHBZo6SNL8wSdCIjCKAwXNuhOk"),
}
cz := new(coze.Coze)
err = json.Unmarshal([]byte(goldenCoze), cz)
if err != nil {
panic(err)
}
fmt.Println(GoldenKey.VerifyCoze(cz))
} |
But this fails, even though it represents 100% equivalent JSON.
UTF-8 does not define, enforce, or guarantee order beyond byte order of individual runes (characters). It explicitly does not provide any kind of order of characters (runes) in a string, or of keys in a JSON object.
Again, the order of keys as encoded by the sender — in hand-written JSON or otherwise — is not guaranteed to be preserved in the payload received by the receiver. edit: The solution here is straightforward: anywhere you pass a |
Again, order isn't relevant past verification or before signing. Once verified, which is over UTF-8, not JSON, the items may be ordered however pleased. Signing is not over JSON either; So yes, this is correct:
For your example, Coze UTF-8 strings should not be interpreted as JSON until after verification. Afterwards they may be ordered however pleased. (And yes, that's what we do on our servers. We verify coze messages first, before unmarshalling. After verification, payloads may be manipulated however seen fit.)
This is an uglier solution as it requires JSON escaping, significantly hurting readability and ballooning the message size: {
"pay":"{\"msg\":\"Coze Rocks\",\"alg\":\"ES256\",\"iat\":1623132000,\"tmb\":\"cLj8vsYtMBwYkzoFVZHBZo6SNL8wSdCIjCKAwXNuhOk\",\"typ\":\"cyphr.me\/msg\"}",
"sig":"Jl8Kt4nznAf0LGgO5yn_9HkGdY3ulvjg-NyRGzlmJzhncbTkFFn9jrwIwGoRAQYhjc88wmwFNH5u_rO56USo_w"
} |
OK, no problem, but in this case you can't use MarshalJSON/UnmarshalJSON, as those methods produce and consume JSON. You would need to define and use e.g. MarshalCoze/UnmarshalCoze. |
There is no problem using MarshalJSON/UnmarshalJSON as the example demonstrates: https://go.dev/play/p/oguhQ5xU8NW. Further, the example is idiomatic; it's not a hack. |
https://go.dev/play/p/yBrTPLzHQWN If you designate a payload as JSON, then step 2 is perfectly valid, and can occur anywhere between sender and receiver. edit: I'm not trying to be argumentative, or anything like that. I'm only trying to point out an invalid assumption in the project. At this point I think I've done about as much as I can do to point out the issue, so I'll bow out, do with my comments what you will. |
Coze signs strings. That string has an invalid signature and that example is working as specified. It's doing exactly what it should do. What invalid assumption? That Coze signs and verifies strings? |
No, that JSON serialization (as defined by the spec) produces "strings" (byte sequences) that can be signed/verified in the first place. edit: You seem to be assuming that the specific bytes that MarshalJSON produces will not be modified between sender and receiver. That's simply not true, as a statement of fact, and regardless if those bytes are encoded as UTF-8 or otherwise. |
indexCounter is accessed without synchronization, which produces data races that violate the memory model. As a proof of concept, build and run the following program with the
-race
flag.You'll see output like the following.
The text was updated successfully, but these errors were encountered: