Better JSON binary handling #459
Binary isn't different from Timestamp. We have a canonical representation for binary (a Base64 string) and a canonical representation for Timestamp (an RFC3339 string), and an implementation or encoding is allowed to carry either differently and expose them as native runtime types. Interop is ensured by always being able to convert via the canonical representation. It doesn't matter what format the Binary is in during transfer as long as a consuming client who expects a binary will be able to convert it, and if the consumer sees a string, it can expect the encoding to be Base64. If a client has no opinion about the expected data type, it's probably also fine for it to treat it as a string if it arrives as a string. What you are proposing is to introduce a constructor concept for JSON as it already exists in AMQP or Protobuf wire encoding.
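As a rough sketch of the interop rule described above (the helper name here is illustrative, not from any CloudEvents SDK): a consumer that expects binary can always recover it from the canonical Base64 string form, regardless of how the transport carried it.

```python
import base64

def attribute_as_bytes(value):
    """Return an attribute as bytes, assuming the canonical rule above:
    if the transport delivered a string, it is Base64-encoded binary."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)           # transport carried native binary
    return base64.b64decode(value)    # canonical string encoding is Base64

token = b"\x00\x01 raw payload"
wire_value = base64.b64encode(token).decode("ascii")  # what JSON would carry
assert attribute_as_bytes(wire_value) == token  # string form converts back
assert attribute_as_bytes(token) == token       # native binary passes through
```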
But what about …
Obviously this relates to our previous discussions around #432. Based on the above, it seems like there's agreement that if both the producer and consumer know that an attribute is going to be base64ed as it is serialized, then there's no interop issue, right? It's really only an issue for unknown extensions. My assumption was that consumers could treat them as strings (or raw bytes). If the consumer has no idea what this attribute means, or how it should be used, the best it can do is just pass it along in the hope that someone else later in the pipeline will know what to do with it. At that point, it can be de-base64ed by that component. But I don't see the interop issue, since without knowing what the value means I don't think the consumer can do anything with it anyway, even if it did know it was base64ed. @jroper perhaps you can elaborate on what you think a consumer should do with an unknown extension that would require it to know whether it should de-base64 it?
Having said that... if we did actually see a real interop issue around not knowing each attribute's type, then my current preference is to use something like @JemDay suggested, and just encode its type in the name of the attribute as it is serialized, e.g. …
The consumer will typically have an expectation for the field even if it's … At the consumer side, the app will be coded to a particular contract for a context attribute. If the expectation is that the context attribute holds a binary token and the attribute's type is text, the app will try to do a Base64 conversion, and that's an informed choice because that's CE's canonical encoding. With transports/encodings that can carry binary as binary, there's obviously no issue. The model we're employing for all types is that the canonical format is a string encoding, and the consumer MUST have further knowledge about the expected concrete type to map into the native type they use with their runtime. If the consumer doesn't have that knowledge, it's quite safe to assume that they're not taking action on that value, and thus leaving the value as a string is just fine. For when the consumer has that knowledge, the rules we have ensure that the data is convertible if valid.
Isn't that a contradiction? If the consumer has an expectation of the type, why is it using …? Here's an example case where the consumer does not have an expectation. Kafka event keys can be Strings, or bytes, or integers, or anything. So if I had an event key extension for Kafka that defined the key to be …
I believe with #484 merged we can now close this - if I'm mistaken, please let me know and we can reopen it.
Currently, binary values get mapped to base64ed strings in JSON. The problem is, there's no solid way to know, when converting JSON back to a CloudEvent, whether a JSON string is a CloudEvents String type or a CloudEvents Binary type that has been base64ed. The result is an opportunity for incompatibilities between systems: since the behaviour is unspecified, different systems may handle it in different ways (e.g., some might always treat strings as strings, some might have heuristics that determine whether something looks like a base64 value, etc.).
Now, for known attributes that are Binary, that information can be used to handle them. But for unknown attributes (e.g., extension attributes that a particular implementation does not understand but may pass along as-is), there's no way to correctly handle them. The same goes for Any-typed attributes.
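A tiny illustration of the ambiguity (with made-up values): both attributes below arrive as plain JSON strings, and a heuristic decoder that base64-decodes anything that happens to parse cannot tell them apart.

```python
import base64

binary_attr = base64.b64encode(b"\x00\x01\x02").decode("ascii")  # Binary, base64ed
string_attr = "Cafe"  # an ordinary String that also happens to be valid Base64

# A "decode it if it parses" heuristic misfires on the plain string:
decoded = base64.b64decode(string_attr)  # succeeds, yielding unrelated bytes
assert decoded != string_attr.encode()   # the String value was silently mangled
```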
I propose therefore that we change the way binary values are encoded, such that a binary value is encoded as a JSON object with a single member with a special name, e.g.:
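(The inline example appears to have been lost from this page; judging by the member name referenced below, the proposed shape would presumably look something like this, with an illustrative Base64 payload:)

```json
{ "__ce-binary": "aGVsbG8=" }
```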
Of course, this would prevent encoding any Maps or JSON values that had a key of `__ce-binary`, but the risk of that ever occurring in the real world is next to zero. If the above were adopted, it would always be unambiguous whether a particular field was binary or a string. The same could be used for the data, and it would mean decoders would not have to rely on complex heuristics based on the content type to determine whether the value is meant to be binary or text. Producers could make whatever decision they want as to whether the data is binary, text, or JSON, and the consumer would always be able to correctly consume it without the risk of accidentally treating base64ed bytes as a string, or vice versa.
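A sketch of how a decoder could use the sentinel member to resolve values unambiguously (the key name follows the proposal above; the function itself is illustrative, not from any implementation):

```python
import base64

SENTINEL = "__ce-binary"  # the special member name from the proposal

def decode_value(json_value):
    """Map a JSON-decoded attribute value back to its CloudEvents type (sketch)."""
    if isinstance(json_value, dict) and set(json_value) == {SENTINEL}:
        return base64.b64decode(json_value[SENTINEL])  # unambiguously Binary
    return json_value  # plain strings are always String, never Binary

assert decode_value("aGVsbG8=") == "aGVsbG8="            # a String stays a string
assert decode_value({SENTINEL: "aGVsbG8="}) == b"hello"  # Binary is decoded
```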