-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handling of datacontenttype is inconsistent #558
Comments
Trying to remember the history.... See https://github.com/cloudevents/spec/blob/master/json-format.md#31-handling-of-data In your example: I agree that when comparing the xml and json examples in the spec there appears to be a bit of an inconsistency, but (if I'm remembering correctly) we treat JSON payloads differently because they're JSON. Clearly, we can't put raw XML into JSON w/o some kind of encoding, but JSON doesn't have that problem, so it makes more sense to put the JSON into |
Exactly, that's my point -
The event format could be XML, right? So if |
@duglin I think, the important text in the section you referenced is
So, if data is already a JSON value, no translation is needed. |
Both cases, A):
and, B):
Are equivalent to the spec. The message needs to be inspected to understand how to parse. How it becomes unmarshaled is up to the consumer of the event; they need to know if it is a string, array or a struct. In the case of B, this shortens the string escaping for the consumer.
Arrays are not allowed. my bad. |
is that true? How would a JSON Object that's just a simple string but looks like JSON be serialized then if not like this: |
@n3wscott In the JSON Schema data is defined as being one of two types, object or string: "data": {
"type": ["object", "string"]
} but array type should not be allowed afaik. You can however of course: "data" : {
"foobar": [ ... ]
} |
The spec says you can promote that inner json object or use it as a string. The consumer needs to inspect the string escaping to understand if it should be a string or object based on what it is trying to un-marshal into. |
If I see: Where in the spec does it say you can alternate between a string or json? I didn't see it in: https://github.com/cloudevents/spec/blob/master/json-format.md#31-handling-of-data |
@duglin It cannot alternate. It's either-or: https://github.com/cloudevents/spec/blob/master/spec.json#L12 |
IMO having both data and data_base64 is confusing. especially when you can define datacontenttype. |
soooooo.... funny story... the spec changed at the last minute between 0.3 and 1.0 and it now is a must. A):
and, B):
Are both valid JSON but are not the same. A has been string escaped and is a JSON string. B is a Json object. |
As my original post above suggests, I also have some difficulties following the JSON Format spec and would appreciate guidance (I'm currently implementing an Elixir lib for handling CloudEvents). The core spec says that |
@kevinbader +1, I also think that if you allow "data" to be of type string, then datacontenttype+data is enough. There is no need for data_binary imo. wdyt? |
There was some discussion around this in #520. In 0.3 we still had an attribute called datacontentencoding, but we had to remove it. Instead the data_base64 field was introduced in formats where it was needed. There is no simple way to determine from a content-type, if the data is binary. Therefore an SDK will check for some well-known text-based formats and do base64 encoding otherwise. For an example of such a check, please have a look at what @alanconway wrote here. By using either |
@deissnerk thank you for the explanation! Got one more question: CE also defines the "dataschema" property as "Identifies the schema that |
@tsurdilo The Btw, I think the |
@deissnerk added contentEncoding as part of the overall json schema update - pr #563 |
What's the status of this? I think we can close it but @kevinbader do you agree? |
@duglin well, I still think that this part of the spec could be improved a lot - I'm still not sure how to implement this in the Elixir library for CloudEvents I was going to build. I mean, depending on whether I implement the optional JSON format spec, the parser behaviour for the data field is different in a non-compatible way. But I've laid out the details above already. I'm aware that fixing this would mean a breaking change to the spec. But given that 1.0 is less than five months old and therefore it likely hasn't seen that much adoption yet, perhaps it would make sense to consider a v2 spec. If this seems reasonable to you I gladly offer you to create an RFC with some ideas on how this could be handled in a more consistent manner. |
@kevinbader are you concerned about the sender or receiver of CEs? I think it’s the receiver, if so, where is the ambiguity? If it’s JSON and binary then it uses the data_base64 attribute, otherwise it uses the data attribute and the value is pure JSON not a stringified version on JSON. |
We are currently looking into using CloudEvents 1.0 in our company. I agree with @duglin that the specification is clear. Taking the example from @kevinbader:
This example above is IMHO invalid, since the value of the We human beings can guess that the string happens to be an encoded JSON. If you look at a longer example, you can't be so sure anymore:
Following the specification of CloudEvents 1.0, we interpret |
@Thoemmeli I agree to your point, but |
@kevinbader thoughts? |
@duglin well, I've implemented this according to the spec and the replies here: https://github.com/kevinbader/cloudevents-ex/blob/master/test/format/v_1_0/decoder/json_test.exs Not particularly happy with it but I guess that's how it is. |
@duglin @deissnerk This is coming up in my work on the Ruby SDK, and I want to bring up a clarification question. To summarize a conclusion from above:In the following CE:
... it sounds like the data should be considered a JSON value of type string. The fact that the string's value happens to look like serialized JSON is irrelevant. It is simply a string. Therefore, if we were to serialize this CE in HTTP Binary mode, it might look like this:
The data must be "escaped" in this way, so that a receiver parsing this content with the As a corollary, when deserializing an HTTP Binary mode CE with Taking that as given, consider this implication:Earlier a comparison was made with
If we were to treat this XML data consistently with how we treated the earlier JSON data, we would consider this data as a
However, my understanding of the spec, and my understanding of the current behavior of the SDKs, suggests we are not doing that. (And indeed I'm glad, because that would, in turn, imply that all protocol handlers would also need to understand XML.) Instead, we actually consider the above data as semantically an XML document and not a string. Hence, serializing this as HTML-Binary actually looks like:
In other words, our handling of the XML content-type appears to be inconsistent with our handling of the JSON content-type. So my clarification question is:
If so, follow-up questions:
|
I don't see any special handling of the JSON format here, @dazuma.
This statement gives you an object Like this?
The go SDK would store the serialized JSON value of As, in Javascript, you don't have byte sequences, you would perhaps instead store the data as string and base64 encode it depending on the
or maybe a bit more readable:
If your data attribute is decoded, this means that, when serializing it again, it would have to be encoded again according to |
@deissnerk I think we are in agreement here on the interpretation of this example that I labeled
So far I think we agree. My question is actually about the
I constructed For Is it: Or is it: |
@dazuma IMHO it should be: @athalhammer For the pending IANA registration we already have #557 |
I also agree. And that was my point. Parallel cases (in both cases the data is a string that looks like a serialized document in that media type), but different semantics (for JSON, the data is interpreted, and thus encoded, as a JSON string value, but for XML, the data is interpreted, and thus encoded, as a document). JSON's semantics seem to be different from any other media type. Hence my clarification questions. |
@dazuma This special handling only needs to happen in the unmarhalling implementation of the JSON format. Looking at the pending PR for XML, it seems to be similar there, as it distinguishes XML, text and binary. |
Not sure if that option has already been discussed but I think a clean solution to this could be to interpret the datacontenttype also as a flag:
There would be one edge case because a plain string actually is a JSON value. In particular, the following alone would form valid JSON document: The good news would be: this gives clear guidance without treating JSON differently. The bad news would be: the following sentences would need to be adapted in the spec:
Probably also a couple of things would break as the following would not be allowed any longer:
To mitigate this, one could specify: "if the value of the data field is not a string, the datacontenttype field is ignored" (would then not break that many things). This would leave us with the following edge case:
This would actually break - on the other hand, this is exactly also the problem that should be addressed by this thread... |
@duglin I suppose we need to discuss this in our weekly call. To mitigate the problem for |
This was thorny in the C# SDK as well. It feels like the separation of concerns isn't quite right - but I think at this point, it's too hard to fix, and I can see how the special handling of JSON is also pragmatic in terms of leading to easy-to-read events. The way we've "solved" it for C# is:
It's not ideal by any means, but it looks like it's working well enough at the moment. See the docs for implementing a protocol binding and implementing a formatter for more detail. I'm definitely not suggesting that every SDK should implement it the same way as we've chosen to, although I'm happy to work with anyone who wants to know more if they'd like to do something similar in another SDK. |
@jskeet Interestingly, it sounds like we ended up solving this very similarly for Ruby. We have a |
@dazuma: We're planning on doing something with content types in the future, but didn't want to block going GA on it :) |
@jskeet I don't think that JSON is treated special. The pending PR for XML, as well as the protobuf format have corresponding concepts. The only difference I see is, that protobuf and XML clearly distinguish data as text, binary or protobuf/XML. As I stated above, there should have been a |
@deissnerk I agree that the introduction of three fields that clearly separate json from text and base64 solves the problem more explicitly, however, it may come with additional pitfalls. Is there only one
|
@athalhammer I would recommend to handle it the same way it is done for protobuf. Only one of the |
I like your suggestion - this would also solve the problem of the Example (with JSON-LD from schema.org): ...
datacontenttype = "application/ld+json",
data_json = {
"@context": "https://schema.org",
"@type": "Person",
"address": {
"@type": "PostalAddress",
"addressLocality": "Seattle",
"addressRegion": "WA",
"postalCode": "98052",
"streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
},
"colleague": [
"http://www.xyz.edu/students/alicejones.html",
"http://www.xyz.edu/students/bobsmith.html"
],
"email": "mailto:jane-doe@xyz.edu",
"image": "janedoe.jpg",
"jobTitle": "Professor",
"name": "Jane Doe",
"telephone": "(425) 123-4567",
"url": "http://www.janedoe.com"
},
... |
More detailed explanation of how the C# SDK formatters work. (We have two: one for Json.NET and one for System.Text.Json.) Here's the detail for the Json.NET formatter - the System.Text.Json one is equivalent, just using different types. Structured mode: /// In a structured mode message, any data is either binary data within the "data_base64" property value,
/// or is a JSON token as the "data" property value. Binary data is represented as a byte array.
/// A JSON token is decoded as a string if is just a string value and the data content type is specified
/// and has a media type beginning with "text/". A JSON token representing the null value always
/// leads to a null data result. In any other situation, the JSON token is preserved as a <see cref="JToken"/>
/// that can be used for further deserialization (e.g. to a specific CLR type). This behavior can be modified
/// by overriding <see cref="DecodeStructuredModeDataBase64Property(JToken, CloudEvent)"/> and
/// <see cref="DecodeStructuredModeDataProperty(JToken, CloudEvent)"/>. Binary mode: /// In a binary mode message, the data is parsed based on the content type of the message. When the content
/// type is absent or has a media type of "application/json", the data is parsed as JSON, with the result as
/// a <see cref="JToken"/> (or null if the data is empty). When the content type has a media type beginning
/// with "text/", the data is parsed as a string. In all other cases, the data is left as a byte array.
/// This behavior can be specialized by overriding <see cref="DecodeBinaryModeEventData(ReadOnlyMemory{byte}, CloudEvent)"/>. |
For the golang sdk, an integrator can register or override the handler for a given media type: https://github.com/cloudevents/sdk-go/blob/main/v2/binding/format/format.go#L67 so any other kind of media type can be supported if the integrator added the special conversion logic beoynd the well understood text, json, and xml types.
I am assuming this is only in the context of structured mode? This is the rules today with the extra case of |
Ruby SDK infoEvent class support for content-type encodingRuby's event class (i.e. in-memory representation) includes three fields:
These fields may be set in different ways depending on whether the SDK knows how to encode/decode the content for a particular event. For example, the SDK does not currently know how to parse XML, so when receiving an event with JSON structured modeWhen decoding an event in JSON structured mode, if the Similarly, when encoding an event to JSON structured mode, if the Hence the JSON structured mode implementation treats the content-type patterns Importantly, Ruby does not base any of this logic (i.e. whether a value needs to be encoded) on the runtime type of the event field value. For example, it does not assume that a dictionary implies a JSON object or that a string implies an encoded value. Rather, the encoded vs decoded status is always expressed in the
Binary modeBinary mode treats all content as encoded, and will populate the event's |
As I was asked in the regular call to provide some example to illustrate the issue. Let me try this: Assume there is a simple event forwarder that received events in structured format and forwards them in binary forward. What would happen for this event?
How would the binary look like? A
or B
If the event forwarder recognizes the ConclusionAll SDKs should have the same idea of which content types are to be interpreted as JSON. |
I have been looking into this for cloudevents/sdk-javascript. At the moment, the SDK would produce A as the result of this transformation. In the example below, I've left out the transformation from a structured event, and am just creating the event from whole cloth, since this is how it would look after deserialization anyway.
In @deissnerk's example, the event representation in A isn't actually a code representation. It's the representation on the wire. In my illustration above, the When the user ultimately sends the event with something like
That string is just a string, and nothing is wrapping it in quotes. So over the wire, there are no quotes. Which got me wondering. What if a binary event arrives and it looks like A. Is it invalid? Should the SDK wrap it in quote marks? It's not very clear. |
Regarding these two comments about
I realize this is about solving the issue in spec version 1.0 and not be a breaking change, but going beyond that, is there any discussion for the next version anywhere that would allow for breaking changes like this? I'd prefer to see a I also question the attribute naming formats for consistency. The other attributes are all lowercase, not camel nor snake, so why is This issue is still open, so I thought I would add my suggestion. I'm a bit confused by the merges as to whether this is considered fixed for spec 1.0 or not now, but I'm suggesting how I think it could be simplified for a future version anyways. ExamplesJSON as JSONIf "dataencoding": "json",
"datacontenttype": "application/json",
"data": {
value: 1
} To read this would be: JSON as text"dataencoding": "text",
"datacontenttype": "application/json",
"data": "{ \"value\": 1 }" To read this would be: XML as text"dataencoding": "text",
"datacontenttype": "application/xml",
"data": "<much wow=\"xml\"/>" To read this would be: JSON as bytes"dataencoding": "base64",
"datacontenttype": "application/json",
"data": "ew0KICAgIHZhbHVlOiAxDQp9" To read this would be: XML as bytes"dataencoding": "base64",
"datacontenttype": "application/xml",
"data": "PG11Y2ggd293PSJ4bWwiLz4=" To read this would be: Binary as bytes"dataencoding": "base64",
"datacontenttype": "image/png",
"data": "c29tZWltYWdlZGF0YQ==" To read this would be: Thank you. |
To my mind, it's as "fixed for spec 1.0" as it can reasonably be (modulo clarifying tweaks etc). Yes, there's a lot that could change for 2.0, although the larger the change in 2.0 (not just here, but everywhere), the harder it will be for SDKs etc to support both 1.0 and 2.0. I haven't heard any detailed discussions of expectations around timelines for a 2.0 - I think more of the activity is around getting Discovery etc across the line first. |
Just like @krispenner in his #558 comment I've been reading past issues to understand why 'data_base64' is chosen in favor of a 'dataencoded' atrribute. I fully agree with his proposals that the chosen solution seems confusing and inconsistent. I am curious if there is a chance his proposals will be adopted in a future release. The argument from @jskeet that big changes are not desirable is valid but maybe support both options ('data_base64' and 'dataencoded') for a while and in the longer term phase out 'data_base64'? (or are there good reasons that I missed why the 'data_base64' option is better?) |
To my mind, one problem is that "dataencoding" is a perfectly valid context attribute name, but its use here isn't really part of the CloudEvent itself. I would be happier with "data_encoding", to indicate that it's metadata about the "data" property rather than a separate context attribute. In terms of supporting both: I'd prefer not to do that, personally. We can't make this change until 2.0 (it would be a breaking change) and I'd really like to aim for 2.0 to be very, very long-lived. Instead, I think it makes sense for a CloudEvent 1.0 to use the existing format, and a CloudEvent 2.0 to use "whatever we decide is best" - individual SDKs can decide which versions of CloudEvents they support. They may decide to support both 1.0 and 2.0 forever, or drop 1.0 support after 2.0 is widely adopted. Making that an SDK choice rather than having both options in the spec itself feels like a more flexible approach. |
Adding the v2.0 label to this issue as we may want to consider tweaking the attributes at that time. |
Is the "datacontent/data_content/data_xxx" discussion the only open topic for this issue? |
CloudEvents 1.0
Consider this example, straight from the spec:
Clearly,
data
is some structure that has been encoded using the XML format and put into the event as a string (binary). Naturally, I'd assume the same behavior for JSON encoding:However, that's doesn't seem to be the case; as the example in the HTTP protocol binding spec shows, the JSON object is not sent in its encoded form but rather nested into the event directly:
Note that removing the optional
datacontenttype
attribute doesn't change this, as the spec clearly states:To sum it up, it is not possible to put a JSON-encoded data blob into a CloudEvent; and a parser needs to treat
application/json
different than any otherdatacontenttype
.HTTP Protocol Binding 1.0
For structured content mode, the spec says:
Does this mean that
datacontenttype
must be present and set to the event format? Or does structured mode implicitly change the default ofdatacontenttype
fromapplication/json
to whatever event format is in use? What ifdatacontenttype
is present and set to a different encoding - must a parser treat this event as malformed?JSON Event Format 1.0
As a side note, the JSON Format spec makes this even more confusing:
This basically says that you have to Base64-encode any simple JSON string (which is, of course, binary). Also, if a receiver does not implement the optional (!) JSON Format spec, it won't be able to parse the
data_base64
value; consequently, implementing the JSON Format spec as a sender means not implementing the full CloudEvents spec.The text was updated successfully, but these errors were encountered: