-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(json-format): Clarify data encoding in JSON format with a JSON datacontenttype #861
Conversation
…tacontenttype Signed-off-by: Daniel Azuma <dazuma@google.com>
Signed-off-by: Daniel Azuma <dazuma@google.com>
One open question might be: should we include the "implied" |
Signed-off-by: Daniel Azuma <dazuma@google.com>
Signed-off-by: Daniel Azuma <dazuma@google.com>
json-format.md
Outdated
event if it is unable to translate the runtime value to a JSON value. | ||
|
||
Otherwise, if the `datacontenttype` is NOT JSON-oriented, a JSON serializer | ||
MUST use the member name `data` to store a string representation of the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it have to be a string representation? If the event data is just an integer, I'd expect that to be represented as a JSON number, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree: I think it has to be a string. Either the content type indicates JSON, in which case it's encoded as a JSON value, or the content type indicates something else, in which case by virtue of there being a content type in the first place, it implies there's a conversion between a string (or byte sequence, but we've already handled that case with data_base64
) and a runtime type.
For example, if the event data is indeed an integer, and for some reason the content type does is not JSON, but some "integer" content type (I don't know what that would be, but let's say "application/integer" for argument's sake), then that content type still implies a particular way of representing that integer in content, i.e. as a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, but we should get other people's takes on it. Aside from anything else, I'd be happy with the argument that only a string is actually useful.
One possible other case to consider, mind you: formats like YAML or protobuf, which have natural JSON object representations. I think it's at least reasonable to say "when using a JSON event format and a protobuf message, the natural data representation is the JSON for the message". I'd like that to be discussed, but for that case I think it would actually be better to put the binary representation in base64. I look forward to hearing the results of the discussion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, if we allow non-string representations in this case, it specifically re-introduces the ambiguity that this PR is trying to solve. That is, the whole point of this PR is to distinguish when the data
member must be interpreted as string content vs when the data
member must be interpreted as a JSON value, because the overlap (when the JSON value happens to be a string) introduces ambiguity.
Let's take your example of YAML, as a format with a "natural JSON representation". Suppose we have the following event:
{
"specversion" : "1.0",
"type" : "com.example.someevent",
"source" : "/mycontext",
"id" : "B234-1234-1234",
"time" : "2018-04-05T17:31:00Z",
"datacontenttype" : "application/yaml",
"data" : "---\nhello\n"
}
Does that correspond to:
---
hello
or
---
|
---
hello
If we allow arbitrary JSON for YAML data, then we kind of have to accept the latter. That is, we're effectively adding application/yaml
(and maybe */yaml
and */*+yaml
) to the list of specially-treated content types (along with */json
and */*+json
). Which we can conceivably do, but we must do it explicitly to avoid the ambiguity. I think it's a slippery slope to start adding non-JSON formats, because it's unclear where to stop and where to draw the line of which formats "have a natural JSON representation".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should force them to base64 encode it and use data_base64 ? Then the format doesn't need to define a way to turn it into a string (and back) - we just write-out the bytes as-is (in base64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do that for text-based data, or we lose the implicit text encoding. i.e. if the datacontenttype
is application/yaml
and we force them to binary-encode the content string, then how do we interpret the charset encoding of that data when we deserialize it? It's implicit if it's a string, because the JSON envelope itself has a particular charset, but if we make it opaque binary data, we risk losing that information.
And anyway, it seems heavy-handed to force base64 encoding of ALL non-JSON data. That would certainly break a lot of current usage in the wild.
My argument here remains. The spec states that data is "encoded into a media format which is specified by the datacontenttype
attribute" which implies that the encoded type is a string/bytearray. JSON format is making an exception to this rule if the content type is JSON, but the simplest and most natural policy would be to fall back to the general spec in all other cases.
json-format.md
Outdated
store it inside the JSON representation. The `datacontenttype` SHOULD reflect | ||
the format of the original binary data. | ||
|
||
If the type of data is NOT `Binary`, the implementation MUST next determine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While in practice I think the MUST is probably accurate, we can't actually test how they make this determination. All we can check is that it's correct on the wire, and I think the follow text covers that. So I think we could make this a "will" instead. Also, let's s/NOT/not/ just so people don't think it's normative in some way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that ("MUST" vs "will") also apply to line 138?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as indicated (s/NOT/not
and s/MUST/will
). The question about line 138 remains.
Signed-off-by: Daniel Azuma <dazuma@google.com>
Signed-off-by: Daniel Azuma <dazuma@google.com>
0267f18
to
c7f366c
Compare
lgtm and thanks for the hard work on this! @jskeet thoughts? |
I'm hoping to carve out some time before the meeting to have another look through. Thanks in advance for all the work that I can see has been going into it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit and one area where I want to double-check we're on the same page. (I think we are.)
JSON deserializer SHOULD proceed as if it were set to `application/json`, which | ||
declares the data to contain JSON-formatted content. Thus, it SHOULD treat the | ||
`data` member directly as a [JSON value][json-value] as specified above. | ||
Furthermore, if a JSON-formatted event with no `datacontenttype` attribute, is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that for at least the C# implementation, the result of deserializing a JSON-formatted event with no datacontenttype
to a CloudEvent
object will result in an object with the DataContentType
property set to "application/json"
exactly as if it had been set explicitly. There's no easy way of saying "Well, it wasn't really set, but it was implicitly application/json, so use that if it's reserialized." My guess is that other SDKs may be in the same position. Note that this will happen even if it's reserialized with a JSON event formatter - the act of "deserialize/reserialize" won't be a no-op here. (There are other cases where it might not be a no-op too, so I don't think this is fatal.)
I'm okay with that - I just want to check that we're all on the same page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's how the Ruby SDK would behave as well. It's possible an SDK could keep the "original datacontenttype" and "effective datacontenttype" independently in the event object, but I don't think that's strictly necessary. It is important that the event semantics (i.e. the effective datacontenttype) do not change when the event is reserialized, even to a different format. It is not important that deserialize-reserialize to the same format is always a noop (although an SDK is free to try to do so anyway.) Do you think it's worth trying to state this for clarity? It seems like if we wanted to say something like that, it belongs in the main spec rather than in a format-specific document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be quite useful to be specified somewhere, but I don't mind where. I was mostly checking that I wouldn't end up with a completely different implementation to everyone else.
Signed-off-by: Daniel Azuma <dazuma@google.com>
LGTM - Thanks for pushing this through... |
Approved on 9/9 call |
Fixes #558
Fixes #860
This is an attempt to clarify the interpretation of the
data
member in JSON Format in the case of a JSON-orienteddatacontenttype
vs a non-JSON-orienteddatacontenttype
. It also explicitly documents in json-format.md the implieddatacontenttype
value should the attribute be omitted.