Encoding & Transport Independent Interoperability Guarantee (JSON-based Size Measurement) #364
Conversation
Force-pushed from 196ef06 to 37ea87e
spec.md (Outdated):

> * The CloudEvent serialized as JSON (minified, i.e. without white-space) MUST
>   NOT exceed a size of 128KB.
> * The CloudEvent MUST NOT have more than 100 top-level attributes.
This is mainly to account for HTTP header count limits, right? If so, there are multiple problems:
- We should leave some room for the normal headers (10-20?) that will be present in addition to CloudEvents and extension attributes - things like `cookie`, `accept-encoding`, etc.
- As detailed in HTTP binary mode header mapping and header count limits #367, with the current HTTP transport binding this limit is leaky, as all "application-property" (or whatever) attributes within Maps would also be flattened to their own headers (see the sketch below).

Also, if limits are specified on the basis of HTTP headers, it would make just as much sense to have a separate <8KB size limit for metadata attributes instead of a total size limit.
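To make the leak concrete, here is a rough sketch of how a binary-mode sender might flatten a map-valued attribute into individual HTTP headers. The `CE-` prefix and dotted-key scheme are illustrative assumptions, not the binding's normative mapping:

```python
# Illustrative only: the "CE-" prefix and key scheme are assumptions,
# not the normative HTTP binary-mode mapping.
def flatten_to_headers(attributes: dict) -> dict:
    headers = {}
    for name, value in attributes.items():
        if isinstance(value, dict):
            # Each map entry becomes its own header, so one top-level
            # attribute can consume arbitrarily many header slots.
            for key, entry in value.items():
                headers[f"CE-{name}-{key}"] = str(entry)
        else:
            headers[f"CE-{name}"] = str(value)
    return headers

headers = flatten_to_headers({
    "type": "com.example.someevent",
    "myextension": {"a": 1, "b": 2, "c": 3},
})
print(len(headers))  # 4 headers, although there are only 2 attributes
```

A 100-attribute rule therefore does not bound the header count a receiving HTTP stack actually has to accept.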
You're right, an overall limit for metadata attributes makes more sense for HTTP headers. Somehow it slipped my mind that we flatten maps into multiple HTTP headers - thanks a lot for bringing this up!
Besides HTTP, some message queues also limit the number of non-data attributes they allow, e.g. Google Pub/Sub limits it to 100 attributes, AWS SNS to only 10.
I think compatibility with existing implementations is one concern; the other is which limits are preferred by producers and consumers. I'd especially like input from those who have experience writing messaging middleware.
Speaking as a producer, a limit like "100 top-level attributes" is (at least conceptually) easier to ensure than "8KB of metadata" if that metadata includes user-generated strings. When I opened the PR, I included a non-KB limit to open everyone's mind to the different types of limits that are possible. I'm happy to change it, though!
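For comparison, a minimal sketch of what checking the two proposed limits might look like on the producer side, assuming the event is held as a plain dict (the figures mirror the diff quoted above):

```python
import json

def within_proposed_limits(event: dict) -> bool:
    # Minified JSON: compact separators, no white-space.
    minified = json.dumps(event, separators=(",", ":"))
    size_ok = len(minified.encode("utf-8")) <= 128 * 1024   # 128KB limit
    count_ok = len(event) <= 100                            # top-level attributes
    return size_ok and count_ok
```

The attribute count is a simple check on the event's structure, while the byte limit depends on every (possibly user-generated) value - which is the asymmetry described above.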
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
Force-pushed from 37ea87e to 7c5d7e5
Accepting the idea that we should have some kind of size limitation, do we need to explicitly describe ways to deal with larger payloads? A Claim Check pattern might be something we could support as a first-class citizen in the specification - such a change might sit quite nicely beside #363 with the addition of a dataref property... Thoughts?
As a producer, limiting the size will reduce the utility of CloudEvents for our organization. In use cases such as a security CloudEvent (e.g. a visual doorbell), it would be more elegant to have an image in the data of the CloudEvent itself. Following on JemDay's suggestion in a different way, perhaps leave it up to the consuming application to implement the Claim Check pattern for any data that is too large to pass through its subsystems.
@JemDay We (commercetools) have implemented the Claim Check pattern for when our messages are above the limit a given message queue is willing to accept. I'd be happy to work on a proposal to add it natively to the CloudEvents spec. That said, I can't tell whether this is a problem only a few will face - in which case it would clutter the spec without adding much value - or whether the problem is more widespread. I haven't seen many implementations of it (an exception is https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-s3-messages.html - but it was only implemented in their Java SDK, not in the others). I personally think being able to abstract it away at the SDK level is great for producers and consumers alike. I'd be happy to hear what others think!
@TamTPham I agree that it would be more elegant and easier for a producer to not have to worry about the payload size. However, I don't think this is the case - IMO the choice we can make is between:
If you pick any technology in the messaging space, you'll likely find that it limits the payload size; for examples, please see #257. This is especially true for cloud-native solutions that are multi-tenant, because the cloud vendor is interested in being able to guarantee performance. The best practice for dealing with e.g. images is, as far as I know, to upload them to blob storage and then pass a reference in the event.
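For reference, a hedged sketch of the Claim Check pattern under discussion: if the serialized event exceeds a broker's limit, `data` is uploaded to blob storage and replaced by a reference. `upload_blob` is a hypothetical function, and the `dataref` attribute is borrowed from the #363 discussion - neither is part of the current spec.

```python
import json

BROKER_LIMIT = 64 * 1024  # assumed per-message limit of the target queue

def apply_claim_check(event: dict, upload_blob) -> dict:
    """If the minified event fits, send it as-is; otherwise move `data`
    to blob storage and pass a reference instead (Claim Check pattern).
    `upload_blob` is a hypothetical callable returning a URL."""
    minified = json.dumps(event, separators=(",", ":"))
    if len(minified.encode("utf-8")) <= BROKER_LIMIT:
        return event
    slim = dict(event)
    slim["dataref"] = upload_blob(json.dumps(event["data"]))
    del slim["data"]
    return slim
```

An SDK could apply this transparently on send and resolve `dataref` on receive, which is what "abstracting it away at the SDK level" would amount to.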
A few points up for discussion:
A consensus on these points would help me move this PR along, before discussing what the actual size(s) should be.
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
Force-pushed from 44821c6 to f40f128
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
In last week's discussion, we settled on guarantees for a minimum size (versus a strict limit). Open points:
I think we should also add a few events that one can use to test whether an implementation fully supports CloudEvents. Those events should test edge cases, including the minimum size. I'd like to do that in a follow-up PR, if no one else wants to take it.
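One way such a test event could be generated, sketched under the assumption that the minimum size is measured as minified JSON: pad a data field until the serialized event lands exactly on the boundary. The attribute values here are placeholders, not prescribed test fixtures.

```python
import json

def event_of_exact_size(target_bytes: int) -> dict:
    """Build a test event whose minified JSON is exactly target_bytes.
    The attribute values are placeholders; only the size matters."""
    event = {
        "specversion": "0.2",
        "type": "com.example.sizetest",
        "source": "/sizetest",
        "id": "size-test-1",
        "data": "",
    }
    base = len(json.dumps(event, separators=(",", ":")).encode("utf-8"))
    event["data"] = "x" * (target_bytes - base)  # ASCII, 1 byte per char
    return event

e = event_of_exact_size(64 * 1024)
assert len(json.dumps(e, separators=(",", ":")).encode("utf-8")) == 64 * 1024
```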
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
We had a discussion on the call about whether we should say … That is why we should IMHO have a measurement for the size of the event, and not of the messages that are sent over the wire. @duglin does that make sense to you?
If we add limits, then yes, I think it makes sense to focus on just the size of the CE information.
… some examples
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
Force-pushed from e586f2a to a58c45d
I've tried to clarify this with an extra paragraph. Anyone, feel free to have a look before this week's call!
In yesterday's call, we discussed that a too-high size may exclude some IoT/Edge devices.
After sleeping on it, I'd like to propose another option:
Basically, the rule applies to anything running in a datacenter/the cloud - which I suspect will be the major application of CloudEvents - but it leaves the door open for IoT use cases. As for the size itself, I'm fine with lowering it to 128KB or 64KB. I'm more interested in having a concrete value that can be worked with than in the value itself :)
I support the third option proposed by Christoph, as it provides the "way out" for some heavily constrained IoT environments.
I'm in favor of option 2. If someone decides to deviate from a SHOULD or STRONGLY RECOMMENDED, it is usually for a good reason. I don't see a need to restrict it explicitly to bandwidth and/or memory.
I am in favor of option 1 or option 2. It can apply to more event consumers and more use cases if we give it a smaller size.
I'm in for option 2 as well. Somebody might have good reasons to "opt out" even on regular devices.
My preference is SHOULD and 64KB. I don't think a MUST can have an out, or it muddles the meaning.
On today's call we ended up circling around Cathy's proposal of doing options 1 and 2 - that is: lower the limit to 64KB and change the MUST to a SHOULD/STRONGLY RECOMMENDED. @cneijenhuis agreed to update the proposal based on this - so please consider this in prep for next week's call, and perhaps we can resolve this one. Of course, if you have concerns, please voice them here in the PR sooner rather than later - or if you can't make the call.
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
FYI - this one has been updated per our last call. Please review so we can try to resolve it this week.
> In order to increase interoperability, all CloudEvent consumers SHOULD accept
> events up to a size of 64KB, measured by serializing the CloudEvent as
> [JSON](./json-format.md) (minified, i.e. without white-space).
I don't find the whole constraint after 64KB necessary, at all.
"All CloudEvent consumers SHOULD accept events up to an overall wire-frame (HTTP message, AMQP message, MQTT packet, etc.) size of 64KByte."
The publisher picks a protocol and encoding, and whatever comes out of that ought to fit into 64KByte. If the publisher picks MQTT and Protobuf and fills that to 64KByte, and an intermediary wants to transcode into HTTP and JSON, that might result in a larger payload - but that's the intermediary's risk to deal with. And it might indeed not, if you use a transfer encoding with compression. Minification might be a strategy to make JSON a little smaller, but I don't think that requires mention here.
@cneijenhuis any thoughts on this?
> but that's the intermediary's risk to deal with.

@clemensv That subverts the whole PR. It is actually the exact opposite of the goal I had in opening this PR.
As a producer, I want to create a CloudEvent that I know is supported in ANY protocol and encoding. And as an intermediary, I want to know whether I can forward a CloudEvent in any other transcoding.
The goal of CloudEvents is "to provide interoperability across services, platforms and systems." Therefore, in my opinion, we need to find a guarantee that is respected across services, platforms and systems.
Taking your example: if Foo publishes a CloudEvent with MQTT and Protobuf at <64KB, and Bar transcodes it into HTTP and JSON and sends it to EventGrid (and the result is >=64KB), then EventGrid will not accept it. Therefore, Foo and EventGrid are not interoperable - yet both would be, according to your proposal, valid CloudEvents implementations.
According to my proposal, the size of the CloudEvent in MQTT and Protobuf is irrelevant. There is a standardized way to measure a CloudEvent, and it always returns the same result, independently of service, platform or system. Therefore, if Foo publishes a CloudEvent that - according to this global measurement - SHOULD be accepted, Foo knows that it is interoperable with Bar, and Bar is interoperable with EventGrid.
Again, I'm open to discussing how the measurement is done (focusing solely on the size of the CloudEvent as encoded in JSON wasn't my initial proposal, but it is simple and straightforward, and that was valued in the discussions we had). But I believe the overarching goal of measuring the CloudEvent must be to provide for interoperability, and therefore we must measure the (abstract) event, and not the (concrete) message.
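In code, the argument boils down to a single transport-independent gate. A sketch, assuming the intermediary has already decoded the incoming message into the abstract event:

```python
import json

CANONICAL_LIMIT = 64 * 1024  # the 64KB figure from the quoted spec text

def canonical_size(event: dict) -> int:
    # Size of the event serialized as minified JSON - the same number
    # regardless of whether it arrived as MQTT+Protobuf or HTTP+JSON.
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))

def safe_to_forward(event: dict) -> bool:
    # If this holds, every compliant consumer SHOULD accept the event,
    # so an intermediary can transcode into any format it likes.
    return canonical_size(event) <= CANONICAL_LIMIT
```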
> [JSON](./json-format.md) (minified, i.e. without white-space).
>
> CloudEvent consumers MAY reject events above this size and MAY reject messages
> that are not minified (e.g. contain unnecessary white-space).
I think the MAY is implied by the above, and the "minified" clause adds nothing of value if the event is under the set threshold. Why should I reject an event that is within 64KByte but isn't minified?
> after a higher limit (e.g. 256KB) that comfortably fits any valid event size.
> However, a middleware that wants to guarantee that the event can be forwarded
> in any format SHOULD measure the size of the event independently of the format
> it was received in.
I don't think we can deal with transcoding differences here, at all. The receiver gets an event, and that SHOULD be within 64KByte wire-frame size (headers, payload, and transport protocol overhead). When the event gets transcoded, it's up to the router/re-publisher to deal with how to keep the transcoded event under the threshold. It's assured that the unmodified event would fit, so that's a choice. I would drop that whole paragraph.
Please see: https://lists.cncf.io/g/cncf-cloudevents/topic/vote_how_to_deal_with_sizes/30703703?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,30703703
For background, please see #257.
TL;DR: Most serverless platforms today have size limitations. For interoperability, we should address this issue.
This PR presents two options: the first is more assertive in that the limit shouldn't be violated; the second is less strict.
I'm mainly trying to force a discussion with this PR, and am happy to apply changes. In particular, I picked rather low size limits - we should definitely have a discussion on that.
If we settle on some sizes, I'd like to create some events that are exactly at the limit, so that implementations can test whether they are compliant with the spec.