Clarify scope of eventID's uniqueness #391
Thank you for pushing this 👍
spec.md
information such as the type of the event source, the organization
publishing the event, the process that produced the event, and some unique
identifiers. The exact syntax and semantics behind the data encoded in the URI
is event producer defined.
Why are you dropping these two sentences? They give a lot of motivation for why a URI is chosen at all. If you don't want to include `information such as...`, then you could also go straight for a UUID and be done with it 😉
I think it is very valuable to describe what to put into the `source`, and why a hierarchical data structure was chosen.
To me, the important thing here is that source can be a URI with an internet-unique authority so you don't have to resort to UUIDs for uniqueness. As I read it, the only normative use of the source is to uniquely identify a producer.
The design of source URIs will depend on the application. It might include "information such as type/organization/process..." but it might not. This spec doesn't seem a good place for advice on URI design, which is a topic in its own right.
I'm ok with putting it back if you feel strongly about it.
I prefer the existing text because it's a little less strict, in that the value doesn't necessarily have to be a valid address on the web. For example, they could use http://myserver/.... if that is OK for their setup. We purposely didn't want to get too precise here, to allow for flexibility.
Check my latest text. I want to make it clear that you MAY have an internet-unique authority and URI here, but you MAY also have something of application-specific scope. Otherwise it's hard to answer the original question, "what is the scope in which source+id is unique?", in any meaningful way.
@alanconway can you elaborate on what problem you're trying to solve with this change? In the end, aren't these all just opaque identifiers since we're not asking the receiver to do anything with these URI-references - like de-reference them?
I'm trying to clarify the scope of uniqueness, the "U" in URI. URIs have a standard authority component which provides well-known, internet-wide uniqueness guarantees. I'm trying to clarify that a "source" can be URI-unique if it has a URI authority OR it can be an authority-less reference path which is only unique in some application-defined context. Both are useful uniqueness guarantees for different applications.
We could leave it unstated since it's implied by using a URI-reference, but this PR is requesting "Clarify scope of eventID's uniqueness" so it seems to need saying.
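The distinction drawn here, between a `source` that carries a URI authority and an authority-less reference that is only unique within some application context, can be checked mechanically. A minimal sketch, assuming Python's standard `urllib.parse`; the function name `uniqueness_scope` and its three labels are hypothetical, not taken from the spec. It only classifies the string and does not validate it.

```python
from urllib.parse import urlsplit


def uniqueness_scope(source: str) -> str:
    """Classify a CloudEvents `source` URI-reference by where its
    uniqueness comes from (labels are illustrative only)."""
    parts = urlsplit(source)
    if parts.netloc:
        # Has an authority, e.g. "https://example.com/x" or "//example.com/x":
        # uniqueness rests on whoever owns that authority.
        return "authority"
    if parts.scheme:
        # Absolute URI without an authority, e.g. "urn:uuid:...":
        # uniqueness depends on the scheme's own rules.
        return "scheme"
    # Relative reference such as "/orders/eu": only unique within some
    # application-defined context.
    return "application"
```

Either kind of value is a valid URI-reference; the classification just makes explicit which uniqueness guarantee a given `source` relies on.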
My interpretation of the request for clarity around uniqueness was more about simple string-comparison type of things :-) So, while the semantics behind these values might be interesting in some cases, I think what's more important (from a spec perspective) is that people know that if they do the equivalent of `ce.source + ce.id`, they'll get a unique value within the scope of that producer, and can use it appropriately. Getting into the semantics of the strings, while interesting, doesn't really change the code a consumer would write, I don't think.
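The consumer side of this can be sketched in a few lines. A minimal sketch, assuming events arrive as Python dicts carrying the `source` and `id` attributes; the `Deduplicator` class and the `example.created` type are hypothetical. One detail worth noting: the key is a `(source, id)` pair rather than a literal string concatenation, because plain concatenation is ambiguous (`"a" + "bc"` and `"ab" + "c"` both give `"abc"`). `type` is deliberately left out of the key, matching the source+id position argued for in this PR.

```python
class Deduplicator:
    """Drops events whose (source, id) pair has already been seen.

    Memory grows without bound here; a real consumer would bound the
    set, e.g. by a time window.
    """

    def __init__(self) -> None:
        self._seen = set()  # set of (source, id) tuples

    def is_duplicate(self, event: dict) -> bool:
        key = (event["source"], event["id"])
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


dedup = Deduplicator()
first = {"source": "a", "id": "bc", "type": "example.created"}
other = {"source": "ab", "id": "c", "type": "example.created"}
print(dedup.is_duplicate(first))        # False: new pair
print(dedup.is_duplicate(other))        # False: concatenation would have collided
print(dedup.is_duplicate(dict(first)))  # True: same source and id
```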
+1 I've restored the original comments on "source" - with the exception of saying "identifies" rather than "describes" and changed the "id" part along the lines you suggested.
Updated from 16354ed to 43413ce.
Updated to address most of the comments above; I've left some conversations unresolved where I need more feedback.
Updated from 43413ce to b0a7411.
I've minimized the changes and updated them in line with the comments. I think the only open point of discussion is whether `type` is part of the dedup or not. As my changes read now, it is not.
Updated from b0a7411 to d2004e4.
Updated from 6171d4f to f836f27.
On Tue, Feb 26, 2019 at 12:03 PM Christoph Neijenhuis < ***@***.***> wrote:
In spec.md:
> information such as the type of the event source, the organization
publishing the event, the process that produced the event, and some unique
- identifiers. The exact syntax and semantics behind the data encoded in the URI
- is event producer defined.
+ identifiers.
+
+ The exact syntax of `source`, and the scope in which it is unique,
+ depends on the application. Applications range from a single service
I don't think we should talk about "application" here, but only about
event producer.
It makes no sense to say that 'source' is unique in the scope of a producer
- the source *identifies* the producer. The only way an event consumer can
tell if events are from "the same producer" or "different producers" is to
compare source fields. So uniqueness of source is a consideration for
larger application design.
See https://github.com/cloudevents/spec/blob/master/primer.md#design-goals
especially and later can be connected to create new applications.
An (implementation of an) event producer MAY be part of many applications.
The author of the event producer may not know what applications it will be
part of after a few years.
That's exactly why I mention application. We know that source+id MUST be
unique for events from a single producer. If your application consists of a
single producer and directly connected consumers, then the source name can
be anything you like and you are done.
To build applications that include multiple producers and loosely coupled
event delivery (routing, store/forward etc.) producers must use 'source'
names that will be unique across all the producers that might ever have
events routed to the same consumer. Otherwise there is no way for the
*consumer* to know whether events with the same source+id are duplicates,
or distinct events from different producers that happen to use the same
source name.
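The failure mode described above is easy to reproduce. A minimal sketch, assuming two hypothetical producers that each number their own events from 1, and a consumer that deduplicates on `(source, id)`; `make_producer`, `deliver` and the source strings are made-up examples.

```python
import itertools


def make_producer(source: str):
    """Hypothetical producer: numbers its own events 1, 2, 3, ...
    so its ids are unique only within this producer."""
    counter = itertools.count(1)

    def emit(data):
        return {"source": source, "id": str(next(counter)), "data": data}

    return emit


def deliver(events):
    """Consumer-side dedup on (source, id); returns the accepted events."""
    seen, accepted = set(), []
    for ev in events:
        key = (ev["source"], ev["id"])
        if key not in seen:
            seen.add(key)
            accepted.append(ev)
    return accepted


# Two independent producers that happen to pick the same source name:
# b's first event has source="producer", id="1", exactly like a's,
# so the consumer wrongly drops it as a duplicate.
a = make_producer("producer")
b = make_producer("producer")
shared = deliver([a("temp=20"), b("door=open")])

# Distinct source names keep both events.
c = make_producer("https://example.com/sensors/temp")
d = make_producer("https://example.com/sensors/door")
distinct = deliver([c("temp=20"), d("door=open")])
```

Nothing in the consumer can tell the two cases apart; only the choice of `source` names does.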
@alanconway I think we agree on:
What we don't agree on is who should be responsible. You're saying it should be the Event Producer. I don't agree. Let me present a few practical examples:
All the event producer can do is promise that IDs will be unique within its scope. You cannot make the event producer responsible for the whole application, because the event producer likely isn't in charge of it.
@cneijenhuis I think we agree on everything, but my wording was messy - have a look at the update. Here's what I'm trying to say:
Thinking about this a bit more, it seems like it should be the type+source+id triple that is unique. Firestore is a good example: for a document update, it may fire some of:
Note that
On Sat, Mar 2, 2019 at 2:24 AM Evan Anderson ***@***.***> wrote:
Thinking about this a bit more, it seems like it should be the
eventid+source+id triple that is unique.
What is "eventid"? My understanding is that the event context attribute
named "id", defined here:
https://github.com/alanconway/cloudevents-spec/blob/master/spec.md#L218
is an identifier for the event, within the scope of a given producer
identified by the attribute named "source".
I don't see "eventid" in the spec.
Firestore is a good example: for a document update, it may fire some of:
| Event Type | Trigger |
|---|---|
| onCreate | Triggered when a document is written to for the first time. |
| onUpdate | Triggered when a document already exists and has any value changed. |
| onDelete | Triggered when a document with data is deleted. |
| onWrite | Triggered when onCreate, onUpdate or onDelete is triggered. |
Note that onWrite and onUpdate may fire with the same source for the same
occurrence. Using the same event ID allows downstream consumers to
correlate onWrite events with creates and updates, for example.
It is the responsibility of a source implementation to ensure unique "id"
values per event. If there are concurrent event streams then there are 2
choices:
1. a single source serializes the concurrent events into a single stream,
and generates unique ids for them all
2. concurrent event streams are represented as separate sources to preserve
concurrency and eliminate the need for synchronization.
What's the benefit of adding a 3rd layer to the scheme?
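The two choices listed above might look like this. A minimal sketch, assuming Python threads stand in for the concurrent event streams; `SerializedSource`, `PerStreamSource` and the example source paths are hypothetical. Choice 1 pays for a lock on every event; choice 2 needs no synchronization, at the cost of consumers seeing several sources.

```python
import itertools
import threading


class SerializedSource:
    """Choice 1: one source, one id counter shared by all concurrent
    streams. The lock serializes id generation so ids never repeat."""

    def __init__(self, source: str) -> None:
        self.source = source
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_event(self, data) -> dict:
        with self._lock:
            event_id = str(next(self._counter))
        return {"source": self.source, "id": event_id, "data": data}


class PerStreamSource:
    """Choice 2: each concurrent stream gets its own source, so each
    stream numbers its events independently. Each instance is meant to
    be used by a single stream (thread) only."""

    def __init__(self, base: str, stream: str) -> None:
        self.source = f"{base}/{stream}"
        self._counter = itertools.count(1)

    def next_event(self, data) -> dict:
        return {"source": self.source, "id": str(next(self._counter)), "data": data}
```

With choice 2, two streams will both emit `id` "1", which is fine because their `(source, id)` pairs still differ.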
@alanconway Sorry for the late reply. To follow your numbered list: On 3: I agree, this is the point where the spec needs to be more specific and clearly spell out when a consumer can deduplicate a message. On 1: Yes, but that is already in the spec. On 2: I agree we should provide guidance on that, but I think the primer is a much better place for it. We can only advise readers on how to set up their application properly, so that they don't run into problems given 3.
Also, to go back to the discussion started in #331: like @Tapppi, I am not convinced that source + id is a good choice to clarify "scope". It also goes against the implicit design of the spec, given by the current examples. The current examples emphasize a unique type, but not necessarily a unique source. Given the examples, I would design a type of

I think using DNS for uniqueness is a good idea in general, but I favor the convention of using reverse-DNS, as done in many programming languages for this use case. A package name

Furthermore, I think it is much more important for the event

Summary: I think the spec, as it is currently written, implicitly favors type+source+id as the "unique" scope. Design-wise, I personally prefer trying to make the
@alanconway is this one ready for review, or did you want to reply to @cneijenhuis's comments first?
On Thu, Mar 7, 2019 at 12:18 PM Scott Nichols ***@***.***> wrote:
In spec.md:
>
### id
* Type: `String`
-* Description: ID of the event. The semantics of this string are explicitly
- undefined to ease the implementation of producers. Enables deduplication.
+* Description: Identifies of the event, enables de-duplication. The
+ format of this string is determined by the producer. Each producer
+ MUST generate unique `id` values for its own events, `id` values
+ from different producers might clash. Consumers MAY assume that
+ events with identical `id` and `source` values are duplicates, and
I would say that eventtype is also required to understand if the event is
a duplicate.
So: id + eventtype + source combine to be unique.
I don't think type belongs in de-duplication. Type describes a class of
events with similar semantics, it does not define where they come from.
Many independent producers can produce events of the same type.
The only agent that can reliably generate unique IDs is a producer, because
it *produces* the events. A single producer can only ensure its own IDs
are unique, so to check for duplicates you absolutely must look at:
- the id, because that's the only bit of data that varies on a per-event
basis.
- the source, because that identifies the producer, and different producers
generate ids independently.
There's no reason to bring type into it - a producer that generates unique
IDs is responsible for never using the same ID on non-duplicate events, and
by definition duplicate events will have the same type (and every other
attribute that matters)
That said, if the consensus is to include type we can. I don't see any
benefit to it, but apart from the extra complexity it doesn't cause a
problem.
I think it warrants review to decide if it makes sense as it stands based
on using source+id.
We need to work thru the current discussion about adding type as part of
the identifier. It should be straightforward to update the text once
we've reached a consensus.
On Wed, Mar 6, 2019 at 8:49 PM Doug Davis ***@***.***> wrote:
@alanconway <https://github.com/alanconway> is this one ready for review
or did you want to reply to @cneijenhuis <https://github.com/cneijenhuis>'s
comments first?
@evankanderson wrote:
I think there's a typo in there since I tend to agree with @alanconway that including
Sorry, that was a typo, I meant
I also agree with @alanconway: adding type creates unnecessary complication for the event consumer. The spec already defines that the event ID is unique within the scope of the event source, so as long as we ensure that source is globally unique, "source+eventID" makes the event globally unique. As to the example of Firestore, it seems the problem can be solved by assigning different event IDs to
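The suggestion of giving the Firestore-style events different IDs can be sketched as follows. Everything specific here is a hypothetical assumption, not something Firestore or the spec defines: the `document.*` type names, the `<occurrence>.<kind>` ID scheme, and the source string. Each event gets its own ID, so `(source, id)` deduplication keeps both, while a consumer that knows this particular producer's ID scheme can still correlate them through the shared prefix.

```python
def document_change_events(source: str, occurrence: str, created: bool) -> list:
    """Emit the two events for one document change, each with its own id."""
    kind = "create" if created else "update"
    return [
        {"source": source, "id": f"{occurrence}.{kind}", "type": f"document.{kind}"},
        {"source": source, "id": f"{occurrence}.write", "type": "document.write"},
    ]


def occurrence_of(event: dict) -> str:
    """Recover the shared occurrence id (only valid for this id scheme)."""
    return event["id"].rsplit(".", 1)[0]


events = document_change_events("//firestore/projects/demo/docs/42", "7f3a", created=False)
keys = {(e["source"], e["id"]) for e in events}
assert len(keys) == 2                                   # no false duplicate
assert {occurrence_of(e) for e in events} == {"7f3a"}   # still correlatable
```

The trade-off is that correlation now depends on knowledge of one producer's ID format, whereas `id` is otherwise treated as opaque.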
Updated from 5129089 to 808c16f.
@alanconway can you rebase this? On last week's call we discussed this a bit, and the overall consensus of the group is that id + source should be sufficient for uniqueness. Overall, what do people think about this? Aside from the merge conflict, I think it's ready for review/consideration. I'll add it to the call this week, but if you have concerns please voice them in the PR.
Updated from 6552d0c to 985490d.
Rebased and tightened the text a bit.
On Wed, Apr 17, 2019 at 11:26 AM Doug Davis ***@***.***> wrote:
@alanconway <https://github.com/alanconway> can you rebase this?
Done, and tightened up the text a bit.
On last week's call we discussed this a bit and the overall consensus of
the group is that id + source should be sufficient for uniqueness. There
didn't appear to be a desire to add type into the mix. However, there was
a brief discussion around how using non-unique source values could lead
to duplicates - so a value like producer would not be wise, but a DNS
name or UUID would be. I believe the text in this PR here (
https://github.com/cloudevents/spec/pull/391/files#diff-958e7270f96f5407d7d980f500805b1bR189)
tries to address this.
Yep the new text is:
An application MUST assign a distinct `source` to each distinct producer.
The application MAY use UUIDs, URNs, DNS authorities or an
application-specific scheme to create unique identifiers.
There are examples of each approach.
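Each of the approaches in that text can be illustrated briefly. A minimal sketch using only Python's standard library; the function names and the example authority and paths are hypothetical.

```python
import uuid


def source_from_uuid() -> str:
    """Unique without any coordination: a random UUID rendered as a URN."""
    return uuid.uuid4().urn  # "urn:uuid:" followed by the UUID


def source_from_dns(authority: str, path: str) -> str:
    """Unique as long as the organization owns the DNS authority and
    assigns paths under it without clashes."""
    return f"https://{authority}/{path.lstrip('/')}"


def source_from_app_scheme(app: str, component: str, instance: int) -> str:
    """Application-specific scheme: a relative reference that is only
    unique within this application's own naming context."""
    return f"/{app}/{component}/{instance}"
```

The UUID form needs no registry but says nothing about the producer; the DNS and application forms are readable but rely on someone managing the namespace.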
Overall, what do people think about this? Aside from the merge-conflict I
think it's ready for review/consideration - I'll add it to the call this
week, but if you have concerns please voice them in the PR.
On a related note, I added a plea to
#326 - make eventId optional. It
was closed but I feel unjustly :)
See discussion at issue cloudevents#326. This PR is based on PR cloudevents#391, as it depends on those changes.
Note: the CI build failure above is caused by a broken link to the CloudEvents logo, nothing to do with my changes AFAIK.
@alanconway on last week's call we agreed with this general direction. Could you address the merge conflict and any outstanding comments? The text associated with this PR might change based on @deissnerk's PR (#420), but we may just have to see which goes in first, and then the other will have to adjust accordingly. We may also need to adjust the Primer.
Updated from 79ba923 to 7856f13.
Reword "id" and "source" to clarify uniqueness requirements.
Examples to show different approaches to generating unique source/IDs.
Clarify producer/consumer responsibilities.

Signed-off-by: Alan Conway <aconway@redhat.com>
URNs, DNS authorities or an application-specific scheme to create
unique `source` identifiers.

A source MAY include more than one producer. In that case the
Is the word "include" the right word? Perhaps "use" or "leverage"? Not a big deal since I think I get the point, but "include" might sound like we're getting into impl details and suggesting that a producer is a sub-component of a source, which it may not be.
Just one minor question, otherwise LGTM.
On Wed, May 15, 2019 at 3:21 PM Doug Davis ***@***.***> wrote:
In spec.md:
> +- Description: Identifies the context in which an event
+ happened. Often this will include information such as the type of
+ the event source, the organization publishing the event or the
+ process that produced the event. The exact syntax and semantics
+ behind the data encoded in the URI is defined by the event producer.
+
+ Producers MUST ensure that `source` + `id` is unique for each
+ distinct event.
+
+ An application MAY assign a unique `source` to each distinct
+ producer, which makes it easy to produce unique IDs since no other
+ producer will have the same source. The application MAY use UUIDs,
+ URNs, DNS authorities or an application-specific scheme to create
+ unique `source` identifiers.
+
+ A source MAY include more than one producer. In that case the
is the word "include" the right word? Perhaps "use" or "leverage"? Not a
big deal since I think I get the point, but "include" might sound like
we're getting into impl details and suggesting that a producer is a
sub-component of a source - which it may not be.
I'm fine with another word: "encompass", "comprise", "involve", "contain"?
The source isn't a thing in itself, it is a group of related producers, so
I'm not too sure about "use" or "leverage": there's no separate source
object that acts on the producers. I don't object however.
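A source that spans several producers only works if those producers coordinate their IDs, since `source` + `id` must still be unique per event. One possible approach, sketched under the assumption that each producer is handed a distinct instance name (the `SharedSourceProducer` class, instance names and source URL are hypothetical, not from the spec), is to prefix each producer's own counter with that name.

```python
import itertools


class SharedSourceProducer:
    """One of several producers sharing a single `source`.

    Ids stay unique across the group because each producer prefixes its
    own counter with a distinct instance name (assumed to be assigned by
    the application).
    """

    def __init__(self, source: str, instance: str) -> None:
        self.source = source
        self.instance = instance
        self._counter = itertools.count(1)

    def next_event(self, data) -> dict:
        event_id = f"{self.instance}-{next(self._counter)}"
        return {"source": self.source, "id": event_id, "data": data}


node_a = SharedSourceProducer("https://example.com/orders", "node-a")
node_b = SharedSourceProducer("https://example.com/orders", "node-b")
print(node_a.next_event("order 1")["id"])  # node-a-1
print(node_b.next_event("order 2")["id"])  # node-b-1
```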
Let's just leave it as is then, unless someone thinks of a better word.
Approved on the 5/16 call |
This is for issue #331
Signed-off-by: Alan Conway aconway@redhat.com