
Clarify scope of eventID's uniqueness #331

Closed
duglin opened this issue Oct 18, 2018 · 12 comments

@duglin
Collaborator

duglin commented Oct 18, 2018

On today's call, while discussing #326, it was mentioned that the uniqueness statement in eventID might need some clarity. Right now the spec says that it "MUST be unique within the scope of the producer", but what constitutes the "producer" could vary. There was also a discussion about how combining the source and eventID properties should produce a globally unique value, but the spec doesn't say this. We should decide what kind of clarity we want to add around this, if any.

@alanconway
Contributor

alanconway commented Dec 13, 2018

My take on this was that source + eventID is unique because eventID is unique within the scope of a producer, and the source attribute identifies the producer. With valid DNS-scoped URIs assigned to producers so that they won't step on each other's toes, the combination is internet-unique. Transient producers could use UUIDs (blech). Applications with lesser requirements can assign source IDs to suit their needs.

I'm just getting into the specs, though, so I may be off track.
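A minimal Python sketch of the scheme described in this comment, under stated assumptions: the `Producer` class, the `https://orders.example.com/db1` source, and the counter-based IDs are all hypothetical illustrations, not anything the spec or an SDK defines. A producer with a stable identity uses a DNS-scoped URI it owns as `source`; a transient producer falls back to a UUID URN; `id` only has to be unique per producer.

```python
import itertools
import uuid


class Producer:
    """Hypothetical producer that stamps 'source' and 'id' on its events."""

    def __init__(self, source=None):
        # A producer with a stable identity passes a DNS-scoped URI it owns.
        # A transient producer with no stable name falls back to a UUID URN.
        self.source = source if source is not None else f"urn:uuid:{uuid.uuid4()}"
        # A simple counter: unique only for the lifetime of this object, which
        # is enough to illustrate "unique within the scope of the producer".
        self._ids = itertools.count(1)

    def make_event(self, event_type, data):
        return {
            "type": event_type,
            "source": self.source,
            "id": str(next(self._ids)),
            "data": data,
        }


orders = Producer("https://orders.example.com/db1")
scratch = Producer()  # transient producer: source is urn:uuid:...

a = orders.make_event("com.example.order.created", {"order": 1})
b = scratch.make_event("com.example.order.created", {"order": 1})

# Both events carry id "1", but their (source, id) pairs differ.
print((a["source"], a["id"]), (b["source"], b["id"]))
```

The counter is only there to show that IDs can repeat across producers without harm; a real producer that restarts would need persistent state or random IDs to keep its own IDs unique.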

@Tapppi
Contributor

Tapppi commented Feb 14, 2019

I do not believe id can be required to be unique in a meaningful way, either in combination with source or with any other attribute. The spec does not prescribe what source or type (if type were included in the "unique combination") should be, other than that type SHOULD be prefixed with a reverse-DNS name and source MUST be a URI-reference. It doesn't say that source should be DNS-scoped or globally unique. I also don't think it makes sense to require uniqueness, considering the complexity of events passing through different internal/external systems, networks, etc., and the practical need for a "source registry" that such a requirement would create. The spec describes source like this (emphasis mine):

This describes the event producer. Often this will include information such as the type of the event source, the organization publishing the event, the process that produced the event, and some unique identifiers. The exact syntax and semantics behind the data encoded in the URI is event producer defined.


As an example, consider an open-source Change Data Capture project producing CloudEvents from actions on a database. The relevant fields might look something like this and still comply with the current spec:

{
  "type": "io.CatDogCat.db.row.delete/v1",
  "source": "db-name/orders/order",
  "id": "123123123",
  ...
  "data": {
    "before": {"id": "987"},
    "after": null,
    ...
  }
}

Even though the project allows distinct source names for many databases with the same schema/table names by prefixing the source with db-name, there is no way for it to guarantee or require that database names are unique across every deployment of the project without integrating with some kind of repository that keeps track of source names. If we require any kind of global uniqueness for any combination of the attributes, the project cannot reasonably be expected to produce compliant CloudEvents. It could add a UUID to the source, but that would hinder users of the project. Why would the maintainer want to do that?

For the user, the db-name component is sufficiently unique for their use case. If this event were passed to a SaaS that accepts CloudEvents, the SaaS could easily use some kind of customer or project ID (maybe in combination with source?) to scope the uniqueness of id, and say so in its own documentation.
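To make the SaaS scenario concrete, here is a small Python sketch, assuming a hypothetical `scoped_event_key` helper and made-up tenant IDs; nothing here is defined by the spec. The receiving service prefixes its own customer/project ID to `source` and `id`, so two customers running the same CDC project no longer collide.

```python
from typing import Tuple


def scoped_event_key(tenant_id: str, event: dict) -> Tuple[str, str, str]:
    """Hypothetical SaaS-side key: the customer/project ID scopes (source, id)."""
    return (tenant_id, event["source"], event["id"])


# Two customers running the same CDC project emit identical source and id values.
event = {
    "type": "io.CatDogCat.db.row.delete/v1",
    "source": "db-name/orders/order",
    "id": "123123123",
}

key_a = scoped_event_key("customer-a", event)
key_b = scoped_event_key("customer-b", event)
print(key_a != key_b)  # True: the tenant ID keeps the keys apart
```

The uniqueness scope here is chosen and documented by the consumer, which is the point of the comment: the producer only promises uniqueness within its own scope.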

In my opinion, requiring uniqueness in scopes other than the producer-defined one introduces a conflict that severely diminishes the usability of the spec. The example shows how "unique in the scope of the producer" aptly describes many real-world situations. I do wonder, however, whether it would be clearer to say "source+eventID must be unique in the scope of the producer", which would of course sound even more relaxed than the current wording but might be more useful and descriptive 😅

@duglin
Collaborator Author

duglin commented Feb 18, 2019

@Tapppi thanks for the comment. Right now we say the ID MUST be unique within the scope of the producer. Does also requiring source+id to be unique add value? It would seem to be redundant or unnecessary.

@alanconway
Contributor

I think the wording needs clarification on 2 points (assuming my reading is correct):

  1. The source used to be a URI, but it is now a URI-reference - this seems like a mistake. Only an absolute URI can uniquely identify a source, a relative URI-ref like "db-name/orders/order" is not unique without some absolute URI to resolve it against.

  2. The intent is that source+eventid alone be a sufficient unique identifier for de-duplication of events: a consumer can safely discard all but one event with matching source+eventid without examining other attributes. In particular, consumers do not need to examine the "type" attribute for de-duplication. (If I am misreading and the intent is that type+source+eventid be unique, then that should be made clear.)
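Under reading (2), a consumer-side de-duplicator only needs `source` and `id`. The Python sketch below assumes a hypothetical `Deduplicator` class with a bounded in-memory window (an `OrderedDict` used as an LRU); the window size and eviction policy are illustrative choices, not spec requirements.

```python
from collections import OrderedDict


class Deduplicator:
    """Hypothetical consumer that drops events whose (source, id) was seen recently.

    Only 'source' and 'id' form the key; 'type' and the payload are
    deliberately not inspected.
    """

    def __init__(self, max_entries=10_000):
        self._seen = OrderedDict()
        self._max = max_entries

    def accept(self, event):
        key = (event["source"], event["id"])
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency of the duplicate
            return False
        self._seen[key] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # evict the least recently seen key
        return True


dedup = Deduplicator(max_entries=2)
print(dedup.accept({"source": "s", "id": "1", "type": "a"}))  # True
print(dedup.accept({"source": "s", "id": "1", "type": "b"}))  # False: type ignored
```

Because the window is bounded, a duplicate that arrives after its key has been evicted will be accepted again; how large the window must be is a deployment decision.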

@alanconway
Contributor

I was going to put in a PR, but I need to understand the switch from URI to URI-Reference in commit 31850e1 first.
Relative URI-references can only be resolved against some base ("context") URI, and I see nothing in the event spec that says what that context URI would be. If the spec assumes some "implicit" context that is set up before events are received (e.g. as HTTP requests), then we need to update the non-HTTP transport specs (AMQP, MQTT), since they will have to carry that context URI somewhere.
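The concern about relative references can be illustrated with Python's standard `urllib.parse`, which implements RFC 3986 reference resolution. The `resolve_source` helper and both base URIs are hypothetical: the same relative source resolves to different absolute URIs depending on which base a consumer assumes, and without any base it identifies nothing globally.

```python
from urllib.parse import urljoin, urlparse


def resolve_source(source: str, base: str) -> str:
    """Resolve a (possibly relative) source against a hypothetical base URI."""
    if urlparse(source).scheme:
        return source  # already absolute; no context needed
    return urljoin(base, source)


relative = "db-name/orders/order"
print(resolve_source(relative, "https://deployment-a.example.com/cdc/"))
# https://deployment-a.example.com/cdc/db-name/orders/order
print(resolve_source(relative, "https://deployment-b.example.com/cdc/"))
# https://deployment-b.example.com/cdc/db-name/orders/order
```

Note that RFC 3986 resolution drops the last segment of a base path without a trailing slash: resolving against `https://deployment-a.example.com/cdc` yields `https://deployment-a.example.com/db-name/orders/order`.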

@duglin
Collaborator Author

duglin commented Feb 20, 2019

I believe the "context" is "the producer". It's purposely a bit vague to allow for the producer to have the freedom to choose what works best for them and their consumers.

alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Feb 20, 2019
This is for issue cloudevents#331

Signed-off-by: Alan Conway <aconway@redhat.com>
@alanconway
Contributor

Attempted to clarify this in PR #391. Goal was not to change the meaning of the spec but just to clarify the uniqueness implications for de-duplication and the meaning of URIs with/without an authority when used as a source.

@cneijenhuis
Contributor

cneijenhuis commented Feb 20, 2019

I was going to put in a PR but I need to understand the switch from URI to URI-Reference in commit 31850e1 first.

The PR contains a description: #338. The commit does NOT change anything besides the shorthand; it was already a URI-reference.

The URI-reference was introduced here: #169

1. The source used to be a URI, but it is now a URI-reference - this seems like a mistake. Only an absolute URI can uniquely identify a source, a relative URI-ref like "db-name/orders/order" is not unique without some absolute URI to resolve it against.

See #169 (comment)

TL;DR: The event type gives you the base URI for relative URIs.

(But, as pointed out in #331 (comment), just because the URI is absolute doesn't automatically make it unique. For example, scheme://CatDogCat.io/db-name/orders/order is not unique, because CatDogCat.io is the URL of the open-source project, not of the actual deployment.)

alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Feb 20, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
@duglin
Collaborator Author

duglin commented Feb 20, 2019

TL;DR: The event type gives you the base URI for relative URIs.

Is that in the spec? This wasn't my assumption.

@alanconway
Contributor

alanconway commented Feb 20, 2019 via email

@cneijenhuis
Contributor

is that in the spec? This wasn't my assumption.

@duglin I didn't say that - I was pointing to (more or less) the same question that has been asked before, and the answer that was given. Alan wanted to "understand the switch from URI to URI-Reference", and, since I was in the same position a few months ago, I pointed him to the relevant discussion.

but says nothing about how to reliably parse the type String to transform it into a URI.

Please read Clemens comment in full, my TL;DR is really just a TL;DR 😉

Clemens' example of Microsoft.Storage.BlobCreated doesn't even contain a URI that can be parsed in any way. His argument is that, for a consumer that wants to explicitly consume that particular event (the end consumer, not a middleware), it will be trivial to know that this is in the Azure cloud. This happens implicitly, out-of-band, and cannot necessarily be done by just looking at the event type without further knowledge.

A middleware will not be able to figure out what the correct base URI for Microsoft.Storage... is, but, at least for de-duplication, it is enough to know that a base URI could be figured out. In other words, whether I use the internet-unique eventType or the internet-unique base URI (derived from the eventType) makes no difference.
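One way a middleware could act on this reading is sketched below in Python. It is an interpretation of the argument in this comment, not something the spec stated at the time (the question earlier in the thread shows it was not a shared assumption), and `dedup_key` is a hypothetical helper: when `source` is an absolute URI it scopes `id` on its own; when it is a relative reference, the reverse-DNS-prefixed `type` stands in for the unknown base URI.

```python
from urllib.parse import urlparse


def dedup_key(event: dict) -> tuple:
    """Hypothetical middleware de-duplication key."""
    source = event["source"]
    if urlparse(source).scheme:
        # Absolute URI: source alone is taken to scope the id.
        return (source, event["id"])
    # Relative reference: use the type as a stand-in for the base URI
    # that an end consumer would know out-of-band.
    return (event["type"], source, event["id"])


cdc_event = {
    "type": "io.CatDogCat.db.row.delete/v1",
    "source": "db-name/orders/order",
    "id": "123123123",
}
print(dedup_key(cdc_event))
```

As pointed out earlier in the thread, this only helps if the type (or the absolute source) is genuinely scoped to the deployment; a type prefixed with the open-source project's domain would still collide across deployments.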

If that's the intent it needs to be clarified.

I think so too. The working group agreed to this (before I joined...), and it also took me some time to understand the reasoning behind it.


Note to self: Don't attempt to add a TL;DR, it'll do more harm than good 😉

alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Feb 21, 2019
Reword "id" uniqueness description.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Feb 26, 2019
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Feb 26, 2019
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Feb 26, 2019
…queness.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Mar 1, 2019
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Mar 1, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Mar 1, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Mar 27, 2019
Reword "id" uniqueness description.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Mar 27, 2019
…queness.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Mar 27, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 17, 2019
Reword "id" uniqueness description.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 17, 2019
…queness.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 17, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 24, 2019
Reword "id" uniqueness description.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 24, 2019
…queness.

Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 24, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 24, 2019
Co-Authored-By: alanconway <aconway@redhat.com>
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Apr 24, 2019
Signed-off-by: Alan Conway <aconway@redhat.com>
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue May 9, 2019
Reword "id" and "source" to clarify uniqueness requirements.
Examples to show different approaches to generating unique source/IDs
Clarify producer/consumer responsibilities.

Signed-off-by: Alan Conway <aconway@redhat.com>
@duglin
Copy link
Collaborator Author

duglin commented Jun 8, 2019

I'm going to close this because I believe we've addressed it in the recent PRs we merged about uniqueness. If someone disagrees, please speak up and we can reopen.

@duglin duglin closed this as completed Jun 8, 2019

4 participants