Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'sequence' attribute #291

Merged
merged 4 commits into from
Sep 27, 2018
Merged

Add 'sequence' attribute #291

merged 4 commits into from
Sep 27, 2018

Conversation

duglin
Copy link
Collaborator

@duglin duglin commented Aug 16, 2018

Fixes: #191

Actually, it picks up the work from #191 since that one appears to be stale.

I didn't address any of the comments in #191 because I wasn't sure how
the WG in general felt about them. So please speak up in this PR if you'd like
to see a change.

I'm not advocating for, or against, this proposal - I'm just moving it to a possible extension attribute per our F2F meeting so the WG can decide if we want to include it.

Signed-off-by: Doug Davis dug@us.ibm.com

@BenBeattieHood
Copy link

BenBeattieHood commented Aug 21, 2018

Many thanks for picking this up @duglin - I've been away for a bit and unable to respond. I agree that this is better suited as an extension than a core feature if IoC equipment is unable to provide state for incrementing an ordinal field. I do believe this extension is key to designing for events, as otherwise you get forward-coupling between services; but am happy to let time work this out.

@@ -0,0 +1,15 @@
# Ordinal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest name: Sequence

"ordinal" implies a number; however, this is a string "expressing the relative order of the event" -- this solves a broader range of use cases than just a number, and is definitely a good approach. Propose we adjust the naming to match the definition.

@BenBeattieHood
Copy link

BenBeattieHood commented Aug 30, 2018 via email

@Tapppi
Copy link
Contributor

Tapppi commented Aug 30, 2018

I'm looking for a name that indicates uniqueness and contiguity. I feel in software we're used to ordinals being unique and continuous, -- Sequence seems better but does it imply uniqueness and contiguity?

They both indicate "a particular way to order a number of things", except:

  • ordinal numbers are adopted from set theory, meaning there should be no repetition
  • sequences are a more general concept, allowing repetition within the sequence.

I prefer sequence, as it is the more general approach to the problem not limited by confusion with terms from set theory. I also think it sounds nicer and more understandable as a relative order for things.

Fixes: cloudevents#191

I didn't address any of the comments in cloudevents#191 because I wasn't sure how
the WG in general felt about them. So please speak up in this PR if you'd like
to see a change.

Signed-off-by: Doug Davis <dug@us.ibm.com>
@duglin
Copy link
Collaborator Author

duglin commented Sep 1, 2018

Do we need to define the values of this sting or is it expected that all consumers will know what values each producer will use? As currently written, w/o some out-of-band communication with the producer, I wouldn't be able to use this extension. I'd like to add some text to the PR telling people about this if that's the case - but I need some guidance on what people's expectations are on this point.

@BenBeattieHood
Copy link

@Tapppi I'm ok with repetition, so long as the values are contiguous and therefore assertable.
@duglin Yes, if we were constrained to base-10 integers, it would be more obvious; but if the producer used a wider base (eg base 128) this would need to be communicated in order for the consumer to predict the next value.

@BenBeattieHood
Copy link

BenBeattieHood commented Sep 2, 2018

Fwiw, I still feel it's worth discussing that this sequence number, when combined with the source value, makes the eventID unnecessary. eg. one could synthesize an 'eventid' from [source]+[sequence], eg IoT/LightSensor5/Event4, which would be unique.

@duglin
Copy link
Collaborator Author

duglin commented Sep 3, 2018

I switched this to use "sequence" and added some text about how to know what the possible values are. What do people think?

@duglin duglin changed the title Add 'ordinal' attribute Add 'sequence' attribute Sep 3, 2018
Signed-off-by: Doug Davis <dug@us.ibm.com>
@duglin
Copy link
Collaborator Author

duglin commented Sep 5, 2018

In your example, wouldn't the consumer also stop due to it waiting for 'b' to arrive - which it never will according to your example?

Added: RECOMMENDED as contiguous

@cneijenhuis
Copy link
Contributor

cneijenhuis commented Sep 5, 2018

In ecommerce, processing events in order and without gaps is often important. Our API enables the event consumers to do so by having a field called sequenceNumber.

The feedback I'm often getting is that it is tiresome to re-implement the in-order processing. This is especially true for an (otherwise) stateless consumer: State has to be introduced only for saving the source/sequence pairs. You're forced to go from FaaS to FaaS + Storage.

Personally, I'd prefer a more opinionated approach that does not require out-of-band communication. Or support for both standardized sequences and non-standardized ones that require out-of-band communication.

This would allow some middleware (either the FaaS itself, or an event gateway) to order the events, and the consumer to process them in order without having to re-implement it and keep state.

Example for an opinionated approach

Call the field intSequence or int32Sequence (to allow the definition of other sequences).
MUST start at 1, MUST be contiguous, MUST wrap around after 2,147,483,647 etc.

Example for a less opinionated approach

In addition to the field sequence, add an optional field sequenceType.
If the sequenceType field is set to Integer (or Int32), the sequence field MUST start at 1, MUST be contiguous, MUST wrap around after 2,147,483,647 etc.

The details need to be worked out, but both of these examples make out-of-band communication unnecessary.

@duglin
Copy link
Collaborator Author

duglin commented Sep 5, 2018

@cneijenhuis I like the idea of two fields (sequence and sequenceType). We can then allow for some well-defined sequenceType values and allow people to define new ones - so its an extensibility point. What do other people think?

@BenBeattieHood
Copy link

I agree with including the sequenceType (in addition to sequence) - good suggestion.

(@duglin re 'b': sorry about my example (appears to have disappeared with the updated commit) - you're right I should've used tokens more obviously contiguous - I'm grateful you picked up on the main point I was trying to say.)

@duglin
Copy link
Collaborator Author

duglin commented Sep 5, 2018

@cneijenhuis @BenBeattieHood do one of you guys want to suggest some text for what these two attributes would look like?

@cneijenhuis
Copy link
Contributor

cneijenhuis commented Sep 6, 2018

Here is my first try. Sorry, not wrapped at 80 characters yet 😉

I can also make a PR towards this branch (including wrapping), if that is easier.


Sequence

This extension defines an attribute that can be included within a CloudEvent to represent the position of the event relative to the other events from the same event source.

The exact value and meaning of this attribute is defined by the sequence type.
If the sequence type is missing, or not defined in this specification, event consumers will need to
have some out-of-band communication with the event producer to understand how to interpret the value of the attribute.

Attributes

sequence

  • Type: String
  • Description: Value expressing the relative order of the event. This enables
    interpretation of data supercedence.
  • Constraints
    • REQUIRED
    • MUST be a non-empty lexicographically-orderable string
    • RECOMMENDED as contiguous

sequenceType

  • Type: String
  • Description: Defines the semantics of the sequence attribute. The value "Integer" is defined in this specification. Other values are allowed.
  • Constraints
    • OPTIONAL

Integer

If the sequenceType is set to "Integer", the sequence attribute has the following semantics:

  • The values are string-encoded, signed 32-bit Integers.
  • The first CloudEvent in an Integer sequence has the value 1, the second 2 etc.
  • The sequence is contiguous. In other words, if the event consumer receives two CloudEvents with sequence values of 1 and 3, another CloudEvent with a sequence value of 2 exists, but has not been received yet.
  • The sequence wraps around from 2,147,483,647 (231 - 1) to −2,147,483,648 (-231). The sequence contains repetitions if there are more than 232 CloudEvents.

@duglin
Copy link
Collaborator Author

duglin commented Sep 10, 2018

@cneijenhuis thanks! I just added your suggested text. All - please see what you think

@duglin
Copy link
Collaborator Author

duglin commented Sep 13, 2018

oh darn, I just realized that we don't say what the default sequenceType value is if its not present. I'm assuming Integer. Will add that

@cneijenhuis
Copy link
Contributor

@duglin I left it out intentionally - if it is missing, the consumer should treat it as an unknown value.

See https://github.com/cloudevents/spec/pull/291/files#diff-7fb2227cb146a7ff3164326d6b46f5a1R10

But I'm also totally fine with making Integer the default, and forcing producers to declare some other value. We should change the upper paragraph then, though.

Signed-off-by: Doug Davis <dug@us.ibm.com>
@duglin
Copy link
Collaborator Author

duglin commented Sep 13, 2018

@cneijenhuis oh sorry, I missed that line. I reverted my change. While I think defaulting to Integer might be nice, just to provide some base level of interop, as long as we say what's expected when its missing (even if its "use out of band communication") that addresses my biggest concern. I'll leave it to you experts to decide if "out of band" is ok or if we should default to Integer.

* Constraints
* REQUIRED
* MUST be a non-empty lexicographically-orderable string
* RECOMMENDED as contiguous
Copy link
Member

@erikerikson erikerikson Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly:
MUST be produced in a monotonically increasing manner

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least monotonic

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added "monotonically". Wasn't sure about the "increasing" part since that might imply a numeric based sequence and I'm assuming there might be some that don't fit that model. Is that a correct assumption?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering because you asked. You've already added "increasing".

I feel a little uncertain and would defer to a mathematician but the lexicographically orderable requirement gives an absolute ordering across all unique values. I would expect "increasing" to make sense in that context and mean "each subsequent value would be ordered/sorted after all preceding values".

For the record, my initial question would disallow the reasonable circular counter described in the Integer sequence type and therefore could not be a MUST.

Copy link
Member

@erikerikson erikerikson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important extension, thank you for adding it.

# Sequence

This extension defines two attributes that can be included within a CloudEvent
to describe the position of the event relative to the other events from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps?
to describe the position of an event in the ordered sequence of events produced by a unique event source

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


### sequenceType
* Type: `String`
* Description: Defines the semantics of the sequence attribute.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Specifies over defines since the actual definition is external or listed in the section below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

the following semantics:
* The values of `sequence` are string-encoded signed 32-bit Integers
* The first CloudEvent in an `Integer` sequence MUST have the value
`1`, the second `2`, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the , the second `2`, etc. seems duplicative of the following line while specifying the first has value 1 seems properly independent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to tweak it, let me know if I went too far

### sequence
* Type: `String`
* Description: Value expressing the relative order of the event. This enables
interpretation of data supercedence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idempotency?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be better done by checking the EventID instead? If not, have any suggested text?
Do we need to worry about a single event being part of more than one sequence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be better done by checking the EventID instead?

That's a great question. As per the eventID definition there is no monotinicity requirement. Without a value constraint, one would have to keep track of every eventID that has been processed (or at the very least a significant scope of them) to ensure no duplicate processing as opposed to checking the relationship of the currently being processed eventID to the last successful checkpoint eventID. I'm not sure that the assumptions I like to make would be appropriate to push onto all users fo the spec so I haven't advocated for that there but would have them adopted in my more-perfect world.

Do we need to worry about a single event being part of more than one sequence?

I'm taking an event as the tangible message sent by a unique producer (as opposed to the occurrence in reality being read by some sensor). As I've read this change, it seemed that there was a one-to-one mapping between unique producers of events with this attribute and the series of numbers representing the singular sequence that they produce. That's a long way to say I don't expect a unique producer to produce multiple sequences OR for multiple unique producers to produce the "same" event (though distinct events could be causally interlinked and contain the exact same content [sans producer id] by the shared reality producing them - e.g. redundant emitters of metrics coming from a single sensor). I could definitely be reading incorrectly.

* Description: Value expressing the relative order of the event. This enables
interpretation of data supercedence.
* Constraints
* REQUIRED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can extensions be required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be something we need to add more text on... but the point here is that while the extension itself is optional, IF it is being used then this property is REQUIRED. See the documented-extension.md doc, where it says "Support for any extension is OPTIONAL. When an extension definition uses RFC 2199 keywords (e.g. MUST, SHOULD, MAY), this usage only applies to events that use the extension."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying. This makes perfect sense and clears my confusion.

@@ -20,11 +20,11 @@ event producer to understand how to interpret the value of the attribute.
* Constraints
* REQUIRED
* MUST be a non-empty lexicographically-orderable string
* RECOMMENDED as contiguous
* RECOMMENDED as montomically contiguous
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo, maybe monotonically increasing and contiguous reads better?

My math is quite rusty, but I'm not sure if a char-based sequence can legitimately be called monotonically increasing (e.g. a, b, ...z, aa, ab, ...). But, as it is only a recommendation, I think it's fine to get the point across.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

the following semantics:
* The values of `sequence` are string-encoded signed 32-bit Integers.
* The sequence MUST be contiguous, monotonically increasing by `1` and
start with a value of `1`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nit. It seems like it might read more naturally as

The sequence MUST start with a value of `1` and increase by `1` for each 
subsequent value (i.e. be contiguous and monotonically increasing).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me

Signed-off-by: Doug Davis <dug@us.ibm.com>
@duglin
Copy link
Collaborator Author

duglin commented Sep 20, 2018

Based on the discussions in #308, can those who are interested in possibly using this extension please speak up via a comment in this PR? @BenBeattieHood I assumed you would be - @Tapppi ? @cneijenhuis ?

@cneijenhuis
Copy link
Contributor

cneijenhuis commented Sep 23, 2018

Yes, commercetools is strongly interested in this. We already provide an Integer-based sequence for our proprietary events (called sequenceNumber). For CloudEvents, we'd like to cover this use-case in a non-proprietary way.

In fact, my hope is that an attribute for ordering events makes it into the main spec until 1.0. I think the extension proposed in this PR is a good first step towards that.

@BenBeattieHood
Copy link

You're not wrong @duglin :) I can't see using events without transmitting order - it brings too much logical coupling between services otherwise.

@duglin
Copy link
Collaborator Author

duglin commented Sep 26, 2018

Is this one ready for review? @BenBeattieHood @cneijenhuis you guys ok with the current state?

@BenBeattieHood
Copy link

Yes, thanks :)

@cneijenhuis
Copy link
Contributor

Yes 👍 Thanks for pushing this PR forward!

@duglin
Copy link
Collaborator Author

duglin commented Sep 27, 2018

Approved on 9/27 call

Voters who voiced support (per our new "extension bar"): Christoph(commercetools), Jesse(Oracle)

@duglin duglin merged commit 115493f into cloudevents:master Sep 27, 2018
@duglin duglin deleted the pr191 branch September 27, 2018 17:33
@Tapppi
Copy link
Contributor

Tapppi commented Sep 28, 2018

Thanks for pushing this forward, I'm a bit swamped with product launches at the moment and can barely make it to the calls :) Definitely worthy extension though.

clemensv pushed a commit to clemensv/spec that referenced this pull request Oct 31, 2018
* Add 'ordinal' attribute

Fixes: cloudevents#191

I didn't address any of the comments in cloudevents#191 because I wasn't sure how
the WG in general felt about them. So please speak up in this PR if you'd like
to see a change.

Signed-off-by: Doug Davis <dug@us.ibm.com>

* use sequence and add more clarifying text

Signed-off-by: Doug Davis <dug@us.ibm.com>

* grab Christoph's suggested text

Signed-off-by: Doug Davis <dug@us.ibm.com>

* more tweaks

Signed-off-by: Doug Davis <dug@us.ibm.com>
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
@duglin duglin mentioned this pull request Nov 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants