Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sequence numbers to message info structure #318

Merged
merged 23 commits into from
Mar 21, 2022

Conversation

ivanpauno
Copy link
Member

@ivanpauno ivanpauno commented Mar 3, 2022

  • Add publication/reception sequence numbers to rmw_message_info_t.
  • Add rmw_feature_supported() function, to query if the rmw implementation being used supports a feature.

Related PRs:


This information is not provided by all DDS vendors.
Both sequence numbers are an extension, and not specified by the DDS standard AFAIU.

Connext provides both numbers, see publication_sequence_number and reception_sequence_number here.

FastDDS provides the publication_sequence_number, but not the reception_sequence_number.
See here and here (docs aren't super clear, but based on some testing that's the publication sequence number).

CycloneDDS doesn't seem to provide any of both (see here).

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno ivanpauno added the enhancement New feature or request label Mar 3, 2022
@ivanpauno ivanpauno requested a review from wjwwood March 3, 2022 16:57
@ivanpauno ivanpauno self-assigned this Mar 3, 2022
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@christophebedard
Copy link
Member

  • Do we need to add both? Is only having the publication sequence number enough?

What's the intended use-case?

I've used source_timestamp as a kind of unique message ID. Obviously, I could get collisions if two publishers publish at the same exact time on the same exact topic (from different machines, or even the same process/node). Getting source_timestamps does require collecting it from the DDS implementation on the publisher side, since it's not provided to rmw, so publication_sequence_numbers at the rmw level could replace this.

Could we have collisions with (publication) sequence numbers? How are they generated, and what are the guarantees? Since it's not a standard DDS feature, and since rmw is not DDS-only(:tm:), rmw probably needs to have its own definition or requirements for these sequence numbers. Also, do reception sequence numbers provide anything that publication sequences numbers do not provide? Maybe I'm not thinking of the right use-case.

@ivanpauno
Copy link
Member Author

Could we have collisions with (publication) sequence numbers?

Yes, each publisher will use unique sequence numbers, but sample of different publishes may use the same.
You can disambiguate with the publisher gid though (which is available in the sample info struct).
After a long long time, the sequence number may wrap around IIUC.

Getting source_timestamps does require collecting it from the DDS implementation on the publisher side, since it's not provided to rmw

I think that's already part of the message info struct, though IIRC it's not being filled by all the DDS vendors.

do reception sequence numbers provide anything that publication sequences numbers do not provide?

I think they don't provide much extra info, but I'm not sure.

What's the intended use-case?

IIUC, the idea is to use this to collect statistics (which messages were lost, etc).
@wjwwood can probably expand more.

There's also the sample lost status and sample rejected status in DDS, that can be used for this. We already have bindings for the first one (this).
The sequence numbers might be more insightful to make statistics than those statuses, but I think that having only the publication sequence number should be enough.

How are they generated, and what are the guarantees?

This is a good question, because I actually don't know if the "publication_sequence_number" provided by FastDDS has the same guarantees that the one provided by Connext.
Based on some basic testing, they seem to be the same thing.

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm in general.

Do we need a method to check if this feature is supported separate from the max value? I'm just thinking that it might be useful for a tool to know if this is supported before receiving a message (or even creating a publisher/subscription).

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
@christophebedard
Copy link
Member

Yes, each publisher will use unique sequence numbers, but sample of different publishes may use the same. You can disambiguate with the publisher gid though (which is available in the sample info struct)

Oh right, that's a great point.

I think that's already part of the message info struct, though IIRC it's not being filled by all the DDS vendors.

Unless I'm mistaken, I don't think the source timestamp is made available to ROS 2 (rmw and up) on the publisher side (there is no "message info" when publishing a message). The source timestamp is generated inside the DDS implementation right before writing/sending a message. However, it is indeed available on the receiver side.

IIUC, the idea is to use this to collect statistics (which messages were lost, etc). @wjwwood can probably expand more.

There's also the sample lost status and sample rejected status in DDS, that can be used for this. We already have bindings for the first one (this). The sequence numbers might be more insightful to make statistics than those statuses, but I think that having only the publication sequence number should be enough.

This makes sense! It's similar to what I'm doing.

This is a good question, because I actually don't know if the "publication_sequence_number" provided by FastDDS has the same guarantees that the one provided by Connext. Based on some basic testing, they seem to be the same thing.

I'm wondering what downstream users of rmw can expect, i.e., if this is part of the interface/abstraction, or if the behaviour is always going to come down to the middleware implementation.

@ivanpauno
Copy link
Member Author

Unless I'm mistaken, I don't think the source timestamp is made available to ROS 2 (rmw and up) on the publisher side

Ah yes, it's only available in the subscription side at the moment (and supported only by some rmw implementations).

@ivanpauno
Copy link
Member Author

I'm wondering what downstream users of rmw can expect, i.e., if this is part of the interface/abstraction, or if the behaviour is always going to come down to the middleware implementation.

IMO, the requisites should be:

  • Publication sequence numbers should monotonically increase for the same publisher.
    i.e. if a take() operation 1 happened before take() operation 2 and the publisher gid is the same, then sn1 < sn2.
    exception: it can wrap around at some point
  • (sn2 - sn1 - 1) = NMessagesLost, where sn1 and sn2 are the publication sequence numbers of two consecutive take() operations that received messages from the same publisher (i.e. also same publisher_gid).

I think it's reasonable to always expect that from any rmw implementation that support sequence numbers, so we can maybe add that to the docs.
@wjwwood what do you think?

Similarly for the reception sequence number, it should be a monotonically increasing number, contiguous take() operations differ always by 1.

@ivanpauno
Copy link
Member Author

Do we need a method to check if this feature is supported separate from the max value? I'm just thinking that it might be useful for a tool to know if this is supported before receiving a message (or even creating a publisher/subscription).

That sounds okay to me.
What's the best API for this?
I think that maybe

bool rmw_message_info_publication_sequence_number_supported(void);
bool rmw_message_info_subscription_sequence_number_supported(void);

can easily become too verbose if we start adding similar functions for other features (e.g. is source_timestamp supported?),

Maybe:

bool rmw_feature_supported(rmw_feature_t feature_v);

typedef enum rmw_feature_e {
  RMW_FEATURE_MESSAGE_INFO_PUBLICATION_SEQUENCE_NUMBER,
  ...
} rmw_feature_t;

is a better idea.

ivanpauno and others added 6 commits March 4, 2022 13:57
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: William Woodall <william+github@osrfoundation.org>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: William Woodall <william+github@osrfoundation.org>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: William Woodall <william+github@osrfoundation.org>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: William Woodall <william+github@osrfoundation.org>
…mpilation issue

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
…quence number

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno
Copy link
Member Author

I'm wondering what downstream users of rmw can expect, i.e., if this is part of the interface/abstraction, or if the behaviour is always going to come down to the middleware implementation.

I have listed some requirements of what clients can expect in 7810f90.
Maybe I have specified more than needed or we want to add extra requirements.

Please feel free to suggest improvements to my wording as well!

@ivanpauno ivanpauno requested a review from wjwwood March 4, 2022 17:45
Copy link
Member

@christophebedard christophebedard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements make sense to me.

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
ivanpauno and others added 4 commits March 4, 2022 16:41
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno ivanpauno force-pushed the ivanpauno/add-sequence-numbers-to-message-info branch from c766c00 to 595bc6b Compare March 4, 2022 20:01
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@wjwwood
Copy link
Member

wjwwood commented Mar 8, 2022

The rmw_feature_supported() route sounds good to me.

@ivanpauno ivanpauno requested a review from wjwwood March 10, 2022 19:21
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno
Copy link
Member Author

This PR and ros2/rmw_fastrtps#587 are ready, I still need to update ros2/rmw_connextdds#74 and create a PR in rmw_cyclonedds.

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/src/types.c Show resolved Hide resolved
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that one open discussion, it lgtm.

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno ivanpauno requested a review from wjwwood March 16, 2022 20:34
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno ivanpauno requested a review from wjwwood March 17, 2022 14:32
@ivanpauno
Copy link
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno
Copy link
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a doc update, lgtm with CI.

I offered a revision of the docs again, and it brings to my attention that we need to nail down what we mean by gid and "global(ly)", but that shouldn't block this I think.

We could use:

  • GID (Global Identifier): what we use now, where global means per process, but could mean in the ROS graph
  • GUID (Globally Unique Identifier): means the same as GID except each is unique and should never be reused withing the scope of "global"
  • UUID (universally unique identifier): unique in the literal universe, within statistical bounds, which is what the general purpose uuid's are used for CS and usually require as "good" as is possible random numbers

Anyway, we need to figure out what we need/want and be clearer about it, again not blocking this pr.

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
@wjwwood
Copy link
Member

wjwwood commented Mar 17, 2022

@ivanpauno you might want to update the content of the issue description (now that we've iterated a bit) and link to all the related prs, rather than relying on cross-references from github.

@ivanpauno
Copy link
Member Author

GID (Global Identifier): what we use now, where global means per process, but could mean in the ROS graph
GUID (Globally Unique Identifier): means the same as GID except each is unique and should never be reused withing the scope of "global"
UUID (universally unique identifier): unique in the literal universe, within statistical bounds, which is what the general purpose uuid's are used for CS and usually require as "good" as is possible random numbers

I think that the idea of the gid was always for it to be global and unique.
And with global, I mean all the processes involved would see the same ids.
AFAIU, UUID/GUID are the same thing (though the term UUID is typically used to refer to a particular standard).

ivanpauno and others added 2 commits March 17, 2022 18:04
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: William Woodall <william+github@osrfoundation.org>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@ivanpauno
Copy link
Member Author

  • Windows Build Status

@ivanpauno
Copy link
Member Author

@ivanpauno you might want to update the content of the issue description (now that we've iterated a bit) and link to all the related prs, rather than relying on cross-references from github.

Done

@ivanpauno
Copy link
Member Author

All PRs were approved and CI is passing, going in!

@ivanpauno ivanpauno merged commit 3299fca into master Mar 21, 2022
@delete-merged-branch delete-merged-branch bot deleted the ivanpauno/add-sequence-numbers-to-message-info branch March 21, 2022 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants