Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API to use message_info instead unserialized message #170

Merged
merged 14 commits into from
Oct 11, 2023

Conversation

CursedRock17
Copy link
Contributor

@CursedRock17 CursedRock17 commented Sep 5, 2023

This is a response to issue #168, which aims to transition libstatistics_collector to use rmw_message_info_t as the basic type system compared to custom type T. There was obviously more code added, but this should be gone after the next major release, as the guidelines say we can remove deprecated code after a full release cycle. All of the generic programming and templates can be removed methods like HasHeader won't be useful anymore so this will in the long run make the code compile much faster, since we know the defined type.

There were a few places where I made some questionable decisions, first I set our ReceivedMessageAgeCollector and similar classes to have a default type, just of type int. If the user sets the type to rmw_message_info_t there's a chance they may trigger the original, now deprecated method. I thought it was a bit over the top to create a duplicate class that wasn't typed, especially since this shouldn't be permanent, but I just wanted to bring up concerns.

Also, the new OnMessageReceived Methods remove the now_nanoseconds argument, instead taking advantage of rmw_message_info_t's received_timestamp. Going down the chain it ends up being the same type as all forms of nanosecond, so there are less conversion issues. Though, my main concern is how inaccurate it is in comparison to the topic statistics, so it's handled in a DDS or RTPS like here, occuring exactly when the message is received. Didn't think this was going to create an accuracy, or stability issues, but I'm just making sure.

Signed-off-by: CursedRock17 <mtglucas1@gmail.com>

Less user interaction

Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
@CursedRock17 CursedRock17 requested a review from a team as a code owner September 5, 2023 01:25
@CursedRock17 CursedRock17 requested review from emersonknapp and MichaelOrlov and removed request for a team September 5, 2023 01:25
Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
Copy link
Member

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CursedRock17 Thanks for your contribution.
I have a concern about questionable decisions made during the implementation.
I actually expected it to be quite the opposite.

  1. I think it would be easier to operate and further remove deprecated code if we would create duplicate classes for the TopicStatisticsCollector, ReceivedMessagePeriodCollector and ReceivedMessageAgeCollector without templates.
  2. As regards the decision about the now_nanoseconds parameter.
    the new OnMessageReceived Methods remove the now_nanoseconds argument, instead taking advantage of rmw_message_info_t's [received_timestamp](https://github.com/ros2/rmw/blob/rolling/rmw/include/rmw/types.h#L654).
    At first glance, it looks like a reasonable simplification since we have received a timestamp in the message_info data structure.
    But on another hand, If I am not mistaken, there is a cavitate and substantial difference in those timestamps. The recieved_timestamp in message_info settles up on the DDS level when a message getting from the network buffers to the internal DDS queue. While old style now_nanoseconds we are populating right after we are getting the message from the subscription on the application layer.
    For statistics measurement, this is a big deal, it could be a situation where the application layer is in starvation sometimes and tends to have a bigger variance in message periods and ages while DDS layer is not in starvation and receiving messages in it internal queues with a pretty stable rate.
    Using libstatistics_collector we want to identify the cases when any layers on the receiver side contribute to the messages' period and age.
    If we are taking received_timestamp from message_info we almost eliminating the application layer which is responsible for timely taking messages from subscriptions.
  3. Since we are going to remove deprecated calls from our rclcpp layer it makes sense to rewrite tests in the libstatistics_collector package to avoid usage of the deprecated methods and classes. Currently, the build generates a bunch of warnings due to this and we will need to address this anyway as follow-up PR when we will remove deprecated calls on rclcpp layer.

@CursedRock17
Copy link
Contributor Author

Are we able to create duplicate classes with the same name and without templates? I assume we want to keep the same names and don't want to change them when we fully deprecate, but then we would have redefinition of classes. I assume we could do some std::enable_if_t things to activate each of the certain classes but that would require some form of template specialization if I'm correct.

@MichaelOrlov
Copy link
Member

MichaelOrlov commented Sep 12, 2023

@CursedRock17 That's what I was thinking about when I was writing my feedback. This is a good point. Need to doublecheck in regards to the duplicate names for classes without templates.
It might be possible to make a workaround with using like

using ReceivedMessagePeriodCollector =
  libstatistics_collector::topic_statistics_collector::ReceivedMessagePeriodCollector<>;

@CursedRock17
Copy link
Contributor Author

So yes after doing a bit of research we could do something similar to your example, but afaik we can't do something without a template which means we can't make a seamless transition. Here's a quick example that we could use where int would just be rmw_message_info_t

   template<class T = int, class Enable = void>
     class Collector {}; // primary template
    
     template<class T>
    class Collector<T, typename std::enable_if_t<!std::is_same<T, int>::value>>
   {
    public:
      Collector(){}
      ~Collector(){}
     void Message()
      {
        std::cout << "One Type" << std::endl;
      }
    };
   
    template<>
    class Collector<int, typename std::enable_if_t<std::is_same<int, int>::value>>
    {
    public:
      Collector(){}
      ~Collector(){}
      void Message()
      {
        std::cout << "No Type" << std::endl;
      }
    };
   
    int main(){
      using Collector = Collector<>;
      Collector a;
      a.Message();
      return 0;
   }

So here the user could just create an empty template <> and a using alias, just like we will for message_age and message_period, then there won't be any conflict with deprecation. Unfortunately like I mentioned, when the full removal goes through we won't want any templates whatsoever, thus the user's code will still break because having an empty template will be illegal. Though, this makes moving from a custom type T to rmw_message_info_t easier.

The only way we could ensure a transition during the ultimate removal of the custom type T would be to use different naming conventions, but this seems awkward as everything in this library is based on the names we have.
For example moving from ReceivedMessageAgeCollector to ReceivedMessageInfoAgeCollector, so this probably even less preferable. Finally, I should just bring it up since we're in rolling, we could skip the deprecation step entirely and just move everything now, but then we force all users who use the rolling branch right now, to change immediately.

@clalancette
Copy link
Contributor

Finally, I should just bring it up since we're in rolling, we could skip the deprecation step entirely and just move everything now, but then we force all users who use the rolling branch right now, to change immediately.

No, this isn't really an option. We don't do hard breaks of the API, even in Rolling, unless there is no other choice.

@CursedRock17
Copy link
Contributor Author

We don't do hard breaks of the API, even in Rolling

That's what I read in the contributing guide and figured. Should I move on then, with the temporary extra templating? I figured that would be the change that makes the most sense for the move, even if it induces a bit slower compile times for a release cycle of rolling.

@MichaelOrlov
Copy link
Member

@CursedRock17 Yes, the proposed approach looks good to me. Feel free to go ahead with the implementation.
As regards

Unfortunately like I mentioned, when the full removal goes through we won't want any templates whatsoever, thus the user's code will still break because having an empty template will be illegal. Though, this makes moving from a custom type T to rmw_message_info_t easier.

I am curious if we could deprecate using templated versions of classes explicitly and allow only those which are provided with aliases.

@CursedRock17
Copy link
Contributor Author

I am curious if we could deprecate using templated versions of classes explicitly and allow only those which are provided with aliases.

I'm working on that right now, as per cppreference you can do deprecation on template specialization:

template<> struct [[deprecated]] X<int> {}

but it seems like it won't work when using partial template specialization and typename T. So using the example I showed earlier, you can deprecate the first instance of class Collector, you can't deprecate second instance of the class, but you can deprecate the third instance of class Collector with template<>. I guess I can deprecate the OnMessageReceived method of our original templated class, but I don't think I can deprecate the whole class.

@MichaelOrlov
Copy link
Member

@CursedRock17 BTW As regards to the API. I think we need to turn back to the original API with now_nanoseconds parameter.
See point (2) in #170 (review)
Just to make sure that we are aligned and on the same page.

@CursedRock17
Copy link
Contributor Author

Yup we're aligned, I've got those nano_second changes in there already, was just commenting to find out our direction with the style of deprecation we were going with. Just to make sure with point (3) in #170 (review), you want me to remove all of the tests for code that is in the current rolling release, and update them with valid tests for the new untyped (Collector<>) style correct?

@MichaelOrlov
Copy link
Member

Yes. But I would say change them to use aliases for non-templated "Collector" types instead of removing them.

Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
@CursedRock17
Copy link
Contributor Author

Was the way I changed the depreactions, the style we were looking for deprecation-wise?

@MichaelOrlov MichaelOrlov changed the title Message Info Added Add API to use message_info instead unserialized message Sep 21, 2023
Copy link

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CursedRock17 so far i have several minor comments.

CursedRock17 and others added 4 commits September 23, 2023 17:24
Signed-off-by: CursedRock17 <mtglucas1@gmail.com>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Copy link
Member

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CursedRock17 Sorry for my late review and thanks for trying to address partial template deprecation and using aliases for the default templates.

I have tried to prototype and go a bit further with deprecations. I was able to make deprecations on the class level for partially templated class specifications.
You can find my version at CursedRock17#1
Feel free to cherry-pick or merge my MR on your branch.

However, to be able to nicely deprecate partially templated specifications I have to add aliases outside of the topic_statistics_collector namespace e.g. libstatistics_collector::ReceivedMessagePeriodCollector

The rationale for that is that it is pretty obvious from the names that libstatistics_collector::ReceivedMessagePeriodCollector relates to the topic statistics. And it is very unlikely that we would ever need to have another set of classes with the same names which wouldn't belong to the topic_statistics. The topic_statistics namespace will play around like an implementation details namespace and we will use everywhere only aliases.
As soon as deprecated versions with templates are phased out we can safely replace aliases to the non-templated class versions.

IMO this is an elegant solution and a better compromise than deprecation on callbacks only level.

@CursedRock17
Copy link
Contributor Author

I see that aliasing helped us even further with adding the deprecations, this will make it much smoother when we remove the deprecations in, I would guess K-turtle. Other than one small thing, I will merge the PR on my branch, we should be good after that.

MichaelOrlov and others added 3 commits October 10, 2023 13:31
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
…l_in_topic_statistics_collector

Deprecate partially templated specifications on class level in topic statistics collector
Copy link
Member

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Now it looks good to me with green CI.

@clalancette Could you please also additionally approve since you requested changes before?

@MichaelOrlov
Copy link
Member

Gist: https://gist.githubusercontent.com/MichaelOrlov/51983f62e4032af6a111c01dd7c91b3a/raw/d051a902b014e2060f9d475ce96023160b6cd0b6/ros2.repos
BUILD args: --packages-above-and-dependencies libstatistics_collector
TEST args: --packages-above libstatistics_collector
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/12769

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov
Copy link
Member

@clalancette @emersonknapp Do you have any ideas why build_and_test_ubuntu job is hanging out?

Signed-off-by: Chris Lalancette <clalancette@gmail.com>
@clalancette
Copy link
Contributor

@clalancette @emersonknapp Do you have any ideas why build_and_test_ubuntu job is hanging out?

It's hard to tell, since there are no errors whatsoever. But my best guess is that there is some bug in the GitHub action, since besides the fact that "build_and_test_ubuntu" is hanging, the "build_and_test_asan" job is not even being triggered.

Someone needs to go in and poke at the GitHub actions to find out what is going on here.

@clalancette
Copy link
Contributor

All of that said, I think the CI run is good enough to move forward with this PR if you are happy with it.

@MichaelOrlov
Copy link
Member

@clalancette The problem is that I don't have the option to merge this PR until the build_and_test_ubuntu passes.

@christophebedard
Copy link
Member

christophebedard commented Oct 11, 2023

This usually happens when we make a CI job required in the branch protection rules, but remove the job from the CI config. However, that job is still in the CI config, so I don't know why. I can force-merge, though.

@christophebedard christophebedard merged commit c1c10d8 into ros-tooling:rolling Oct 11, 2023
9 checks passed
@clalancette
Copy link
Contributor

This usually happens when we make a CI job required in the branch protection rules, but remove the job from the CI config. However, that job is still in the CI config, so I don't know why. I can force-merge, though.

Also, I hard-forked this repository into my own one just for testing: https://github.com/clalancette/libstatistics_collector . There, open PRs correctly run all tests, including the build_and_test_ubuntu and build_and_test_asan. So I have no idea what is going on here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change type of received_message
5 participants