QoS depth settings for clients/service ignored #1785

Closed

mauropasse opened this issue Sep 24, 2021 · 18 comments · Fixed by ros2/rmw_cyclonedds#340 or ros2/rmw_fastrtps#564

Comments

@mauropasse
Collaborator

Seems that setting the QoS depth for clients and services is ignored:

  // Client/Service options
  rmw_qos_profile_t qos_profile = rmw_qos_profile_services_default;
  qos_profile.depth = 1;

  // Create client and service with options
  auto client = client_node->create_client<AddTwoInts>("add_two_ints", qos_profile);
  auto server = service_node->create_service<AddTwoInts>("add_two_ints", handle_service, qos_profile);

The following program shows the issue:

  1. Set QoS.depth = 1 for client/service
  2. Client sends several requests
  3. Service node spins: processes all requests (not just one)
  4. Client node spins: gets all the responses (not just one)
#include <cinttypes>
#include <functional>
#include <iostream>
#include <memory>
#include <thread>

#include "example_interfaces/srv/add_two_ints.hpp"
#include "rclcpp/rclcpp.hpp"

using AddTwoInts = example_interfaces::srv::AddTwoInts;
rclcpp::Node::SharedPtr service_node = nullptr;
rclcpp::Node::SharedPtr client_node = nullptr;

using Executor = rclcpp::executors::SingleThreadedExecutor;

void handle_service(
  const std::shared_ptr<rmw_request_id_t> request_header,
  const std::shared_ptr<AddTwoInts::Request> request,
  const std::shared_ptr<AddTwoInts::Response> response)
{
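  // Compute the sum for the incoming request and log which request was handled.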
  (void)request_header;
  response->sum = request->a + request->b;
  RCLCPP_INFO(
    service_node->get_logger(),
    "Handle service. Request: %" PRId64 " + %" PRId64 " = %" PRId64, request->a, request->b, response->sum);
}

void client_callback(
  std::shared_ptr<typename AddTwoInts::Request> request,
  typename rclcpp::Client<AddTwoInts>::SharedFuture result_future)
{
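    // Retrieve and log the result for the request this callback was bound to.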
    auto result = result_future.get();
    RCLCPP_INFO(
      client_node->get_logger(),
      "Client callback. Result of %" PRId64 " + %" PRId64 " = %" PRId64, request->a, request->b, result->sum);
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // Node options
  rclcpp::NodeOptions node_options = rclcpp::NodeOptions();
  node_options.use_intra_process_comms(false);
  node_options.start_parameter_services(false);
  node_options.start_parameter_event_publisher(false);

  // Create nodes with node options
  service_node = rclcpp::Node::make_shared("service", node_options);
  client_node = rclcpp::Node::make_shared("client", node_options);

  // Client/Service options
  rmw_qos_profile_t qos_profile = rmw_qos_profile_services_default;
  qos_profile.history = RMW_QOS_POLICY_HISTORY_KEEP_LAST;
  std::cout << "Setting OoS depth = 1 for client and service" << std::endl;
  qos_profile.depth = 1;

  // Create client and service with options
  auto client = client_node->create_client<AddTwoInts>("add_two_ints", qos_profile);
  auto server = service_node->create_service<AddTwoInts>("add_two_ints", handle_service, qos_profile);

  // Create separate executors for client and service nodes to better show the issue
  auto service_executor = std::make_shared<Executor>();
  service_executor->add_node(service_node);
  auto client_executor = std::make_shared<Executor>();
  client_executor->add_node(client_node);

  // Send requests (depth is set to 1, so we should only have a single client request)
  for (int i=0; i <= 15; i++) {
    auto request = std::make_shared<AddTwoInts::Request>();
    request->a = i;
    request->b = i;

    std::function<void(
    typename rclcpp::Client<AddTwoInts>::SharedFuture future)> callback_function = std::bind(
        &client_callback,
        request,
        std::placeholders::_1
    );

    std::cout << "Client: async_send_request number: " << i << std::endl;
    client->async_send_request(request, callback_function);
  }

  char a;
  std::cout << "Press key to spin the Server" << std::endl;
  std::cin >> a;

  /******** SERVICE EXECUTOR THREAD *********************************************/
  std::thread service_spin_thread([=](){
      std::cout << "service_executor->spin()" << std::endl;
      service_executor->spin();
  });
  service_spin_thread.detach();
  /********************************************************************************************/

  std::cout << "Press key to spin the Client" << std::endl;
  std::cin >> a;

  /******** CLIENT EXECUTOR THREAD ***********************************************/
  std::thread spin_thread([=](){
      std::cout << "client_executor->spin()" << std::endl;
      client_executor->spin();
  });
  spin_thread.detach();
  /********************************************************************************************/

  std::cin >> a;
  std::cout << "rclcpp::shutdown()" << std::endl;

  rclcpp::shutdown();
  service_node = nullptr;
  client_node = nullptr;
  return 0;
}

The output:

Setting QoS depth = 1 for client and service
Client: async_send_request number: 0
Client: async_send_request number: 1
Client: async_send_request number: 2
Client: async_send_request number: 3
Client: async_send_request number: 4
Client: async_send_request number: 5
Client: async_send_request number: 6
Client: async_send_request number: 7
Client: async_send_request number: 8
Client: async_send_request number: 9
Client: async_send_request number: 10
Client: async_send_request number: 11
Client: async_send_request number: 12
Client: async_send_request number: 13
Client: async_send_request number: 14
Client: async_send_request number: 15
Press key to spin the Server

service_executor->spin()
[INFO] [1632502510.911567814] [service]: Handle service. Request: 0 + 0 = 0
[INFO] [1632502510.912534665] [service]: Handle service. Request: 1 + 1 = 2
[INFO] [1632502510.912857264] [service]: Handle service. Request: 2 + 2 = 4
[INFO] [1632502510.913159732] [service]: Handle service. Request: 3 + 3 = 6
[INFO] [1632502510.913457916] [service]: Handle service. Request: 4 + 4 = 8
[INFO] [1632502510.913763411] [service]: Handle service. Request: 5 + 5 = 10
[INFO] [1632502510.914055334] [service]: Handle service. Request: 6 + 6 = 12
[INFO] [1632502510.914350376] [service]: Handle service. Request: 7 + 7 = 14
[INFO] [1632502510.914641348] [service]: Handle service. Request: 8 + 8 = 16
[INFO] [1632502510.914951640] [service]: Handle service. Request: 9 + 9 = 18
[INFO] [1632502510.915243627] [service]: Handle service. Request: 10 + 10 = 20
[INFO] [1632502510.915533047] [service]: Handle service. Request: 11 + 11 = 22
[INFO] [1632502510.915821592] [service]: Handle service. Request: 12 + 12 = 24
[INFO] [1632502510.916133874] [service]: Handle service. Request: 13 + 13 = 26
[INFO] [1632502510.916431093] [service]: Handle service. Request: 14 + 14 = 28
[INFO] [1632502510.916724053] [service]: Handle service. Request: 15 + 15 = 30

Press key to spin the Client
client_executor->spin()
[INFO] [1632502511.534574504] [client]: Client callback. Result of 0 + 0 = 0
[INFO] [1632502511.534917210] [client]: Client callback. Result of 1 + 1 = 2
[INFO] [1632502511.535152711] [client]: Client callback. Result of 2 + 2 = 4
[INFO] [1632502511.535379114] [client]: Client callback. Result of 3 + 3 = 6
[INFO] [1632502511.535604549] [client]: Client callback. Result of 4 + 4 = 8
[INFO] [1632502511.535839307] [client]: Client callback. Result of 5 + 5 = 10
[INFO] [1632502511.536060394] [client]: Client callback. Result of 6 + 6 = 12
[INFO] [1632502511.536311128] [client]: Client callback. Result of 7 + 7 = 14
[INFO] [1632502511.536534991] [client]: Client callback. Result of 8 + 8 = 16
[INFO] [1632502511.536754865] [client]: Client callback. Result of 9 + 9 = 18
[INFO] [1632502511.536974771] [client]: Client callback. Result of 10 + 10 = 20
[INFO] [1632502511.537194631] [client]: Client callback. Result of 11 + 11 = 22
[INFO] [1632502511.537414983] [client]: Client callback. Result of 12 + 12 = 24
[INFO] [1632502511.537635309] [client]: Client callback. Result of 13 + 13 = 26
[INFO] [1632502511.537853318] [client]: Client callback. Result of 14 + 14 = 28
[INFO] [1632502511.538072821] [client]: Client callback. Result of 15 + 15 = 30
@Barry-Xu-2018
Collaborator

Yes, I can reproduce this problem.
After checking the code, I found
https://github.com/ros2/rmw_cyclonedds/blob/2aa2cf527d079265195b2dca91bb9c82481be0c1/rmw_cyclonedds_cpp/src/rmw_node.cpp#L4312

  dds_qset_reliability(qos, DDS_RELIABILITY_RELIABLE, DDS_SECS(1));
  dds_qset_history(qos, DDS_HISTORY_KEEP_ALL, DDS_LENGTH_UNLIMITED);

So your settings don't take effect.
I don't know why the code is like this.
@eboasson Do you think this is a bug?

@fujitatomoya
Collaborator

I can reproduce this issue with rmw_fastrtps. @MiguelCompany friendly ping.

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Sep 28, 2021
  ros2/rclcpp#1785

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
@MiguelCompany
Contributor

@fujitatomoya As far as I can see, the QoS settings are correctly passed to the created DDS writers and readers.

But then, whenever the reader has a request ready, the listener takes the request and keeps it internally in the callback here.

That method is the one not respecting the depth setting. It should ensure that the size of the list member never exceeds the configured depth.

@fujitatomoya
Collaborator

@MiguelCompany thanks for the quick response ❗ What would we want to do next? Probably create a sub-issue on https://github.com/ros2/rmw_fastrtps?

@eboasson
Contributor

Yes, I can reproduce this problem. After checking the code, I found https://github.com/ros2/rmw_cyclonedds/blob/2aa2cf527d079265195b2dca91bb9c82481be0c1/rmw_cyclonedds_cpp/src/rmw_node.cpp#L4312

  dds_qset_reliability(qos, DDS_RELIABILITY_RELIABLE, DDS_SECS(1));
  dds_qset_history(qos, DDS_HISTORY_KEEP_ALL, DDS_LENGTH_UNLIMITED);

So your settings don't take effect. I don't know why the code is like this. @eboasson Do you think this is a bug?

That's been there since the initial commit. A.k.a. when I quickly cobbled together a half-working RMW layer without knowing much about people's expectations regarding the behaviour 🙂

From what I can tell looking at the rmw.h file, the expectation is that the QoS settings are used without any restrictions. So yes, it is a bug. Deleting those two lines should fix it.

That doesn't necessarily mean I think it wise to let the application (especially the service) control the queue depth in this manner. In general, a service is expected to receive multiple requests at the same time; letting it silently drop requests makes it quite unreliable. There have been many issues about service reliability ...

On the client side, it makes somewhat more sense, in that an application really does control how many requests it sends out in rapid succession and how it correlates the responses with the requests. Even so, at first glance, the behaviour you get with a shallow history depth seems to fit better with ROS 2's actions than with its service/client mechanism.
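
To make that concrete, here is a rough, purely illustrative sketch of honoring the user's history settings for clients and services instead of forcing KEEP_ALL. The qos_policies name for the incoming rmw_qos_profile_t is an assumption of this sketch; the actual fix may simply drop the hardcoded override so the user-derived settings pass through unchanged:

  dds_qset_reliability(qos, DDS_RELIABILITY_RELIABLE, DDS_SECS(1));
  /* Respect the history requested by the user instead of forcing KEEP_ALL;
     'qos_policies' stands for the rmw_qos_profile_t handed to the RMW layer. */
  if (qos_policies->history == RMW_QOS_POLICY_HISTORY_KEEP_LAST) {
    dds_qset_history(qos, DDS_HISTORY_KEEP_LAST, (int32_t) qos_policies->depth);
  } else {
    dds_qset_history(qos, DDS_HISTORY_KEEP_ALL, DDS_LENGTH_UNLIMITED);
  }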

@MiguelCompany
Contributor

create sub-issue on https://github.com/ros2/rmw_fastrtps?

Yeah. Please do.

@Barry-Xu-2018
Collaborator

@eboasson

That doesn't necessarily mean I think it wise to let the application (especially the service) control the queue depth in this manner. In general, a service is expected to receive multiple requests at the same time; letting it silently drop requests makes it quite unreliable. There have been many issues about service reliability ...

On the client side, it makes somewhat more sense, in that an application really does control how many requests it sends out in rapid succession and how it correlates the responses with the requests. Even so, at first glance, the behaviour you get with a shallow history depth seems to fit better with ROS 2's actions than with its service/client mechanism.

Completely agree.
Most users will still use the default QoS configuration, and the default QoS values can satisfy most cases.
So if users change the QoS values themselves, they are responsible for any problems that occur.
Therefore, we should accept the user's QoS settings even if they are unreasonable.

@Barry-Xu-2018
Collaborator

@MiguelCompany @fujitatomoya

I created an issue under https://github.com/ros2/rmw_fastrtps:
ros2/rmw_fastrtps#562

@iuhilnehc-ynos
Collaborator

@eboasson

Thanks for your detailed explanation.

Deleting those two lines should fix it.

https://github.com/ros2/rmw_cyclonedds/blob/2aa2cf527d079265195b2dca91bb9c82481be0c1/rmw_cyclonedds_cpp/src/rmw_node.cpp#L4309-L4312

I think you may have realized that the dds_reset_qos(qos); line should also be deleted.

@MiguelCompany

the listener takes the request and keeps it internally on the callback here

Thanks for locating the source code.

is not respecting the depth setting

I think that the history kind (KEEP_LAST_HISTORY_QOS) should be considered together with the depth.
Besides the service callback, we also need to update the client callback with something like the following, placed after std::lock_guard<std::mutex> lock(internalMutex_);:

          const eprosima::fastrtps::HistoryQosPolicy & history = reader->get_qos().history();
          if (eprosima::fastrtps::KEEP_LAST_HISTORY_QOS == history.kind &&
            list.size() >= static_cast<size_t>(history.depth))
          {
            list.pop_front();
          }

I'd like to hear if you have any further suggestions.

@eboasson
Contributor

I think you may have realized that the dds_reset_qos(qos); line should also be deleted.

You are absolutely correct that that also needs to be removed, but I am afraid you may be giving me too much credit, at least this time: I had only such a fleeting look at it — essentially during a context switch from one of many things I had to attend to, to another of those many things — that I overlooked it ... 😳

Well, good thing I didn't attempt to also do a pull request 😀

@MiguelCompany
Contributor

something like the following after std::lock_guard<std::mutex> lock(internalMutex_);,

          const eprosima::fastrtps::HistoryQosPolicy & history = reader->get_qos().history();
          if (eprosima::fastrtps::KEEP_LAST_HISTORY_QOS == history.kind &&
            list.size() >= static_cast<size_t>(history.depth))
          {
            list.pop_front();
          }

You are correct @iuhilnehc-ynos. Having that code on both service and client callbacks would be enough, I think.

I'd like to hear if you have any further suggestions.

It may be a good idea to have different default profiles for services and clients. There is currently only this one, which may be okay for clients, but presumably not suitable for services (which should maybe have KEEP_ALL history). What do you think @eboasson?
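
If the defaults were split that way, a purely illustrative sketch (the client_default / server_default names are hypothetical, not existing rmw symbols) could keep the current keep-last profile for clients and switch the server side to KEEP_ALL so that queued requests are not silently dropped:

  // Hypothetical split of rmw_qos_profile_services_default (names are illustrative only)
  rmw_qos_profile_t client_default = rmw_qos_profile_services_default;  // keep-last history, as today

  rmw_qos_profile_t server_default = rmw_qos_profile_services_default;
  server_default.history = RMW_QOS_POLICY_HISTORY_KEEP_ALL;  // never drop queued requests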

@fujitatomoya
Collaborator

@Barry-Xu-2018 @iuhilnehc-ynos shall we just go ahead and create PRs for rmw_cyclonedds and rmw_fastrtps? (@eboasson @MiguelCompany please help review once they are ready, when you have time 😄) By the way, we probably also need to confirm this issue with rmw_connextdds, which I have not tried yet. (CC: @asorbini)

@asorbini
Contributor

@fujitatomoya thank you for the ping. AFAICT this shouldn't be a problem with rmw_connextdds, since the RMW layer relies on the underlying DataReader's cache and doesn't do any caching of its own.

In fact, I tried the reproducer provided by @mauropasse and things worked as expected (i.e. only the last request sent by the client gets processed):

$ RMW_IMPLEMENTATION=rmw_connextdds install/cs_test/bin/cs_test
Setting QoS depth = 1 for client and service
Client: async_send_request number: 0
Client: async_send_request number: 1
Client: async_send_request number: 2
Client: async_send_request number: 3
Client: async_send_request number: 4
Client: async_send_request number: 5
Client: async_send_request number: 6
Client: async_send_request number: 7
Client: async_send_request number: 8
Client: async_send_request number: 9
Client: async_send_request number: 10
Client: async_send_request number: 11
Client: async_send_request number: 12
Client: async_send_request number: 13
Client: async_send_request number: 14
Client: async_send_request number: 15
Press key to spin the Server
a
Press key to spin the Client
service_executor->spin()
[INFO] [1633027743.048037298] [service]: Handle service. Request: 15 + 15 = 30
a
client_executor->spin()
[INFO] [1633027744.552063149] [client]: Client callback. Result of 15 + 15 = 30
a
rclcpp::shutdown()

@fujitatomoya
Collaborator

@asorbini that is great! thanks for checking!

@Barry-Xu-2018
Collaborator

@iuhilnehc-ynos Thank you

@fujitatomoya
Collaborator

We still have ros2/rmw_cyclonedds#340 open.

By the way, I am not positive about backporting this (it can be backported, though), because it changes the default behavior. I don't think anyone would notice this default behavior, but we want to respect the stable behavior for released distributions.

@fujitatomoya
Collaborator

I will go ahead and close this; please re-open if anything comes up.

@mauropasse
Collaborator Author

Awesome, all working as expected now. Thank you guys!
