Do not crash Executor when send_response fails due to client failure. #2215
@kghost thanks for creating PR on this issue.
@kghost thank you for being patient on this. I will be checking the following and then start CI all together.
see ros2/rclpy#1136
This needs the one change inline, and also likely needs to be rebased. Once those are done, I'm happy to approve this.
I am experiencing the same kind of crash using the fastrtps RMW implementation with ROS 2 Humble. Am I missing something, or are other patches required for it to work?
Yes, you also need ros2/rcl#1048
@kghost are you willing to work on this? see #2215 (review)
no problem!
@kghost thank you very much for the effort, can you address the DCO error? see https://github.com/ros2/rclcpp/pull/2215/checks?check_run_id=14906339355
@kghost this also needs to be rebased on rolling; CI failed while building, see https://ci.ros2.org/job/ci_linux/19079/console
@fujitatomoya Yes, it is already rebased on the rolling branch.
Related to ros2/ros2#1253. It is not sane that a faulty client can crash our service Executor. As discussed in the referenced issue, if the client is not set up properly, send_response may return RCL_RET_TIMEOUT; we should not throw an error in this case. Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Zang MingJie <zealot0630@gmail.com>
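The behavior described in the commit message above can be sketched in isolation. This is a minimal illustration, not the rclcpp implementation: `SendResult`, `WarningLog`, and `handle_send_result` are hypothetical stand-ins for the rcl return codes, the node logger, and the response path, and they exist only so the control flow can be compiled without ROS 2.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for rcl return codes (not the real rcl values).
enum class SendResult { Ok, Timeout, Error };

// Hypothetical logger stand-in that records warnings for inspection.
struct WarningLog
{
  std::vector<std::string> messages;
  void warn(const std::string & text) { messages.push_back(text); }
};

// Returns true if the response was sent, false if it was dropped after a timeout.
// Only results other than Ok and Timeout are treated as errors and thrown.
bool handle_send_result(SendResult result, WarningLog & log)
{
  if (result == SendResult::Timeout) {
    // A misbehaving client should only produce a warning, not stop the caller.
    log.warn("failed to send response (timeout)");
    return false;
  }
  if (result != SendResult::Ok) {
    throw std::runtime_error("failed to send response");
  }
  return true;
}
```

With this shape, a timeout leaves the calling loop running and still surfaces the problem in the log, while other failures keep their existing exception path.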
@clalancette This is ready to merge.
@@ -486,6 +486,14 @@ class Service
{
  rcl_ret_t ret = rcl_send_response(get_service_handle().get(), &req_id, &response);

  if (ret == RCL_RET_TIMEOUT) {
    RCLCPP_WARN(
      rclcpp::get_node_logger(node_handle_.get()).get_child("rclcpp"),
Why not use node_logger_ here?
I'm not sure that there is access to the node logger in this class right now.
It's a protected member of ServiceBase (rclcpp/rclcpp/include/rclcpp/service.hpp, line 278 in 4d1f18c):
rclcpp::Logger node_logger_;
Ah, good call. In that case, yeah, we should use that.
@kghost can you address this one? and then i think we are good to go.
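The point made in this thread is that a logger held as a protected member of the base class is directly usable from the derived service class, so there is no need to look it up again from the node handle. A minimal sketch of that relationship follows. All type names here (`LoggerSketch`, `ServiceBaseSketch`, `ServiceSketch`) are hypothetical and only mirror the shape discussed above; they are not rclcpp types.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical logger stand-in that prefixes each warning with its name.
struct LoggerSketch
{
  std::string name;
  std::vector<std::string> lines;
  void warn(const std::string & text) { lines.push_back("[" + name + "] " + text); }
};

// Hypothetical base class holding the logger as a protected member.
class ServiceBaseSketch
{
public:
  explicit ServiceBaseSketch(std::string logger_name)
  : node_logger_{std::move(logger_name), {}} {}
  virtual ~ServiceBaseSketch() = default;

  const std::vector<std::string> & warnings() const { return node_logger_.lines; }

protected:
  LoggerSketch node_logger_;
};

// Hypothetical derived class: protected access lets it log through the
// inherited member without any extra lookup.
class ServiceSketch : public ServiceBaseSketch
{
public:
  using ServiceBaseSketch::ServiceBaseSketch;

  void report_timeout() { node_logger_.warn("failed to send response (timeout)"); }
};
```

Reusing the inherited member keeps the logger name consistent with whatever the base class configured at construction time.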
  if (ret == RCL_RET_TIMEOUT) {
    RCLCPP_WARN(
      rclcpp::get_node_logger(node_handle_.get()).get_child("rclcpp"),
      "failed to send response (timeout): %s",
I suggest adding the service name to the log message.
I think the problem with trying to do that here is that not all of the constructors have access to the service name. So this may not be possible to do without changing all of the constructors, or exposing service_name in rcl_service_t. Personally, I think this is nice but I wouldn't hold up this PR for that.
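One way to picture the suggestion, given that the name is not always available, is a message builder that includes the service name only when it is known. This is a sketch under that assumption; `timeout_warning` is a hypothetical helper, not part of rclcpp or rcl, and an empty string stands in for "name not available".

```cpp
#include <string>

// Hypothetical helper: builds the timeout warning text and adds the
// service name only when the caller has one (empty string means unknown).
std::string timeout_warning(const std::string & service_name, const std::string & detail)
{
  std::string text = "failed to send response";
  if (!service_name.empty()) {
    text += " for service '" + service_name + "'";
  }
  text += " (timeout): " + detail;
  return text;
}
```

Code paths without the name would still produce the original message, so the richer log line could be adopted incrementally.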
Should changes in rclcpp/rclcpp_action/src/server.cpp Line 340 in c0d72c3
rclcpp/rclcpp_action/src/server.cpp Line 482 in c0d72c3
rclcpp/rclcpp_action/src/server.cpp Line 539 in c0d72c3
rclcpp/rclcpp_action/src/server.cpp Line 672 in c0d72c3
Related to this topic, we're in conversations to set the timeout in DDS from an XML, so we can increase the timeout before returning |
I think that @mauropasse makes a good point and we should do the same thing for the action side of things as well. Once that is fixed I'm happy to approve this PR.
@clalancette @mauropasse can we manage this with another issue before adding the implementation in this PR? i would like to check if that happens in detail. |
@clalancette @mauropasse friendly ping. |
What do you mean by "that happens in detail"? I know it happens with actions, since that's where we first observed this issue and came up with the temporary solution of increasing the timeout. As for applying the action side of these fixes in a different PR, I'm OK with that. Maybe also add tests for these situations.
@mauropasse I was not sure whether this problem is actually reproducible with actions, thanks for the information.
👍 @clalancette what do you think? |
Yeah, that's fine. Let's keep this one just about the service server, and we can do the follow-up with ros2/ros2#1462 for action servers. I'll go ahead and re-review this. |
@clalancette @kghost i will go ahead to close this, instead can we proceed with #2276? |