Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581
Comments
Part of the problem is we have no way to return an error from a service. So if the service call fails, we just have no signaling mechanism to return to the client.
I agree that this is a problem, but I'm really not sure that handling this should be the responsibility of the But I honestly haven't thought about it in any great depth. @ros2/team, any thoughts here?
We don't have the ability for a service server to signal an error and cause the client to fail, but we do have a timeout option for the client, so it should give up after a while. If you have control over the service, you could add an optional field to indicate that an error occurred, and have the service server return with that field set if it fails, rather than crashing and causing the client to wait until its timeout or hang forever. I do not think
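A minimal sketch of the two ideas above (a client-side timeout and an error field in the response), using only the C++ standard library. The `SpawnResponse` type, its `success` and `message` fields, and both function names are assumptions made for illustration; they are not taken from turtlesim or rclcpp.

```cpp
#include <chrono>
#include <future>
#include <string>

// Hypothetical response type: `success` and `message` are illustrative
// fields, not part of any existing service definition.
struct SpawnResponse {
  bool success = false;
  std::string message;
  std::string name;
};

// Hypothetical server-side handler: it reports a failure in the response
// instead of throwing and taking the whole process down.
SpawnResponse handle_spawn(const std::string & requested_name)
{
  SpawnResponse response;
  if (requested_name.empty()) {
    response.message = "name must not be empty";
    return response;  // success stays false
  }
  response.success = true;
  response.name = requested_name;
  return response;
}

// Client-side wait: gives up after `timeout` instead of blocking forever.
// Returns false if no response arrived in time; otherwise fills `out`.
bool wait_for_response(
  std::future<SpawnResponse> & pending,
  std::chrono::milliseconds timeout,
  SpawnResponse & out)
{
  if (pending.wait_for(timeout) != std::future_status::ready) {
    return false;
  }
  out = pending.get();
  return true;
}
```

With this shape the caller can tell three cases apart: no answer within the timeout, an answer with `success == false`, and a normal answer.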
This doesn't solve anything AFAICT, because you could just catch the exception if you wanted to "properly handle the error". Returning nullptr is just another way of signaling an error. If the code didn't check for nullptr you'd just get a segmentation fault rather than an unhandled exception as you do now...
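To make that point concrete, here is a rough sketch of how a caller could already get the "returns a null pointer" behaviour by wrapping a throwing factory. `Service`, `InvalidNameError`, `create_service` and `try_create_service` are hypothetical stand-ins defined only for this sketch, and the length rule is illustrative; none of this is the rclcpp API.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for this sketch; they are not rclcpp types.
struct InvalidNameError : std::runtime_error {
  explicit InvalidNameError(const std::string & what)
  : std::runtime_error(what) {}
};

struct Service {
  std::string name;
};

// Throwing factory, mirroring the behaviour described in the report.
// The length rule here is illustrative only.
std::shared_ptr<Service> create_service(const std::string & name)
{
  if (name.empty() || name.size() > 255) {
    throw InvalidNameError("invalid service name");
  }
  return std::make_shared<Service>(Service{name});
}

// Caller-side wrapper that turns the exception into a null pointer.
std::shared_ptr<Service> try_create_service(const std::string & name)
{
  try {
    return create_service(name);
  } catch (const InvalidNameError &) {
    return nullptr;
  }
}
```

Either way the caller has to act on the failure: a `try_create_service` result that is dereferenced without a check fails just as hard as an uncaught exception.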
It could be better (especially the rclcpp docs), but it is documented that creating a pub/sub/service can fail if the topic name is invalid, e.g., rclcpp/rclcpp/include/rclcpp/expand_topic_or_service_name.hpp (lines 25 to 61 in 35c89c8).
Perhaps those could be updated, but honestly what else would it do if the given topic name (remapped or not) is invalid? There's no obvious mitigation strategy in most cases other than to exit the process.
You can cause an application to crash by giving it an invalid topic name remapping, but I fail to see how that's different than any other parameter an application can have. If you're worried about security you should probably limit or disable remapping, along with many, many other things to make the system tamper resistant.
Again, there's no reasonable action that we could take unilaterally that would work for everyone out there, at least that I am aware of.
In general, I would say that it's the responsibility of the middleware to communicate the messages and the responsibility of the application to respond to application-level exceptions. In this case though, the middleware is throwing an exception to enforce a policy regarding the topic name length, but it's doing it on the receiving side, where there is nothing that the receiving node can do about it and no way to communicate the error back. How about doing this check on the sending side instead (or in addition), where it can verify the length before sending the request? The validating code on the client side would then be in a position to return an error code to the code calling the service, which is the source of the error.
I think that makes sense as long as we have the validation on both sides. In the nominal case where users are using our libraries, they'll get nice error messages on the client side. In the more unusual case of someone integrating with ROS 2 but not using our libraries, we'll be defended from undefined behavior on the server side with the additional checks there.
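A small sketch of what "validation on both sides" could look like, with one shared check used by the sender and the receiver. The limit `kMaxNameLength`, the `NameCheck` codes and the function names are all assumptions for illustration; the real limit and rules are defined by the middleware, not by this snippet.

```cpp
#include <cstddef>
#include <string>

// Assumed limit for this sketch only; the real limit is set by the middleware.
constexpr std::size_t kMaxNameLength = 255;

enum class NameCheck { Ok, Empty, TooLong };

// Shared validation used by both the sending and the receiving side.
NameCheck check_name(const std::string & name)
{
  if (name.empty()) {
    return NameCheck::Empty;
  }
  if (name.size() > kMaxNameLength) {
    return NameCheck::TooLong;
  }
  return NameCheck::Ok;
}

// Sending side: refuse to send and hand an error code back to the caller.
NameCheck send_spawn_request(const std::string & name, bool & sent)
{
  const NameCheck result = check_name(name);
  sent = (result == NameCheck::Ok);
  // A real client would send the request here when `sent` is true.
  return result;
}

// Receiving side: repeat the check, since not every sender uses this library.
// Rejecting the request here replaces throwing from the service callback.
bool accept_spawn_request(const std::string & name)
{
  return check_name(name) == NameCheck::Ok;
}
```

The sender-side check gives the caller an immediate, local error, while the receiver-side check covers requests that never went through the sender-side code.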
In the example given, I do not think there is an opportunity to do the validation on the sending side. It is being sent with So while "check on the sending side first" is a fine suggestion in general, I don't think it addresses this issue.
That's true, but I was thinking that using the CLI was just an easy way to demonstrate the issue. I'll leave it to @squizz617 to confirm or not.
Thanks for having a productive discussion regarding the issue!
Bug report
Required Info:
e8cf066d
Steps to reproduce issue
1. Run turtlesim_node.
2. Call /spawn with any invalid name (e.g., 256 bytes of "A"s).
3. The requester (/_ros2cli_requester_turtlesim_Spawn in this case) waits forever as turtlesim_node is terminated throwing rclcpp::exceptions::InvalidTopicNameError.
Expected behavior
Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exceptions and return something (e.g., a NULL pointer) so that the caller can properly handle the error.
Actual behavior
In ros2/rclcpp/rclcpp/src/rclcpp/expand_topic_or_service_name.cpp, if the validation of the expanded service name fails, rclcpp::exceptions::InvalidServiceNameError is thrown, terminating the node that tried to create a service. As a result, the requester keeps waiting for the response to its spawn request.
Additional information
I've taken turtlesim as an example for its simplicity, and I'm aware that turtlesim itself could try-catch the exception. However, I suggest that rcl handle this issue as the middleware, for the following reasons:
(1) as far as I know, this behavior is not documented anywhere,
(2) none of the code included in the ros2 repositories (https://raw.githubusercontent.com/ros2/ros2/dashing/ros2.repos) try-catches those exceptions when creating subscriptions or services,
(3) merely remapping any topic to an invalid name kills the node, leaving room for malicious use by attackers, and
(4) there can be many other systems that are already being affected.