Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581

Open
squizz617 opened this issue Mar 16, 2021 · 7 comments
Labels

Comments

@squizz617
Copy link

Bug report

Required Info:

  • Operating System: Ubuntu 18.04.5
  • Installation type: From source
  • Version or commit hash: Dashing (e8cf066d)
  • DDS implementation: Fast-RTPS
  • Client library (if applicable): Both rclcpp and rclpy

Steps to reproduce issue

  1. Run turtlesim_node
$ ros2 run turtlesim turtlesim_node
  1. Send request to topic /spawn with any invalid name (e.g., 256 bytes of "A"s)
$ ros2 service call /spawn turtlesim/srv/Spawn "{x: 1.0, y: 1.0, theta: 0.0, name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA}"
  1. The requester (/_ros2cli_requester_turtlesim_Spawn in this case) waits forever as turtlesim_node is terminated throwing rclcpp::exceptions::InvalidTopicNameError.

Expected behavior

Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exception and return something (e.g., a NULL pointer) so that the caller can properly handle the error.

Actual behavior

In ros2/rclcpp/rclcpp/src/rclcpp/expand_topic_or_service_name.cpp, if the validation of the expanded service name fails, rclcpp::exceptions::InvalidServiceNameError is thrown, terminating the node that tried to create a service. As a result, the requester keeps waiting for the response to its spawn request.

Additional information

I've taken an example of turtlesim for its simplicity, and aware that turtlesim itself could try-catch an exception. However, I suggest rcl handles this issue as a middleware for the following reasons:
(1) as far as I know, this behavior is not documented anywhere,
(2) none of the code included in the ros2 repositories (https://raw.githubusercontent.com/ros2/ros2/dashing/ros2.repos) try-catch those exceptions when creating subscriptions or services,
(3) merely remapping any topic to an invalid name kills the node, leaving chances to be maliciously used by attackers, and
(4) there can be many other systems that are already being affected.

@clalancette
Copy link
Contributor

clalancette commented Mar 17, 2021

Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exception and return something (e.g., a NULL pointer) so that the caller can properly handle the error.

Part of the problem is we have no way to return an error from a service. So if the service call fails, we just have no signaling mechanism to return to the client.

I've taken an example of turtlesim for its simplicity, and aware that turtlesim itself could try-catch an exception. However, I suggest rcl handles this issue as a middleware for the following reasons:

I agree that this is a problem, but I'm really not sure that handling this should be the responsibility of the rcl layer. Doing it there means that none of the higher level code gets a chance to handle it if they want to. We could consider handling it at the rclcpp/rclpy layer, but that has a similar problem; the user code doesn't get a chance to handle it. Further, it isn't totally clear to me what the response of rclcpp should be in this case. Right now, rclcpp throws the exception and leaves it to the application code to deal with it.

But I honestly haven't thought about it in any great depth. @ros2/team, any thoughts here?

@wjwwood
Copy link
Member

wjwwood commented Mar 17, 2021

We don't have the ability for a service server to signal an error and cause the client to fail, but we do have a timeout option for the client. It should give up after a while. If you have control over the service you could have an optional field to indicate an error occurred have the service server return that being set if it fails rather than crashing and causing the client to wait until its timeout or hang forever.

I do not think rcl should handle this.

Expected behavior

Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exception and return something (e.g., a NULL pointer) so that the caller can properly handle the error.

This doesn't solve anything AFAICT, because you could just catch the exception if you wanted to "properly handle the error". Returning nullptr is just another way of signaling an error. If the code didn't check for nullptr you'd just get a segmentation fault rather than an unhandled exception as you do now...

as far as I know, this behavior is not documented anywhere

It could be better (especially the rclcpp docs), but it is documented that creating a pub/sub/service can fail if the topic name is invalid, e.g.:

/// Expand a topic or service name and throw if it is not valid.
/**
* This function can be used to "just" validate a topic or service name too,
* since expanding the topic name is required to fully validate a name.
*
* If the name is invalid, then InvalidTopicNameError is thrown or
* InvalidServiceNameError if is_service is true.
*
* This function can take any form of a topic or service name, i.e. it does not
* have to be a fully qualified name.
* The node name and namespace are used to expand it if necessary while
* validating it.
*
* Expansion is done with rcl_expand_topic_name.
* The validation is doen with rcl_validate_topic_name and
* rmw_validate_full_topic_name, so details about failures can be found in the
* documentation for those functions.
*
* \param name the topic or service name to be validated
* \param node_name the name of the node associated with the name
* \param namespace_ the namespace of the node associated with the name
* \param is_service if true InvalidServiceNameError is thrown instead
* \returns expanded (and validated) topic name
* \throws InvalidTopicNameError if name is invalid and is_service is false
* \throws InvalidServiceNameError if name is invalid and is_service is true
* \throws std::bad_alloc if memory cannot be allocated
* \throws RCLError if an unexpect error occurs
* \throws std::runtime_error if the topic name is unexpectedly valid or,
* if the rcl name is invalid or if the rcl namespace is invalid
*/
RCLCPP_PUBLIC
std::string
expand_topic_or_service_name(
const std::string & name,
const std::string & node_name,
const std::string & namespace_,
bool is_service = false);

https://github.com/ros2/rcl/blob/fc3f93bd792ae467d26d6581da54158328dcd111/rcl/include/rcl/publisher.h#L147

https://design.ros2.org/articles/topic_and_service_names.html#ros-2-topic-and-service-name-constraints

none of the code included in the ros2 repositories (https://raw.githubusercontent.com/ros2/ros2/dashing/ros2.repos) try-catch those exceptions when creating subscriptions or services,

Perhaps those could be updated, but honestly what else would it do if the given topic name (remapped or not) is invalid? There's no obvious mitigation strategy in most cases other than to exit the process.

merely remapping any topic to an invalid name kills the node, leaving chances to be maliciously used by attackers, and

You can cause an application to crash by giving it an invalid topic name remapping, but I fail to see how that's different than any other parameter an application can have. If you're worried about security you should probably limit or disable remapping, along with many, many other things to make the system tamper resistant.

there can be many other systems that are already being affected

Again, there's no reasonable action that we could take unilaterally that would work for everyone out there, at least that I am aware of.

@mjeronimo
Copy link

In general, I would say that it's the responsibility of the middleware to communicate the messages and the responsibility of the application to respond to application-level exceptions. In this case though, the middleware is throwing an exception to enforce a policy regarding the topic name length, but it's doing it on the receiving side where there is nothing that the receiving node can do about it and no way to communicate back the error. How about doing this check on the sending side instead (or in addition) where it can verify the length before sending the request? The validating code on the client side would then be in a position to return an error code to the code calling the service which is the source of the error.

@clalancette
Copy link
Contributor

How about doing this check on the sending side instead (or in addition) where it can verify the length before sending the request? The validating code on the client side would then be in a position to return an error code to the code calling the service which is the source of the error.

I think that makes sense as long as we have the validation on both sides. In the nominal case where users are using our libraries, then they'll get nice error messages on the client side. In the more unusual case of someone integrating with ROS 2, but not using our libraries, we'll be defended from undefined behavior on the server side with the additional checks there.

@wjwwood
Copy link
Member

wjwwood commented Mar 23, 2021

In the example given, I do not think there is an opportunity to do the validation on the sending side. It is being sent with ros2 service call, which has no knowledge of the semantic meaning for the fields of the request. It's only on the receiving side that this string is interpreted as a topic name.

So while "check on the sending side first", is a fine suggestion in general. I don't think it addresses this issue.

@mjeronimo
Copy link

That's true, but I was thinking that using the CLI was just an easy way to demonstrate the issue. I'll leave it to @squizz617 to confirm or not.

@squizz617
Copy link
Author

squizz617 commented Mar 23, 2021

Thanks for having a productive discussion regarding the issue!
You're correct @mjeronimo. I created this issue with an oversimplified example just for the sake of simplicity. I've actually built a testing framework for ROS-powered robotic systems, and running the prototype implementation revealed similar issues in some of the tested systems that do not handle exception on the server side. Enforcing a validation on the client side does sound like a good design choice, as it would not only prevent such issues from happening, but also notify the client about the potential runtime issues (e.g., if it dynamically assigns topic/service names) with a return code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants