Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581

squizz617 · 2021-03-16T23:37:54Z

Bug report

Required Info:

Operating System: Ubuntu 18.04.5
Installation type: From source
Version or commit hash: Dashing (e8cf066d)
DDS implementation: Fast-RTPS
Client library (if applicable): Both rclcpp and rclpy

Steps to reproduce issue

Run turtlesim_node

$ ros2 run turtlesim turtlesim_node

Send request to topic /spawn with any invalid name (e.g., 256 bytes of "A"s)

$ ros2 service call /spawn turtlesim/srv/Spawn "{x: 1.0, y: 1.0, theta: 0.0, name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA}"

The requester (/_ros2cli_requester_turtlesim_Spawn in this case) waits forever as turtlesim_node is terminated throwing rclcpp::exceptions::InvalidTopicNameError.

Expected behavior

Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exception and return something (e.g., a NULL pointer) so that the caller can properly handle the error.

Actual behavior

In ros2/rclcpp/rclcpp/src/rclcpp/expand_topic_or_service_name.cpp, if the validation of the expanded service name fails, rclcpp::exceptions::InvalidServiceNameError is thrown, terminating the node that tried to create a service. As a result, the requester keeps waiting for the response to its spawn request.

Additional information

I've taken an example of turtlesim for its simplicity, and aware that turtlesim itself could try-catch an exception. However, I suggest rcl handles this issue as a middleware for the following reasons:
(1) as far as I know, this behavior is not documented anywhere,
(2) none of the code included in the ros2 repositories (https://raw.githubusercontent.com/ros2/ros2/dashing/ros2.repos) try-catch those exceptions when creating subscriptions or services,
(3) merely remapping any topic to an invalid name kills the node, leaving chances to be maliciously used by attackers, and
(4) there can be many other systems that are already being affected.

The text was updated successfully, but these errors were encountered:

clalancette · 2021-03-17T12:51:02Z

Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exception and return something (e.g., a NULL pointer) so that the caller can properly handle the error.

Part of the problem is we have no way to return an error from a service. So if the service call fails, we just have no signaling mechanism to return to the client.

I've taken an example of turtlesim for its simplicity, and aware that turtlesim itself could try-catch an exception. However, I suggest rcl handles this issue as a middleware for the following reasons:

I agree that this is a problem, but I'm really not sure that handling this should be the responsibility of the rcl layer. Doing it there means that none of the higher level code gets a chance to handle it if they want to. We could consider handling it at the rclcpp/rclpy layer, but that has a similar problem; the user code doesn't get a chance to handle it. Further, it isn't totally clear to me what the response of rclcpp should be in this case. Right now, rclcpp throws the exception and leaves it to the application code to deal with it.

But I honestly haven't thought about it in any great depth. @ros2/team, any thoughts here?

wjwwood · 2021-03-17T23:36:21Z

We don't have the ability for a service server to signal an error and cause the client to fail, but we do have a timeout option for the client. It should give up after a while. If you have control over the service you could have an optional field to indicate an error occurred have the service server return that being set if it fails rather than crashing and causing the client to wait until its timeout or hang forever.

I do not think rcl should handle this.

Expected behavior

Rather than just terminating, Node::create_subscription() and Node::create_service() should handle such exception and return something (e.g., a NULL pointer) so that the caller can properly handle the error.

This doesn't solve anything AFAICT, because you could just catch the exception if you wanted to "properly handle the error". Returning nullptr is just another way of signaling an error. If the code didn't check for nullptr you'd just get a segmentation fault rather than an unhandled exception as you do now...

as far as I know, this behavior is not documented anywhere

It could be better (especially the rclcpp docs), but it is documented that creating a pub/sub/service can fail if the topic name is invalid, e.g.:

rclcpp/rclcpp/include/rclcpp/expand_topic_or_service_name.hpp

Lines 25 to 61 in 35c89c8

    
           /// Expand a topic or service name and throw if it is not valid. 
        
           /** 
        
            * This function can be used to "just" validate a topic or service name too, 
        
            * since expanding the topic name is required to fully validate a name. 
        
            * 
        
            * If the name is invalid, then InvalidTopicNameError is thrown or 
        
            * InvalidServiceNameError if is_service is true. 
        
            * 
        
            * This function can take any form of a topic or service name, i.e. it does not 
        
            * have to be a fully qualified name. 
        
            * The node name and namespace are used to expand it if necessary while 
        
            * validating it. 
        
            * 
        
            * Expansion is done with rcl_expand_topic_name. 
        
            * The validation is doen with rcl_validate_topic_name and 
        
            * rmw_validate_full_topic_name, so details about failures can be found in the 
        
            * documentation for those functions. 
        
            * 
        
            * \param name the topic or service name to be validated 
        
            * \param node_name the name of the node associated with the name 
        
            * \param namespace_ the namespace of the node associated with the name 
        
            * \param is_service if true InvalidServiceNameError is thrown instead 
        
            * \returns expanded (and validated) topic name 
        
            * \throws InvalidTopicNameError if name is invalid and is_service is false 
        
            * \throws InvalidServiceNameError if name is invalid and is_service is true 
        
            * \throws std::bad_alloc if memory cannot be allocated 
        
            * \throws RCLError if an unexpect error occurs 
        
            * \throws std::runtime_error if the topic name is unexpectedly valid or, 
        
            *    if the rcl name is invalid or if the rcl namespace is invalid 
        
            */ 
        
           RCLCPP_PUBLIC 
        
           std::string 
        
           expand_topic_or_service_name( 
        
             const std::string & name, 
        
             const std::string & node_name, 
        
             const std::string & namespace_, 
        
             bool is_service = false);

https://github.com/ros2/rcl/blob/fc3f93bd792ae467d26d6581da54158328dcd111/rcl/include/rcl/publisher.h#L147

https://design.ros2.org/articles/topic_and_service_names.html#ros-2-topic-and-service-name-constraints

none of the code included in the ros2 repositories (https://raw.githubusercontent.com/ros2/ros2/dashing/ros2.repos) try-catch those exceptions when creating subscriptions or services,

Perhaps those could be updated, but honestly what else would it do if the given topic name (remapped or not) is invalid? There's no obvious mitigation strategy in most cases other than to exit the process.

merely remapping any topic to an invalid name kills the node, leaving chances to be maliciously used by attackers, and

You can cause an application to crash by giving it an invalid topic name remapping, but I fail to see how that's different than any other parameter an application can have. If you're worried about security you should probably limit or disable remapping, along with many, many other things to make the system tamper resistant.

there can be many other systems that are already being affected

Again, there's no reasonable action that we could take unilaterally that would work for everyone out there, at least that I am aware of.

mjeronimo · 2021-03-23T15:40:14Z

In general, I would say that it's the responsibility of the middleware to communicate the messages and the responsibility of the application to respond to application-level exceptions. In this case though, the middleware is throwing an exception to enforce a policy regarding the topic name length, but it's doing it on the receiving side where there is nothing that the receiving node can do about it and no way to communicate back the error. How about doing this check on the sending side instead (or in addition) where it can verify the length before sending the request? The validating code on the client side would then be in a position to return an error code to the code calling the service which is the source of the error.

clalancette · 2021-03-23T18:04:18Z

How about doing this check on the sending side instead (or in addition) where it can verify the length before sending the request? The validating code on the client side would then be in a position to return an error code to the code calling the service which is the source of the error.

I think that makes sense as long as we have the validation on both sides. In the nominal case where users are using our libraries, then they'll get nice error messages on the client side. In the more unusual case of someone integrating with ROS 2, but not using our libraries, we'll be defended from undefined behavior on the server side with the additional checks there.

wjwwood · 2021-03-23T18:27:47Z

In the example given, I do not think there is an opportunity to do the validation on the sending side. It is being sent with ros2 service call, which has no knowledge of the semantic meaning for the fields of the request. It's only on the receiving side that this string is interpreted as a topic name.

So while "check on the sending side first", is a fine suggestion in general. I don't think it addresses this issue.

mjeronimo · 2021-03-23T18:31:39Z

That's true, but I was thinking that using the CLI was just an easy way to demonstrate the issue. I'll leave it to @squizz617 to confirm or not.

squizz617 · 2021-03-23T19:39:14Z

Thanks for having a productive discussion regarding the issue!
You're correct @mjeronimo. I created this issue with an oversimplified example just for the sake of simplicity. I've actually built a testing framework for ROS-powered robotic systems, and running the prototype implementation revealed similar issues in some of the tested systems that do not handle exception on the server side. Enforcing a validation on the client side does sound like a good design choice, as it would not only prevent such issues from happening, but also notify the client about the potential runtime issues (e.g., if it dynamically assigns topic/service names) with a return code.

clalancette added the backlog label Feb 3, 2022

afrixs mentioned this issue Mar 2, 2022

Migration guide recommends to throw exceptions in service callbacks ros2/ros2_documentation#2341

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581

Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581

squizz617 commented Mar 16, 2021

clalancette commented Mar 17, 2021 •

edited

Loading

wjwwood commented Mar 17, 2021 •

edited

Loading

mjeronimo commented Mar 23, 2021

clalancette commented Mar 23, 2021

wjwwood commented Mar 23, 2021 •

edited

Loading

mjeronimo commented Mar 23, 2021

squizz617 commented Mar 23, 2021 •

edited

Loading

Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581

Throwing exception while creating a service or a subscription on request can cause clients to wait forever #1581

Comments

squizz617 commented Mar 16, 2021

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

clalancette commented Mar 17, 2021 • edited Loading

wjwwood commented Mar 17, 2021 • edited Loading

mjeronimo commented Mar 23, 2021

clalancette commented Mar 23, 2021

wjwwood commented Mar 23, 2021 • edited Loading

mjeronimo commented Mar 23, 2021

squizz617 commented Mar 23, 2021 • edited Loading

clalancette commented Mar 17, 2021 •

edited

Loading

wjwwood commented Mar 17, 2021 •

edited

Loading

wjwwood commented Mar 23, 2021 •

edited

Loading

squizz617 commented Mar 23, 2021 •

edited

Loading