Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added WARN for action server goal response callback in case of timeout instead of raising exception #2463

Conversation

alireza-moayyedi
Copy link

Hello @fujitatomoya,

Following your reply in ros2/rmw_fastrtps#751 and reading the discussions in #2215 I took the initiative to take a look at the problem at hand and see if I could fix it. Now just a few notes:

  • Before anything, I want you to know that this was my first time diving deep into the ROS2 source code and trying to change stuff. I also have limited time available to work on it. So I ask apologies in advance if what I am doing here is wrong.
  • Looking at Do not crash Executor when send_response fails due to client failure. #2215 (comment) I assume that the same solution should be applied for all responses i.e., goal response, cancel response and result response. As the first commit I only changed the goal response to make sure it is the right way to do it. Given your confirmation I will extend this PR to implement the same for other responses as well.
  • This would fix the problem for RCLCPP but I believe the same fix should also be applied to rclpy (https://github.com/ros2/rclpy/blob/rolling/rclpy/src/rclpy/action_server.cpp#L138). I also did something similar for that as the following:
  if (RCL_RET_TIMEOUT == ret) { \
    int stack_level = 1; \
    PyErr_WarnFormat( \
      PyExc_RuntimeWarning, stack_level, "Failed to send " #Type " response (timeout): %s", \
      rcl_get_error_string().str); \
    rcl_reset_error(); \
  } else if (RCL_RET_OK != ret) { \
    throw rclpy::RCLError("Failed to send " #Type " response"); \
  }

Of course this needs its own PR in the approriate rclpy repo but since they are related I want to first make sure I am on the right path hence bringing it up here.

  • Something is still unclear to me. With such changes, we avoid raising exceptions on responses. But I am wondering if that would fix my original problem (Python action server failing in Bitbucket CI when receiving request from C++ client rmw_fastrtps#751). If there is a timeout on a goal response, can my action simply continuing the action execution? I mean, would it be able to send the result at a later stage to my client? Or is it like once the client is lost, it would be lost forever (I know for sure that my nodes are running fine in any case)? Unfortuantely to properly test this in my Bitbucket pipelines I need to spend considerable time to adjust my dockers and I did not have enough resources available for that.

Signed-off-by: Alireza Moayyedi <alireza.moayyedi@nobleo.nl>
@fujitatomoya
Copy link
Collaborator

sorry i did not allocate time for the more details need to be fixed. this is part of it, but i think we need to address more entities for action services.

besides, ros2/rmw_fastrtps#751 originally comes from rclpy, so we need to the same fix for rclpy and rclcpp.

i think i can come up with PRs in this weekend, thanks for pushing this forward.

@alireza-moayyedi
Copy link
Author

alireza-moayyedi commented Mar 27, 2024

@fujitatomoya given your following merged PRs:

I'm going to close this one. Thanks a lot for taking care of it. Also thanks to @alsora for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants