Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Executing action client but nothing is ready" exception thrown when client sends goal #1399

Closed
schornakj opened this issue Oct 9, 2020 · 13 comments
Assignees
Labels
bug Something isn't working

Comments

@schornakj
Copy link

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04 LTS
  • Installation type:
    • Binaries
  • Version or commit hash:
    • Eloquent
  • DDS implementation:
    • CycloneDDS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

I will work on creating a portable minimal environment that can reproduce this issue, since my application is somewhat complex.

I have three nodes;

  • Node A contains an action server for a custom action type (for example, MyAction) advertised on /my_action. This node uses a multi-threaded executor. The action server is assigned to a MutuallyExclusive callback group.
  • Node B contains a client for the same MyAction action on /my_action. This node uses the default single-threaded executor.
  • Node C also contains a MyAction client also pointed at /my_action. This node uses a multi-threaded executor. The action client is assigned to a Reentrant callback group.

Node B and Node C send action goals using async_send_goal. The nodes do not send goals concurrently.

Expected behavior

Normal client/server action behavior where each client sends a goal and receives a result from the server.

Actual behavior

Node C throws an exception and crashes after either Node B or Node C call async_send_goal:

terminate called after throwing an instance of `std::runtime_error`
  what(): Executing action client but nothing is ready

Sometimes the error includes terminate called recursively.

Additional information

There is some discussion of this exception in #594, though a conclusive answer wasn't reached (since I expect that more info about the implementation of the nodes is needed).

@fujitatomoya
Copy link
Collaborator

I would check the following,

  • use Node C with SingleThreadedExecutor to see if this problem still can be observed. (expecting no)
  • use Node C with MutuallyExclusive callback group to see if this problem still can be observed. (expecting no)

and if reproducible test program is available, that would be really helpful.

@Serafadam
Copy link

Serafadam commented Oct 12, 2020

Hi, I've been having similar issues with an action server. The setup is pretty much the same as with node C (Multithreaded executor & Reentrant CB group). Is this issue caused by this configuration? As for reproducing the results, it would be a little hard, since it involves hardware. My server works most of the time, but the exception occurs randomly, so it's kinda hard to debug properly. The actions are being sent every several seconds.

@schornakj
Copy link
Author

I would check the following,

* use Node C with SingleThreadedExecutor to see if this problem still can be observed. (expecting no)

* use Node C with MutuallyExclusive callback group to see if this problem still can be observed. (expecting no)

and if reproducible test program is available, that would be really helpful.

I changed Node C to use a MutuallyExclusive callback group and the crash I was seeing no longer occurred. I wasn't able to use the SingleThreadedExecutor but I can represent that in a standalone test case.

@schornakj
Copy link
Author

schornakj commented Oct 12, 2020

I created a standalone package that reproduces this error: https://github.com/schornakj/rclcpp_action_exception_test

Interestingly, I was able to get the node with the action client to crash without having a second client node (in terms of my previous example, I only have Node A and Node C). It looks like this may be caused specifically by assigning an action client to a Reentrant callback group.

Also, since I use wait on the goal response and result futures to get the result of the action, the SingleThreadedExecutor will not work for this node.

I'll include the code from my failing client node below:

client_node_multithread_reentrant.h

#pragma once

#include <rclcpp/rclcpp.hpp>
#include <rclcpp_action/rclcpp_action.hpp>
#include <example_interfaces/action/fibonacci.hpp>

class ClientNodeMultithreadReentrant : public rclcpp::Node
{
public:
  using Fibonacci = example_interfaces::action::Fibonacci;
  using ClientGoalHandleFibonacci = rclcpp_action::ClientGoalHandle<Fibonacci>;

  explicit ClientNodeMultithreadReentrant();

private:
  void sendGoal();

  rclcpp::callback_group::CallbackGroup::SharedPtr cb_group_reentrant_;
  rclcpp_action::Client<Fibonacci>::SharedPtr client_;
  rclcpp::TimerBase::SharedPtr timer_;

};

client_node_multithread_reentrant.cpp

#include <action_client_test/client_node_multithread_reentrant.h>

ClientNodeMultithreadReentrant::ClientNodeMultithreadReentrant()
  : Node("action_test_client_node")
  , cb_group_reentrant_(create_callback_group(rclcpp::callback_group::CallbackGroupType::Reentrant))
  , client_(rclcpp_action::create_client<Fibonacci>(this, "/fibonacci", cb_group_reentrant_))
  , timer_(create_wall_timer(std::chrono::duration<double>(1.0), std::bind(&ClientNodeMultithreadReentrant::sendGoal, this)))
{
}

void ClientNodeMultithreadReentrant::sendGoal()
{
  RCLCPP_INFO(get_logger(), "Sending goal");

  Fibonacci::Goal goal;
  goal.order = 10;

  rclcpp_action::Client<Fibonacci>::SendGoalOptions options;
  options.goal_response_callback = [](std::shared_future<ClientGoalHandleFibonacci::SharedPtr> future) { (void) future; };
  options.result_callback = [](const ClientGoalHandleFibonacci::WrappedResult& result) { (void) result; };

  auto gh_future = client_->async_send_goal(goal, options);
  gh_future.wait();

  auto async_res_future = gh_future.get()->async_result();
  async_res_future.wait();

  auto result = async_res_future.get();

  if (result.code != rclcpp_action::ResultCode::SUCCEEDED)
  {
    RCLCPP_WARN(get_logger(), "Action did not succeed");
    return;
  }

  std::stringstream stream;
  for (auto& seq : result.result->sequence)
  {
    stream << seq << " ";
  }

  RCLCPP_INFO_STREAM(get_logger(), "Action succeeded! " << stream.str());
}

int main(int argc, char* argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<ClientNodeMultithreadReentrant>();
  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();
  executor.remove_node(node);
  rclcpp::shutdown();
}

@ivanpauno ivanpauno added the bug Something isn't working label Oct 13, 2020
@ivanpauno
Copy link
Member

I think this is the same kind of issue that #1241 was trying to solve, I will try to take a closer look again but I'm not sure what's the right fix.

@ivanpauno ivanpauno self-assigned this Oct 13, 2020
@fujitatomoya
Copy link
Collaborator

as @ivanpauno mentioned, this is similar problem with #1212, but this case is about ClientBase (inherited from rclcpp::Waitable). I believe that problematic sequence is the following,

in MultiThreadedExecutor

  1. Thread-1 gets the executable(action client waitable), and the release the wait_mutex_.
  2. Thread-2 re-collect the entities, then gets the same executable(action client waitable) before Thread-1 executes it.
  3. Thread-1 executes the executable.
  4. Thread-2 tries to execute the executable waitset, but it is already done. then eventually throws exception.

@fujitatomoya
Copy link
Collaborator

Just FYI,

https://github.com/schornakj/rclcpp_action_exception_test is for eloquent, and cannot run with master ros2/ros2@d2f6b73 since rclcpp_action interface and launch description is different. https://github.com/fujitatomoya/rclcpp_action_exception_test can be built with master.

@fujitatomoya
Copy link
Collaborator

@Serafadam @schornakj

It would be nice if you could check if #1409 fixes the problem.

@Serafadam
Copy link

My project is currently built on dashing, it will take some time before I update it, unfortunately

@schornakj
Copy link
Author

@fujitatomoya I'm not set up to test code using the ROS2 master branch right now, so it would also take me some time before I could try out your PR. Conceptually I think it looks good!

@fujitatomoya
Copy link
Collaborator

I think #1241 should fix this, i will confirm that.

@fujitatomoya
Copy link
Collaborator

@ivanpauno

I confirmed #1241 can fix this issue 👍
I used master ros2/ros2@4aeefc1 and reproducible test. In 15 mins, there was no exception happens.

@fujitatomoya
Copy link
Collaborator

@ivanpauno

i think we can close this since #1241 has been merged. I did confirm #1241 can fix this issue. (see #1399 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants