Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added tool to request transition to ErrorState #1758

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions nav2_util/include/nav2_util/lifecycle_utils.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,33 @@ void reset_lifecycle_nodes(
reset_lifecycle_nodes(split(nodes, ':'), service_call_timeout, retries);
}

/// Transition the given lifecycle nodes to the ErrorProcessing state in order
/** At this time, service calls frequently hang for unknown reasons. The only
* way to combat that is to timeout the service call and retry it. To use this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really curious, is this timeout issue still an issue? That was circa Bouncy when that message was added to this tool, we've come a long way since then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have no idea, I added it as well for consistency with the other tools. I can take a look at it and see if this was fixed somewhere. Or if you know how to test it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, lets just leave it. Its not hurting anyone. But I would be curious if in your unit tests you set the number of retries to 0 so that it fails and see if we ever catch it in your development / CI later. If it doesn't cause problems, we can remove it later.

* function, estimate how long your nodes should take to at each transition and
* set your timeout accordingly.
* \param[in] node_names A vector of the fully qualified node names to reset.
* \param[in] service_call_timeout The maximum amount of time to wait for a
* service call.
* \param[in] retries The number of times to try a state transition service call
*/
void process_error_lifecycle_nodes(
const std::vector<std::string> & node_names,
const std::chrono::seconds service_call_timeout = std::chrono::seconds::max(),
const int retries = 3);

/// Transition the given lifecycle nodes to the ErrorProcessing state in order.
/**
* \param[in] nodes A ':' seperated list of node names. eg. "/node1:/node2"
*/
void process_error_lifecycle_nodes(
const std::string & nodes,
const std::chrono::seconds service_call_timeout = std::chrono::seconds::max(),
const int retries = 3)
{
process_error_lifecycle_nodes(split(nodes, ':'), service_call_timeout, retries);
}

} // namespace nav2_util

#endif // NAV2_UTIL__LIFECYCLE_UTILS_HPP_
55 changes: 55 additions & 0 deletions nav2_util/src/lifecycle_utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@

using std::string;
using lifecycle_msgs::msg::Transition;
using lifecycle_msgs::msg::State;

namespace nav2_util
{
Expand Down Expand Up @@ -98,4 +99,58 @@ void reset_lifecycle_nodes(
}
}

// TODO(gimait): This should be updated with the missing transitions to error after
// PRs ros2/rcl_interfaces#97, ros2/rcl#618 and ros2/rclcpp#1064 are merged.
// This tool should not be used for until then.
static void processErrorLifecycleNode(
const std::string & node_name,
const std::chrono::seconds service_call_timeout,
const int retries)
{
LifecycleServiceClient sc(node_name);
uint8_t transition;
// Each state of the lifecycle node has a specific transition
// to the ErrorProcessing state. Some states do not support transitions
// to ErrorProcessing.
switch (sc.get_state(service_call_timeout)) {
case State::TRANSITION_STATE_CONFIGURING:
transition = Transition::TRANSITION_ON_CONFIGURE_ERROR;
break;
case State::TRANSITION_STATE_CLEANINGUP:
transition = Transition::TRANSITION_ON_CLEANUP_ERROR;
break;
case State::TRANSITION_STATE_SHUTTINGDOWN:
transition = Transition::TRANSITION_ON_SHUTDOWN_ERROR;
break;
case State::TRANSITION_STATE_ACTIVATING:
transition = Transition::TRANSITION_ON_ACTIVATE_ERROR;
break;
case State::TRANSITION_STATE_DEACTIVATING:
transition = Transition::TRANSITION_ON_DEACTIVATE_ERROR;
break;
case State::TRANSITION_STATE_ERRORPROCESSING:
transition = Transition::TRANSITION_ON_ERROR_ERROR;
break;
default:
throw std::runtime_error("Invalid state.");
}

// Despite waiting for the service to be available and using reliable transport
// service calls still frequently hang. To get reliable reset it's necessary
// to timeout the service call and retry it when that happens.
RETRY(
sc.change_state(transition, service_call_timeout),
retries);
}

void process_error_lifecycle_nodes(
const std::vector<std::string> & node_names,
const std::chrono::seconds service_call_timeout,
const int retries)
{
for (const auto & node_name : node_names) {
processErrorLifecycleNode(node_name, service_call_timeout, retries);
}
}

} // namespace nav2_util
20 changes: 20 additions & 0 deletions nav2_util/test/test_lifecycle_utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@

using nav2_util::startup_lifecycle_nodes;
using nav2_util::reset_lifecycle_nodes;
using nav2_util::process_error_lifecycle_nodes;

class RclCppFixture
{
Expand Down Expand Up @@ -58,3 +59,22 @@ TEST(Lifecycle, interface)
node_thread.join();
SUCCEED();
}

TEST(Lifecycle, error_transition)
{
std::vector<rclcpp_lifecycle::LifecycleNode::SharedPtr> nodes;
nodes.push_back(rclcpp_lifecycle::LifecycleNode::make_shared("foo"));
nodes.push_back(rclcpp_lifecycle::LifecycleNode::make_shared("bar"));

std::atomic<bool> done(false);
std::thread node_thread(SpinNodesUntilDone, nodes, &done);
startup_lifecycle_nodes("/foo:/bar");

// At the moment, the lifecycle transitions to ErrorProcessing are not exposed
// to client nodes, so process_error_lifecycle_nodes will always return an exception.
EXPECT_THROW(process_error_lifecycle_nodes("/foo:/bar"), std::runtime_error);

done = true;
node_thread.join();
SUCCEED();
}