Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interrupt_guard_condition_ causes executor "spurious" awakes #1021

Open
alsora opened this issue Mar 10, 2020 · 4 comments
Open

interrupt_guard_condition_ causes executor "spurious" awakes #1021

alsora opened this issue Mar 10, 2020 · 4 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@alsora
Copy link
Collaborator

alsora commented Mar 10, 2020

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • source

Steps to reproduce issue

Run a ROS 2 system made of a single node with only a timer. No publishers/subscriptions or anything else.

#include <chrono>
#include <memory>
#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class MinimalTimer : public rclcpp::Node
{
public:
  MinimalTimer()
  : Node("minimal_timer")
  {
    timer_ = this->create_wall_timer(
      5s, std::bind(&MinimalTimer::timer_callback, this));
  }

private:
  void timer_callback()
  {
    RCLCPP_WARN(this->get_logger(), "Timer executed!");
  }
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalTimer>());
  rclcpp::shutdown();
  return 0;
}

Expected behavior

The timer has a 5 seconds period, so the executor should wake up once every 5 seconds.

Actual behavior

After every real awakes, the executor wakes up a second time for no apparent reason.

  • Executor waits for 5 seconds
  • Executor wakes up to execute the timer callback
  • Executor goes back to wait
  • Executor wakes up IMMEDIATELY
  • There is no work to do, so executor goes back to wait
  • Executor waits for 5 seconds

This pattern keeps going on forever.
This applies also to more complex systems, with the same frequency.

Some logs that I added:

[INFO] [1583844157.135606428] [single_thread_exec_cpp]: LINE:34 Spin: While loop iteration
[INFO] [1583844157.135620489] [executor_cpp]: LINE:598 get_next_ready_executable 1st call: Nothing 
[INFO] [1583844162.135118217] [executor_cpp]: LINE:602 wait_for_work completed
[INFO] [1583844162.135281008] [executor_cpp]: LINE:540 found ready timer
[INFO] [1583844162.135315670] [executor_cpp]: LINE:607 get_next_ready_executable 2nd call: Success 
[WARN] [1583844162.135348415] [minimal_timer]: Timer executed!
[INFO] [1583844162.135504465] [single_thread_exec_cpp]: LINE:34 Spin: While loop iteration
[INFO] [1583844162.135538023] [executor_cpp]: LINE:598 get_next_ready_executable 1st call: Nothing 
[INFO] [1583844162.135712743] [executor_cpp]: LINE:602 wait_for_work completed
[INFO] [1583844162.135738309] [executor_cpp]: LINE:607 get_next_ready_executable 2nd call: Nothing 
[INFO] [1583844162.135759814] [single_thread_exec_cpp]: LINE:34 Spin: While loop iteration
[INFO] [1583844162.135781694] [executor_cpp]: LINE:598 get_next_ready_executable 1st call: Nothing 
[INFO] [1583844167.134761867] [executor_cpp]: LINE:602 wait_for_work completed
[INFO] [1583844167.134906444] [executor_cpp]: LINE:540 found ready timer
[INFO] [1583844167.134943200] [executor_cpp]: LINE:607 get_next_ready_executable 2nd call: Success 
[WARN] [1583844167.134976433] [minimal_timer]: Timer executed!

Additional information

I saw that the wait implementation in the DDS library is effectively called every time, so it's not the rcl layer or something that does not even go to sleep.

The culprit looks to be

memory_strategy_->add_guard_condition(&interrupt_guard_condition_);

If I remove this line, I see the correct number of awakes

See source code here

@peterpena
Copy link
Contributor

This is expected. Here is why:

The spurious wake is actually caused in:

   if (rcl_trigger_guard_condition(&interrupt_guard_condition_) != RCL_RET_OK) {
     throw std::runtime_error(rcl_get_error_string().str);
   }

in Executor::execute_any_executable.

This interrupt exist to avoid a thread, in the case the executor is multithreaded and the threads are not mutually exclusive, waiting indefinitely in rcl_wait in wait.c in rcl. We can use the example you have given and make it a MultithreadedExecutor. When thread 1 goes into wait_for_work and adds the timer handle to wait_set and is woken by the timer, it starts executing the timer. At the same time, the lock is released and the other thread waiting, thread 2, goes into wait_for_work. Since thread 1 is executing the timer callback, the memory strategy does not add the timer handle to wait_set and thread 2 goes into rcl_wait with a wait_set that has no timer handle, and when thread 1 finishes executing the timer callback it is waiting for thread 2 to unlock; effectively, having thread 1 and 2 waiting indefinitely. With the interrupt_guard_condition triggered, any thread waiting in rcl_wait will wake up and look for work to be executed. You can run the code given to see this in effect:

#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <thread>


#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class MultiTimer final : public rclcpp::Node
{
public:
  MultiTimer() : Node("multi_timer")
  {
    timer_ = this->create_wall_timer(
      2s, std::bind(&MultiTimer::timer_callback, this));
  }

private:
  void timer_callback()
  {
    RCLCPP_WARN(this->get_logger(), "Timer executed!");
  }

  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::executors::MultiThreadedExecutor executor;
  auto multi_timer = std::make_shared<MultiTimer>();
  executor.add_node(multi_timer);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}

@alsora
Copy link
Collaborator Author

alsora commented May 27, 2020

Thank you for the detailed explanation.

However now I have another doubt.
If that interruption is required only for multi thread scenarios, why is it affecting also the single thread executor?

Would it be possible to do a PR and to move the related code from the base class executor.cpp to the multi thread executor ?

@wjwwood wjwwood assigned peterpena and unassigned wjwwood May 27, 2020
@peterpena peterpena added the help wanted Extra attention is needed label May 28, 2020
@peterpena
Copy link
Contributor

We can definitely do that so it can avoid triggering the interrupt guard condition for the single threaded executor. Would you be willing to have a PR for this?

@alsora
Copy link
Collaborator Author

alsora commented May 29, 2020

Unfortunately I can't right now.
If by middle of June no one has picked it up yet, I would be more than happy to do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants