Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timers being executed more than once by the multithread executor #1374

Closed
ivanpauno opened this issue Sep 23, 2020 · 4 comments
Closed

Timers being executed more than once by the multithread executor #1374

ivanpauno opened this issue Sep 23, 2020 · 4 comments
Assignees
Labels
bug Something isn't working more-information-needed Further information is required

Comments

@ivanpauno
Copy link
Member

Bug report

Required Info:

  • Operating System:
    • Any
  • Installation type:
    • Any
  • Version or commit hash:
    • https://github.com/ros2/rclcpp/commit/2531787550084307a5c8144a68c0195fcdd6442a or previous commits (i.e. current master)
  • DDS implementation:
    • Any
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

The test here sometimes fails.

e.g.: https://ci.ros2.org/view/nightly/job/nightly_osx_release/1712/testReport/projectroot/test/test_multi_threaded_executor/

Additional information

From comment here, a timer can be triggered twice in a row when using the MultiThreadedExecutor:

  • Threads A adds the timer to the wait set.
  • The timer is ready to be executed by thread A.
  • Thread B adds the timer to the wait set, and it's waiting it to be ready.
  • Thread B sees that the timer is ready to be executed (get_next_executable).
  • Thread A executes the timer and removes it from the scheduled timer collection.
  • Thread B checks if the timer has been scheduled ... but thread A has already removed it from the collection. So thread B executes the timer again.

Related PRs:

@fujitatomoya
Copy link
Collaborator

@ivanpauno

I don't catch up with you the following sequence, could you enlighten me a bit?

Thread B sees that the timer is ready to be executed (get_next_executable).
Thread A executes the timer and removes it from the scheduled timer collection.
Thread B checks if the timer has been scheduled ... but thread A has already removed it from the collection. So thread B executes the timer again.

with above sequence how Thread A can be on running to access scheduled_timers_ in the middle of Thread B? Thread B is protected with wait_mutex_

{
std::lock_guard<std::mutex> wait_lock(wait_mutex_);
if (!rclcpp::ok(this->context_) || !spinning.load()) {
return;
}
if (!get_next_executable(any_exec, next_exec_timeout_)) {
continue;
}
if (any_exec.timer) {
// Guard against multiple threads getting the same timer.
if (scheduled_timers_.count(any_exec.timer) != 0) {
// Make sure that any_exec's callback group is reset before
// the lock is released.
if (any_exec.callback_group) {
any_exec.callback_group->can_be_taken_from().store(true);
}
continue;
}
scheduled_timers_.insert(any_exec.timer);
}
}
. so is scheduled_timers_
std::lock_guard<std::mutex> wait_lock(wait_mutex_);
auto it = scheduled_timers_.find(any_exec.timer);
if (it != scheduled_timers_.end()) {
scheduled_timers_.erase(it);
.

I think that it should be the following sequence,

Thread B sees that the timer is ready to be executed (get_next_executable).
Thread A executes the timer (but not removes it from the scheduled timer collection.)
Thread B checks if the timer has been scheduled, and see it is scheduled the continue.
Thread A removes it from the scheduled timer collection.

@ivanpauno
Copy link
Member Author

with above sequence how Thread A can be on running to access scheduled_timers_ in the middle of Thread B? Thread B is protected with wait_mutex_

Yeah, I was writing the temporal sequence in the code and I think you're right.

I think that it should be the following sequence,

The sequence you comment can make the timer miss a trigger, but cannot produce double execution AFAIS.


I have to double check my reasoning here.

@ivanpauno ivanpauno self-assigned this Sep 29, 2020
@ivanpauno
Copy link
Member Author

Wrong repository 😕, moving to rclcpp.

@ivanpauno ivanpauno transferred this issue from ros2/rmw_fastrtps Oct 5, 2020
@ivanpauno ivanpauno added more-information-needed Further information is required bug Something isn't working labels Oct 5, 2020
@ivanpauno
Copy link
Member Author

Closing as the description of the issue is not accurate, #1487 seems to be the real issue (?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working more-information-needed Further information is required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants