Fix logic that moves goal handles when one expires #360

Merged
merged 1 commit into from
Dec 18, 2018

Contributor

@sloretz sloretz commented Dec 14, 2018

This fixes a bug in the action server that can result in a crash when a goal is expired. The logic to move goals backwards when one expires can leave invalid pointers and lose track of valid goal handles.

To reproduce the crash

  1. Shorten the expire time to make the crash happen sooner

    diff --git a/rcl_action/src/rcl_action/action_server.c b/rcl_action/src/rcl_action/action_server.c
    index ac8b927..916e2ca 100644
    --- a/rcl_action/src/rcl_action/action_server.c
    +++ b/rcl_action/src/rcl_action/action_server.c
    @@ -286,7 +286,7 @@ rcl_action_server_get_default_options(void)
       default_options.feedback_topic_qos = rmw_qos_profile_default;
       default_options.status_topic_qos = rcl_action_qos_profile_status_default;
       default_options.allocator = rcl_get_default_allocator();
    -  default_options.result_timeout.nanoseconds = RCUTILS_S_TO_NS(15 * 60);  // 15 minutes
    +  default_options.result_timeout.nanoseconds = RCUTILS_S_TO_NS(15);  //yyy * 60);  // 15 minutes
       return default_options;
     }
     
    
  2. Build

  3. Start an action server

    ros2 run examples_rclcpp_minimal_action_server action_server_not_composable
    
  4. Run an action client repeatedly (with the 15 second expiration timeout, the crash usually happens within 10 runs)

    for i in {1..100} ; do echo $i ; ros2 run examples_rclcpp_minimal_action_client action_client_not_composable_with_cancel ; done
    

When the goal time expires, sometimes the broken logic will leave an invalid pointer that causes an exception later when something tries to access it.

    terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
      what():  goal handle implementation is invalid, at /home/developer/workspaces/ros2-dev1/src/ros2/rcl/rcl_action/src/rcl_action/goal_handle.c:157

@sloretz sloretz added bug Something isn't working in review Waiting for review (Kanban column) labels Dec 14, 2018
@sloretz sloretz self-assigned this Dec 14, 2018
Member

@wjwwood wjwwood left a comment

lgtm, after staring at it for a long time :)

Contributor

@hidmic hidmic left a comment

LGTM too!

    --num_goal_handles;
    action_server->impl->goal_handles[i] = action_server->impl->goal_handles[num_goal_handles];
    allocator.deallocate(action_server->impl->goal_handles[num_goal_handles], allocator.state);
Member

Would the logic also be fixed if this deallocate was for index i and moved one line up?
For sure the --i; is needed.

    allocator.deallocate(action_server->impl->goal_handles[i], allocator.state);
    action_server->impl->goal_handles[i] = action_server->impl->goal_handles[num_goal_handles];
    action_server->impl->goal_handles[num_goal_handles] = NULL;
    // decrement i to check the same index again now that it has a new goal handle
    --i;

Just want to make sure I understand the bug :)

Contributor Author

That would be an improvement, though it would still have a bug. All goal handles need to be moved to shrink the array, but this shrinking logic is doing it one at a time in a block that only executes if the goal handle at i has expired. An active goal at the end of the array would be lost when the array shrinks because this bit wouldn't be reached. If there were two goals that expired at once then goal handles after the second expired one would only be moved back one place when they need to be moved back two places.

This PR adds a for-loop to make sure all goals after the expired one are moved.
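
For illustration only, here is a minimal standalone sketch of the "move every later handle back" idea described above. It is not the code from this PR: `demo_goal_handle_t`, `demo_make_goal`, and `demo_expire_goals` are hypothetical names, and plain `malloc`/`free` stand in for the rcl allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a goal handle; NOT the rcl_action type. */
typedef struct
{
  int id;
  bool expired;
} demo_goal_handle_t;

/* Hypothetical helper: heap-allocate a demo handle. */
demo_goal_handle_t * demo_make_goal(int id, bool expired)
{
  demo_goal_handle_t * goal = malloc(sizeof(demo_goal_handle_t));
  assert(goal != NULL);
  goal->id = id;
  goal->expired = expired;
  return goal;
}

/* Free every expired handle and shift all later handles back so the
 * surviving handles stay contiguous and in order.
 * Returns the new number of handles; vacated slots are set to NULL. */
size_t demo_expire_goals(demo_goal_handle_t ** handles, size_t count)
{
  assert(handles != NULL || count == 0);
  size_t i = 0;
  while (i < count) {
    if (!handles[i]->expired) {
      ++i;
      continue;
    }
    /* Free the expired handle itself, not the one that will replace it. */
    free(handles[i]);
    /* Move every handle after index i back by one place. */
    for (size_t j = i; j + 1 < count; ++j) {
      handles[j] = handles[j + 1];
    }
    --count;
    handles[count] = NULL;
    /* Do not advance i: the slot now holds a handle that is still unchecked. */
  }
  return count;
}
```

With four handles where the second and third have expired, this sketch leaves the first and fourth in slots 0 and 1. That is the "moved back two places" case mentioned above.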

Member

It makes sense to me, thanks for the explanation!

@sloretz
Contributor Author

sloretz commented Dec 18, 2018

CI (testing packages above rcl_action)

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@sloretz sloretz merged commit 6b6c0fe into master Dec 18, 2018
@sloretz sloretz deleted the sloretz/expire_timer_move_goal_handles branch December 18, 2018 23:16
@sloretz sloretz removed the in review Waiting for review (Kanban column) label Dec 18, 2018
emersonknapp pushed a commit to emersonknapp/rcl that referenced this pull request Feb 13, 2019
* update docs about possibility of rcl_take no taking (ros2#356)

* update rcl_wait doc with respect to subs and possibility of failing takes

* add a note about possible failing takes in rcl_take docs

* 0.6.2

* Set rmw_wait timeout using ros timers too (ros2#357)

* 0.6.3

* Avoid timer period being set to 0 (ros2#359)

* Fix logic that moves goal handles when one expires (ros2#360)

* Fix error from uncrustify v0.68 (ros2#364)

* Ensure that context instance id storage is aligned correctly (ros2#365)

* Ensure that context instance id storage is aligned correctly

* Make alignment compatible with MSVC

* Namespace alignment macro with RCL_

* [rcl] Guard against bad allocation calling rcl_arguments_copy() (ros2#367)

* [rcl] Add test for copying arguments struct with no arguments

* Override allocate function in test to reveal bug

* [rcl] Only allocate arrays if there are things to copy in rcl_argument_copy()

Also guard against freeing invalid pointers if rcl_argument_copy() fails.

* Remove uncessary guard against NULL pointer

* linter, styles, uncrustify fixes
jacobperron pushed a commit that referenced this pull request Feb 21, 2019
* Changing security directory lookup to a prefix match rather than exact match.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Adding security_directory module and moving rcl_get_secure_root function to it. Adding tests.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Changing security directory prefix matching to be optional. Improving error messages around security directory lookup.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Fixing get_best_matching_directory so that it always fetches the next file inside the loop.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* make pr ready for ros2cli security feature (#1)

* update docs about possibility of rcl_take no taking (#356)

* update rcl_wait doc with respect to subs and possibility of failing takes

* add a note about possible failing takes in rcl_take docs

* 0.6.2

* Set rmw_wait timeout using ros timers too (#357)

* 0.6.3

* Avoid timer period being set to 0 (#359)

* Fix logic that moves goal handles when one expires (#360)

* Fix error from uncrustify v0.68 (#364)

* Ensure that context instance id storage is aligned correctly (#365)

* Ensure that context instance id storage is aligned correctly

* Make alignment compatible with MSVC

* Namespace alignment macro with RCL_

* [rcl] Guard against bad allocation calling rcl_arguments_copy() (#367)

* [rcl] Add test for copying arguments struct with no arguments

* Override allocate function in test to reveal bug

* [rcl] Only allocate arrays if there are things to copy in rcl_argument_copy()

Also guard against freeing invalid pointers if rcl_argument_copy() fails.

* Remove uncessary guard against NULL pointer

* linter, styles, uncrustify fixes

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Update rcl/include/rcl/security_directory.h

Co-Authored-By: AAlon <avishayalon@gmail.com>
Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Adding line break in docstring

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Removing duplicate doc string

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Removing tinydir from the source tree, instead using the ROS package tinydir_vendor.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Removing tinydir

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Reformatting license notice as per linter template.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Update test_security_directory.cpp

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Changing input to putenv to be a global, statically allocated buffer.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* test_security_directory - Using a larger buffer for env string manipulations.

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Copy environment variable to allocated string so it is not clobbered by next lookup

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Address review comments

fix security directory exact match comment and unset env vars before tests

Signed-off-by: Emerson Knapp <eknapp@amazon.com>

* Remove strncpy

Signed-off-by: Emerson Knapp <eknapp@amazon.com>