refactor SwitchParams to fix the incosistencies in the spawner tests #1638

Merged
merged 6 commits into from
Aug 9, 2024

Conversation

saikishor
Member

@saikishor saikishor commented Jul 23, 2024

This PR aims to fix the inconsistencies observed in the spawner_unspawner tests, which may also be reproducible at higher frequencies. It seems that, because of the wall_timer and similar triggers, manage_switch was triggered twice, which caused the crashes. On inspection, the guards and mutexes needed to protect this context were missing.
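
To make the idea of "guards and mutexes" concrete, here is a minimal sketch of how a pending switch could be protected so that a second, concurrent trigger becomes a no-op. It is not the code in this PR: `SwitchRequest`, `SwitchGuardDemo`, and `run_concurrent_switches` are hypothetical names, and the sketch uses a blocking lock for simplicity, whereas a real-time update loop would more likely prefer a non-blocking try-lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the switch request state; this is NOT the
// real controller_manager SwitchParams definition.
struct SwitchRequest
{
  std::mutex mutex;        // protects do_switch
  bool do_switch = false;  // true while a switch is pending
};

// Hypothetical manager that counts how often the switch body really runs.
class SwitchGuardDemo
{
public:
  // Called by the requesting side to mark a switch as pending.
  void request_switch()
  {
    std::lock_guard<std::mutex> lock(request_.mutex);
    request_.do_switch = true;
  }

  // May be invoked concurrently, e.g. by a callback that fires twice.
  // Returns true only for the caller that performed the switch.
  bool manage_switch()
  {
    std::lock_guard<std::mutex> lock(request_.mutex);
    if (!request_.do_switch)
    {
      return false;  // nothing pending, or another caller already handled it
    }
    ++switch_count_;             // stands in for the deactivate/activate work
    request_.do_switch = false;  // consume the request under the lock
    return true;
  }

  int switch_count() const { return switch_count_.load(); }

private:
  SwitchRequest request_;
  std::atomic<int> switch_count_{0};
};

// Fires manage_switch() from several threads for one single request and
// returns how many of them actually executed the switch.
inline int run_concurrent_switches(int n_threads)
{
  SwitchGuardDemo demo;
  demo.request_switch();
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i)
  {
    threads.emplace_back([&demo]() { demo.manage_switch(); });
  }
  for (auto & t : threads)
  {
    t.join();
  }
  return demo.switch_count();
}
```

Because the pending flag is checked and cleared while holding the lock, only one caller can ever observe it as set, no matter how many callbacks fire for the same request.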

The changes are also tested on the real hardware! :)

The following failures occur with the current setup and are resolved by the changes proposed in this PR:

1: [WARN] [1722085254.107041328] [spawner_ctrl_1]: Controller already loaded, skipping load_controller
1: [INFO] [1722085254.107959063] [test_controller_manager]: Configuring controller 'ctrl_1'
1: [WARN] [1722085254.125499381] [rcl_lifecycle]: No transition matching 3 found for current state activating
1: [ERROR] [1722085254.125529455] [ctrl_1]: Unable to start transition 3 from current state activating: Transition is not registered., at ./src/rcl_lifecycle.c:355
1: [ERROR] [1722085254.125577475] [test_controller_manager]: After activation, controller 'ctrl_1' is in state 'activating' (13), expected 'active' (3).
1: [INFO] [1722085254.136365765] [spawner_ctrl_1]: Configured and activated ctrl_1
1: [INFO] [1722085254.308361830] [test_controller_manager]: Loading controller 'ctrl_1'
1: [INFO] [1721556773.163766473] [test_controller_manager]: Configuring controller 'ctrl_3'
1: terminate called after throwing an instance of 'std::runtime_error'
1:   what():  Can not get command interface configuration until the controller is configured. The current state is : activating and ID : 13 and ACTIVE is : 3 and INACTIVE IS : 2
1: Stack trace (most recent call last) in thread 112351:
1: [INFO] [1721560930.460608458] [test_controller_manager]: Configuring controller 'ctrl_3'
1: [WARN] [1721560930.479481009] [rcl_lifecycle]: No transition matching 3 found for current state activating
1: [ERROR] [1721560930.479493504] [ctrl_1]: Unable to start transition 3 from current state activating: Transition is not registered., at ./src/rcl_lifecycle.c:355
1: [ERROR] [1721560930.479538729] [test_controller_manager]: After activation, controller 'ctrl_1' is in state 'activating' (13), expected 'active' (3).
1: terminate called after throwing an instance of 'std::runtime_error'
1:   what():  Can not get command interface configuration until the controller is configured. The current state is : activating and ID : 13 and ACTIVE is : 3 and INACTIVE IS : 2
1: Stack trace (most recent call last) in thread 227964:
1: #21   Object "", at 0xffffffffffffffff, in 
1: [INFO] [1721560930.490672863] [spawner_ctrl_1]: Configured and activated all the parsed controllers list!
1: #20   Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7fe73cdf584f]
1: [INFO] [1721716958.916360995] [test_controller_manager]: Loading controller 'ctrl_1'
1: [INFO] [1721716958.935490861] [spawner_ctrl_1]: Loaded ctrl_1
1: [INFO] [1721716958.936473516] [test_controller_manager]: Configuring controller 'ctrl_1'
1: [INFO] [1721716958.943142165] [spawner_ctrl_1]: Configured ctrl_1 successfully
1: [INFO] [1721716958.943535316] [spawner_ctrl_1]: Activating ctrl_1
1: [INFO] [1721716958.944450162] [test_controller_manager]: Found controller 'ctrl_1' that needs to be activateed in list of controllers. Adding to the list!
1: [INFO] [1721716958.952596909] [test_controller_manager]: Entering manager switch and before activate controllers. 3720000000
1: [INFO] [1721716958.952633229] [test_controller_manager]: Entering manager switch and before activate controllers. 3720000000
1: [INFO] [1721716958.952635535] [test_controller_manager]: Activating controllers: ctrl_1, 
1: [INFO] [1721716958.952672064] [test_controller_manager]: Activating controllers: ctrl_1, 
1: [INFO] [1721716958.952687694] [test_controller_manager]: Activating controller 'ctrl_1' with current state : inactive
1: [INFO] [1721716958.952703837] [test_controller_manager]: Activating controller 'ctrl_1' with current state : inactive
1: [INFO] [1721716958.952738683] [test_controller_manager]: Switching done : 0 at 3720000000
1: [ERROR] [1721716958.952773682] [test_controller_manager]: Could not extract state interfaces for controller 'ctrl_1': Can not get state interface configuration until the controller is configured. The current state is : activating and ID : 13 and ACTIVE is : 3 and INACTIVE IS : 2
1: terminate called after throwing an instance of 'std::runtime_error'
1:   what():  Can not get state interface configuration until the controller is configured. The current state is : activating and ID : 13 and ACTIVE is : 3 and INACTIVE IS : 2
1: [INFO] [1721716958.952890889] [test_controller_manager]: Successfully switched controllers. Activated: ctrl_1, 
1: Stack trace (most recent call last) in thread 60496:
1: #20   Object "", at 0xffffffffffffffff, in 
1: [INFO] [1721716958.963346793] [spawner_ctrl_1]: Configured and activated ctrl_1
1: [INFO] [1721716958.964605158] [test_controller_manager]: Loading controller 'ctrl_3'
1: [INFO] [1721716958.973829860] [spawner_ctrl_1]: Loaded ctrl_3

Fixes: #1637
Fixes: #1368
Fixes: #1647
Fixes: #1657
Fixes: #1644
Fixes: ros-controls/ros2_control_ci#105
Fixes: ros-controls/ros2_control_ci#106
Fixes: ros-controls/ros2_control_ci#107
Fixes: ros-controls/ros2_control_ci#108


codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 84.61538% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.04%. Comparing base (5cd1a43) to head (1998266).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1638      +/-   ##
==========================================
+ Coverage   88.03%   88.04%   +0.01%     
==========================================
  Files         108      108              
  Lines       10006    10024      +18     
  Branches      891      892       +1     
==========================================
+ Hits         8809     8826      +17     
- Misses        877      878       +1     
  Partials      320      320              
Flag Coverage Δ
unittests 88.04% <84.61%> (+0.01%) ⬆️

Files Coverage Δ
.../include/controller_manager/controller_manager.hpp 100.00% <100.00%> (ø)
...ller_manager/test/test_controller_manager_srvs.cpp 99.25% <100.00%> (+<0.01%) ⬆️
controller_manager/src/controller_manager.cpp 77.29% <73.33%> (+0.07%) ⬆️

Contributor

@christophfroehlich christophfroehlich left a comment

Thanks for addressing these issues!
Changes look fine, but I'm not sufficiently proficient in reviewing the multi-thread stuff.

The CI failures happen back to humble, can we backport this PR?
ABI breaks might not be that critical because this is an internal class only?

This PR reduces coverage because of the untested fallback branches. Could we add any tests there?

@saikishor
Member Author

The CI failures happen back to humble, can we backport this PR?
ABI breaks might not be that critical because this is an internal class only?

I believe so! This can be backported without any issues; as you said, it is only an ABI change.

@christophfroehlich christophfroehlich added backport-humble This label should be used by maintaines only! Label triggers PR backport to ROS2 humble. backport-iron This label should be used by maintaines only! Label triggers PR backport to ROS2 Iron. labels Jul 29, 2024
@christophfroehlich christophfroehlich linked an issue Jul 30, 2024 that may be closed by this pull request
@christophfroehlich christophfroehlich linked an issue Aug 1, 2024 that may be closed by this pull request
Member

@bmagyar bmagyar left a comment

A pretty elegant solution; I'm only questioning the reset() function part.

controller_manager/src/controller_manager.cpp
Member

@destogl destogl left a comment

I find this very good. Just to clarify, I am not sure how this solves multiple calls from the spawner and unspawner. Shouldn't they already be blocked at the service callback level?

@saikishor
Member Author

saikishor commented Aug 8, 2024

I find this very good. Just to clarify, I am not sure how this solves multiple calls from the spawner and unspawner. Shouldn't they already be blocked at the service callback level?

Hello @destogl!

The problem is not in the spawner per se; the problem is in the wall timers that we use inside the tests and in other places. The wall timers sometimes trigger the same callback twice, so manage_switch is triggered twice, and most of the issues come from these coupled triggers. This means activate/deactivate are called simultaneously, which causes the problems. It took me about a week of debugging to understand the root cause XD. Ideally, it shouldn't happen in a real use case.
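
To illustrate why a doubled trigger ends in the "No transition matching 3 found for current state activating" error from the logs, here is a heavily simplified model of the lifecycle states involved. The numeric ids are the ones printed in the logs (inactive = 2, active = 3, activating = 13, transition 3 = activate); `TinyLifecycle` and its methods are hypothetical and are not the rcl_lifecycle API.

```cpp
#include <cassert>
#include <cstdint>

// State ids as printed in the logs of this PR.
enum class State : std::uint8_t
{
  Inactive = 2,
  Active = 3,
  Activating = 13
};

// Transition id 3 is the "activate" transition in the logs.
constexpr std::uint8_t kTransitionActivate = 3;

// Hypothetical, heavily simplified model; NOT the rcl_lifecycle API.
class TinyLifecycle
{
public:
  State state() const { return state_; }

  // Begin a transition. Only `activate` from `inactive` is modelled.
  bool start_transition(std::uint8_t transition_id)
  {
    if (transition_id == kTransitionActivate && state_ == State::Inactive)
    {
      state_ = State::Activating;
      return true;
    }
    // A second activate while still `activating` lands here, which
    // corresponds to the "No transition matching 3" warning.
    return false;
  }

  // Finish a running activation and move to `active`.
  bool finish_activation()
  {
    if (state_ != State::Activating)
    {
      return false;
    }
    state_ = State::Active;
    return true;
  }

private:
  State state_ = State::Inactive;
};
```

In this model, the first trigger moves the controller from inactive to activating, and a second trigger arriving before the activation finishes is rejected. Any caller that then inspects the controller sees it in the intermediate activating state (13) rather than active (3), as in the error messages above.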

@bmagyar bmagyar merged commit a1ad523 into ros-controls:master Aug 9, 2024
16 of 19 checks passed
mergify bot pushed a commit that referenced this pull request Aug 9, 2024
…1638)

(cherry picked from commit a1ad523)

# Conflicts:
#	controller_manager/src/controller_manager.cpp
mergify bot pushed a commit that referenced this pull request Aug 9, 2024
…1638)

(cherry picked from commit a1ad523)

# Conflicts:
#	controller_manager/src/controller_manager.cpp
christophfroehlich added a commit that referenced this pull request Aug 12, 2024
* [ResourceManager] Make destructor virtual for use in derived classes (#1607)

* Fix typos in test_resource_manager.cpp (#1609)

* [CM] Remove support for the description parameter and use only `robot_description` topic (#1358)

---------

Co-authored-by: Felix Exner <exner@fzi.de>
Co-authored-by: Christoph Froehlich <christoph.froehlich@ait.ac.at>

* move critical variables to the private context (#1623)

* Fix controller starting with later load of robot description test (#1624)

* Fix the duplicated restart of the controller_manager services initialization

* Scope the ControllerManagerRunner to avoid malloc and other test issues

* reorder the tests for consistent log at the end

* Remove noqa (#1626)

* Unused header cleanup (#1627)

* Create debugging.rst (#1625)


---------

Co-authored-by: Sai Kishor Kothakota <saisastra3@gmail.com>
Co-authored-by: Christoph Froehlich <christoph.froehlich@ait.ac.at>

* Update changelogs

* 4.14.0

* add missing rclcpp logging include for Humble compatibility build (#1635)

* [CM] Remove deprecated spawner args (#1639)

* Add a pytest launch file to test ros2_control_node (#1636)

* Fix rst markup (#1642)

* Fix rqt_cm paragraph

* Fix indent

* CM: Add missing includes (#1641)

* [RM] Add `get_hardware_info` method to the Hardware Components (#1643)

* Fix the namespaced controller_manager spawner + added tests (#1640)

* Bump version of pre-commit hooks (#1649)

Co-authored-by: christophfroehlich <3367244+christophfroehlich@users.noreply.github.com>

* Add missing include for executors (#1653)

* Update changelogs

* 4.15.0

* refactor SwitchParams to fix the incosistencies in the spawner tests (#1638)

* Modify test with missing CM to have a timeout

* Catch exception when CM services are not found

And print the error and exit in the application

* Exit with code 1 on unreached CM

---------

Co-authored-by: Sai Kishor Kothakota <sai.kishor@pal-robotics.com>
Co-authored-by: Parker Drouillard <parker@pepcorp.ca>
Co-authored-by: Dr. Denis <denis@stoglrobotics.de>
Co-authored-by: Christoph Froehlich <christoph.froehlich@ait.ac.at>
Co-authored-by: Christoph Fröhlich <christophfroehlich@users.noreply.github.com>
Co-authored-by: Henry Moore <henry.moore@picknik.ai>
Co-authored-by: Lennart Nachtigall <firesurfer127@gmail.com>
Co-authored-by: Sai Kishor Kothakota <saisastra3@gmail.com>
Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: christophfroehlich <3367244+christophfroehlich@users.noreply.github.com>
bmagyar pushed a commit that referenced this pull request Aug 14, 2024
bmagyar pushed a commit that referenced this pull request Aug 14, 2024
@saikishor saikishor deleted the fix/spawner/tests branch August 17, 2024 08:06