Segfault in hardware interface when reading robot_description from topic #1442
Comments
Hello! Before these logs, do you see any messages saying that the resource manager successfully initialized the components from the URDF received on the topic, before the crash? Can you try using mock components and check whether the same happens? Thank you
Hi,
I get one of these behaviors at each startup. Adding or removing additional log statements changes which behavior I get most of the time. (Passing the robot_description as a parameter always results in a successful start.) With mocked hardware instead, this issue does not occur. By the way, I have the same hardware interface loaded 4 times, and the error occurs in a different one each time.
OK, based on what you said, can you add the following condition:
I guess I have to compile the resource manager myself then? I will try that.
Could you please point out where exactly I have to add this? I am not too familiar with the ros2_control code base.
Hello @firesurfer! You can try out this branch: https://github.com/saikishor/ros2_control/tree/robot_description/crash/iron and let us know.
@saikishor Thanks for your help. I probably won't be able to try it out this week, but I will definitely find some time next week.
Sure, no worries.
@saikishor Just tried your patch.
@firesurfer Thanks for confirming it. Looking at the code last night, I think I might know the root cause of the issue. In an hour I can send you another branch to test; it might solve the issue.
Hello @firesurfer, could you test this branch: https://github.com/saikishor/ros2_control/tree/fix/iron/missing/resources_lock and let me know if it fixes your issue? Ideally it should, as the issue might be coming from some missing mutex locks.
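To illustrate the kind of race a missing lock allows, here is a minimal, self-contained sketch. It assumes (this is not taken from the ros2_control sources) that one thread appends components after robot_description arrives on the topic, while the control-loop thread iterates over the same container. `FakeComponent`, `ResourceStorage`, and their methods are hypothetical. `resources_lock_` only echoes the branch name and is not necessarily the real member. Without the lock, a `push_back` that reallocates the vector while `read_all()` is iterating is undefined behavior and can segfault intermittently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one loaded hardware component (not the ros2_control API).
struct FakeComponent {
  std::string name;
  int reads = 0;
  void read() { ++reads; }
};

// Hypothetical stand-in for the resource manager's component storage.
// Every access to components_ goes through resources_lock_.
class ResourceStorage {
public:
  // Called from the thread that receives robot_description (e.g. a topic callback).
  void load_components(const std::vector<std::string> & names) {
    std::lock_guard<std::mutex> guard(resources_lock_);
    for (const auto & name : names) {
      auto component = std::make_unique<FakeComponent>();
      component->name = name;
      components_.push_back(std::move(component));
    }
  }

  // Called from the control-loop thread. Holding the same lock while iterating
  // means push_back() in load_components() cannot reallocate the vector mid-loop.
  void read_all() {
    std::lock_guard<std::mutex> guard(resources_lock_);
    for (auto & component : components_) {
      component->read();
    }
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(resources_lock_);
    return components_.size();
  }

  int total_reads() {
    std::lock_guard<std::mutex> guard(resources_lock_);
    int total = 0;
    for (const auto & component : components_) {
      total += component->reads;
    }
    return total;
  }

private:
  std::mutex resources_lock_;
  std::vector<std::unique_ptr<FakeComponent>> components_;
};
```

With the same interface loaded 4 times, `load_components({"hw_1", "hw_2", "hw_3", "hw_4"})` followed by two `read_all()` calls gives `size() == 4` and `total_reads() == 8`, regardless of which thread runs first.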
@saikishor Just tried your branch and it introduces a new segfault in the read method:
I started it again and it worked. So the race condition / threading issue still seems to be there. EDIT: Looks like it works roughly every second time ;)
Lol, so now it works once every 2 runs. Right before this segmentation fault, do you have more logs about the activation of the hardware components?
So this time, in 8 startups, I had:
- 3x
- 1x
- 1x successful
- 2x segfault -> all hardware interfaces activated successfully
- 1x
Looks like I was just lucky when I said it works every second run....
@firesurfer I've pushed another similar commit to the same branch. With this, at least the read and write cycles shouldn't run unless the hardware components are loaded and initialized. I've also added some logging for easier reference. Can you please pull the change and check?
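The guard described here, "don't run read/write until the components are loaded and initialized", can be sketched as a flag that the loading thread sets and the control loop checks each cycle. This is a hypothetical illustration, not the code from the branch. `GatedControlLoop`, `mark_components_ready()`, and `cycle()` are made-up names, and the read/write callbacks stand in for the real hardware calls.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch: the control loop skips read()/write() until the thread
// that loads robot_description marks the hardware components as ready.
class GatedControlLoop {
public:
  GatedControlLoop(std::function<void()> read, std::function<void()> write)
  : read_(std::move(read)), write_(std::move(write)) {}

  // Called by the loading thread once all components are loaded and initialized.
  // The release store pairs with the acquire load in cycle(), so everything
  // written during initialization is visible to the control-loop thread.
  void mark_components_ready() { ready_.store(true, std::memory_order_release); }

  // One control cycle; returns false when the cycle was skipped.
  bool cycle() {
    if (!ready_.load(std::memory_order_acquire)) {
      return false;  // hardware not loaded yet: do not touch it
    }
    read_();
    // ... controllers would update here ...
    write_();
    return true;
  }

private:
  std::function<void()> read_;
  std::function<void()> write_;
  std::atomic<bool> ready_{false};
};
```

A flag like this only orders the one-time "ready" transition. If components can still be added or modified while a cycle runs, the container itself also needs a lock.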
@saikishor I pulled this commit: d98c0ee
- 3x
- 1x segfault after everything was successfully activated
- 1x interestingly it tells me something about:
- 1x segfault, but complaining about an empty activate/deactivate list (we have a node in our system which tries to disable all our motion controllers under certain conditions):
  [ros2_control_node-12] [INFO] [1710763734.794714775] [controller_manager]: Empty activate and deactivate list, not requesting switch
Then the problem might be coming from here: ros2_control/controller_manager/src/controller_manager.cpp, lines 339 to 357 in 20e01d8.
This is executed right after the hardware components are loaded from the URDF method in the
@firesurfer I've pushed another commit to the same branch. Can you test it, please?
Good news @saikishor. I did about 10 starts and all of them were successful :)
Awesome, I'm glad the changes worked. I'll then try to open a PR to get the fix upstream.
@firesurfer If you have time, please review the changes in the PR linked above. Thank you
Describe the bug
I have a hardware interface for some piece of internal hardware. When I have the controller_manager obtain the robot_description from the robot_state_publisher instead of passing it as a parameter, I get a segfault in the read method of the hardware interface. This behavior does not occur when starting it with gdb attached!
To Reproduce
Start the ros2_control_node configured to read the robot_description from the corresponding topic.
Expected behavior
Should not crash ;)
Environment
- hardware_interface version: 3.24.0
Additional context
I already call the read method once in the activate method in order to obtain the current positions of the hardware. This call succeeds. But the read method contains quite a bit of logic, as I need to work around some quirks of our hardware.
Stacktrace (I replaced the names):
Starting the node with GDB attached delays startup a tiny bit, so the problem might be timing related.
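The pattern from "Additional context", calling read once during activation so the commands start at the current positions, can be sketched without ROS as follows. `MockSystem` is hypothetical and does not derive from `hardware_interface::SystemInterface`. In the real interface, `on_activate` takes a lifecycle state and `read` takes time and period arguments, and both return status enums. The sketch only shows why the first read happens on the activation path, not in the control loop.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, self-contained mock of a system hardware interface.
class MockSystem {
public:
  explicit MockSystem(std::vector<double> device_positions)
  : device_positions_(std::move(device_positions)),
    state_positions_(device_positions_.size(), 0.0),
    command_positions_(device_positions_.size(), 0.0) {}

  // Copies the (simulated) device state into the state interfaces.
  bool read() {
    for (std::size_t i = 0; i < device_positions_.size(); ++i) {
      state_positions_[i] = device_positions_[i];
    }
    return true;
  }

  // On activation, read once and hold the current position, so the first
  // write() does not command a jump to the default value 0.0.
  bool on_activate() {
    if (!read()) {
      return false;
    }
    command_positions_ = state_positions_;
    return true;
  }

  const std::vector<double> & commands() const { return command_positions_; }

private:
  // Declaration order matters: the constructor sizes the other vectors from this one.
  std::vector<double> device_positions_;
  std::vector<double> state_positions_;
  std::vector<double> command_positions_;
};
```

For example, a device at positions `{0.5, -1.0}` yields commands `{0.5, -1.0}` after `on_activate()`. Because this call runs on the activation path, its success does not rule out a race on the periodic read path.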
In the hardware interface I access the URDF via system_info.original_xml and read some parameters such as joint limits from it.