-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Robustify spawner #1501
Robustify spawner #1501
Conversation
With the current implementation I expect this to fail most of the times.
This waiting mechanism seems to be very error-prone and is strictly not needed since we wait for all the services, anyway. NOTE: This change removes the ability to specify the controller_manager_timeout.
by setting it as the timeout for the first service call.
Apparently, sercice calls can end up in a deadlock where the server says it cannot send the response to the client. In that case, if we spin_until_future_complete() we will be in a deadlock forever. Hence, this commit adds a timeout and retry mechanism to the service call abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look fine to me.
- Couldn't you hardcode the URDF directly in the launch_testing file instead of running xacro? Then you wouldn't need to install the URDF. Otherwise, you could use a relative path from the launch_testing file to the urdf instead of FindPackageShare.
- Same for the yaml (haven't done that yet to be honest).
- instead of waiting 30s, would it make sense to check repeatedly in a loop if already all controllers are running? (+ some delay at every loop, +max loops of course)
Edit: We could just put URDF+yaml into the test_assets package and leave the launch file as it is.
I'll have to have a look at the tests anyway, since they seem not to work on noble... I'll probably wait to properly address this until Friday when noble is released. Thank you for the input @christophfroehlich. Moving both files to the test_assets didn't seem right to me, since the controller config file is a bit awkward (I mean, I spawn 50 joint_state_controllers for the same joints). I tried including things into the launchfile, but since the parameters aren't for the It looks like #1483 might get merged before this, though, in which case most of the changes here aren't necessarily required anymore and maybe also the test is questionable. We might hold this until #1483 is merged and it is decided how to proceed with humble and iron. |
controller_manager/controller_manager/controller_manager_services.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this arbitrary repetition time a bit "mew", if we provide this properly than it would make more sense.
controller_manager/controller_manager/controller_manager_services.py
Outdated
Show resolved
Hide resolved
I'll update this, now that #1562 is merged. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1501 +/- ##
==========================================
- Coverage 86.67% 86.64% -0.03%
==========================================
Files 115 115
Lines 10528 10544 +16
Branches 967 970 +3
==========================================
+ Hits 9125 9136 +11
- Misses 1056 1059 +3
- Partials 347 349 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
This is not related to this PR
I'm happy with how that currently is, I'm just not sure about the docstring style. I don't have a lot of experience regarding python docstrings, but I do prefer using type hints in the signature and restructured text in the docstring: def service_caller(
node: 'Node',
service_name: str,
service_type: str,
request: SrvTypeRequest,
service_timeout: Optional[float] = 0.0,
call_timeout: Optional[float] = 10.0,
max_attempts: Optional[int] = 3,
-> SrvTypeResponse:
"""
Abstraction of a service call.
Has an optional timeout to find the service and a mechanism
to retry a call of no response is received.
:param node: Node object to be associated with
:param service_name: Service URL
:param request: The request to be sent
:param service_timeout: Timeout (in seconds) to wait until the service is available. 0 means
waiting forever, retrying every 10 seconds.
:param call_timeout: Timeout (in seconds) for getting a response
:param max_attempts: Number of attempts until a valid response is received. With some
middlewares it can happen, that the service response doesn't reach the client
leaving it in a waiting state forever.
:return: The service response
""" As there aren't too many python docstrings in this project and I couldn't find any specification on what to use, I would like to ask, whether there is a preference here. |
This is great! Now we have a reference :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, just a few minor cosmetic proposals if you like. It can go in also without it.
controller_manager/controller_manager/controller_manager_services.py
Outdated
Show resolved
Hide resolved
controller_manager/controller_manager/controller_manager_services.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Dr. Denis <denis@stoglrobotics.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if future.result() is None: | ||
node.get_logger().warning( | ||
f"Failed getting a result from calling {service_name} in " | ||
f"{service_timeout}. (Attempt {attempt+1} of {max_attempts}.)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmauch should be call_timeout
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right! Thanks for spotting this
service_timeout=0.0, | ||
call_timeout=10.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmauch one more question, do you see an easy way to make these configurable by the user? My bringup is quite CPU intensive causing the call_timeout to be reached, however it works if I increase it a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not, any harm in arbitrarily increasing it to e.g. 30?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current implementation it can happen that when using multiple controller spawners some of them fail or get stuck, see #1182. This fixes #1182 and #1483.
While writing this, I realize this has great overlap with #1483, but I see no problem combining those two.
As briefly mentioned above, this change addresses two things:
Note: The test I implemented is a bit hacky, so we might also want to remove it again? Otherwise I think we would have to change the following things:
test
folder in order to access the controller configuration file living in there. We should either install that file by hand rather than the complete test folder or move it somewhere else.ros_control_test_assets
as I thought that might be the most useful place. I might be wrong.I implemented and tested things on the rolling on jammy installation I currently have, but I know the problem definitively also arises for humble users.