Provide human-readable definitions/alarms for RCL(C) initialisation errors #128
Comments
Looking into this a bit: this wouldn't be too much work to implement, but I'm trying to figure out a good place for these kinds of "RCL(C) initialisation errors/alarms" in We don't really have a main alarm category for those I believe: Lines 71 to 82 in c9a9670
The issue title already refers to these kinds of problems as "initialisation errors", so it doesn't seem like they would fit under do we still have 'space' for a category dedicated to RCLC_INIT errors/alarms? We could then raise a regular alarm and provide some descriptive text as part of it. Edit: we could also make it a 'micro-ROS' category. |
As to subcodes: initially I thought we could perhaps opt to just 'forward' the various However, there is no guarantee those values will never change and that would mean we'd have to keep track of that and update our documentation if that happens. Two approaches I've come up with:
the former would be trivial to implement (although we need to figure out which category to use). The latter would be a little more work, but would make RCL(C) errors more 'explicit' and we could document each of them individually. I first preferred the second approach, but thinking about it, the former is less work, gets us what we're looking for (a way to document at least some of the RCL(C) initialisation errors/return codes when they could potentially be solved by the user) and wouldn't require adding 30 new subcodes and a new alarm category. We could still document individual values (ie: |
I'm in favor of the second option. I think that the purpose of raising these new alarms is to make it easier for the user to debug. In that regard, providing an explicit definition of the error (as opposed to some number that must be looked up) is much more user-friendly. |
It would mean adding quite a few new subcodes, with a lot of documentation to write as well. |
I'm going to postpone this to after |
An additional consideration: IIUC, subcodes are limited to a I realise there are only so many |
I just had another occurrence of this and wasted a good amount of time trying to remember how to track down the rcl error codes. This shouldn't be difficult to implement, so I retargeted the next milestone. |
Should this be
|
Sure. I just want to get it on the todo list. |
I am working on this, but I wanted to point out that the alarm will no longer contain the information about where the problem happened, as it currently does. For example, right now we have
The subcode here shows the user where the error happened (initializing the /queue_traj_point service in this case) but it does not tell the user what the problem is. The proposed solution would describe to the user what the problem is, but it would fail to describe where that problem happened. So if the user receives an alarm with the message "Invalid node name", the user may not know which node this applies to. There is not enough space in the alarm message (32 chars) to fully describe what went wrong. The debug broadcast could be used to give more detail, but the error message won't contain all the information that the user needs to solve it. It doesn't match the principle that @ted-miller describes here. Also, should I include "RCL/RCLC" in every alarm to make it clear, or can we assume that the user will look up the error code and see that the main alarm code is specifically for RCL/RCLC? This isn't as important, but I have been including "RCL/RCLC" in each alarm message, and it would be nice to have an extra few characters to use to write the actual alarm message. |
As we discussed, you can use multiple concurrent alarms to provide additional data. I'd suggest omitting the "RCL/RCLC", as that's half your message. We don't necessarily need users to know that the code came from that layer. They can look up the code/message in our troubleshooting guide. Then we can provide additional detail there. |
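To make the "multiple concurrent alarms" idea concrete, here is a rough sketch in plain C. It is not the project's code: `AlarmList`, `post_alarm` and `report_init_failure` are hypothetical stand-ins for whatever alarm API the controller actually exposes, and the subcodes and texts are made up. The only detail taken from this thread is the 32-character limit on alarm text. One alarm says where the failure happened, the other says what went wrong.

```c
#include <stdio.h>
#include <string.h>

/* Assumption from the thread: alarm text is limited to 32 characters. */
#define ALARM_TEXT_MAX_CHARS 32
#define MAX_PENDING_ALARMS 4

typedef struct {
    int subcode;
    char text[ALARM_TEXT_MAX_CHARS + 1]; /* +1 for the terminating NUL */
} Alarm;

typedef struct {
    Alarm items[MAX_PENDING_ALARMS];
    int count;
} AlarmList;

/* Hypothetical stand-in for the real alarm API: it only records the alarm.
 * Text longer than 32 characters is silently truncated by snprintf. */
int post_alarm(AlarmList *list, int subcode, const char *text)
{
    if (list == NULL || text == NULL || list->count >= MAX_PENDING_ALARMS) {
        return -1;
    }
    Alarm *a = &list->items[list->count++];
    a->subcode = subcode;
    snprintf(a->text, sizeof a->text, "%s", text);
    return 0;
}

/* Posts two alarms for one failure: the first identifies the location,
 * the second describes the problem. */
int report_init_failure(AlarmList *list,
                        int where_subcode, const char *where_text,
                        int what_subcode, const char *what_text)
{
    if (post_alarm(list, where_subcode, where_text) != 0) {
        return -1;
    }
    return post_alarm(list, what_subcode, what_text);
}
```

A call such as `report_init_failure(&list, 12, "Failed to init service", 1, "Invalid node name")` would leave two entries in the list, so neither the location nor the description has to be squeezed into a single 32-character message.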
@jimmy-mcelwain: have you already pushed your changes somewhere? RCL(C) provides some ways of retrieving a human-readable error message, so I'm curious what your approach has been so far. I believe we would really want to avoid defining a set of (shadow) codes ourselves.
I was indeed also going to suggest something like that. |
I just pushed a branch report_rcl_errors. It is not complete. I did define a set of shadow codes myself. The issue is that the RCL_RET error codes defined in Similarly, the alarm messages have a character limit of 32, so I have been manually creating them based on documentation rather than using any provided human-readable messages. That being said, I can still print the provided string (I believe using And while I can raise multiple alarms, something will have to change. As of right now |
There is also a bit of overlap with #268 |
@jimmy-mcelwain: thanks for looking into this. Looking at main...report_rcl_errors though, I'm wondering whether this approach 'scales'. You already have 38 new subcodes and this basically sets us up to keep track of what RCL and RCLC are doing and keep our shadow set up-to-date/matching. We also need to keep I know @ted-miller wrote in #128 (comment) he prefers this approach over the alternative I described in #128 (comment), but that latter approach would have meant one or two additional subcodes and then including the RCL(C) error code in the alarm text. Something like this for instance:
A concurrently posted (fatal) alarm would provide the info on where this happened, and that alarm would have its own unique subcode to aid debugging (just as we now have). We would document the values of That would seem to be much less work (although we'd still have to update the documentation, but we'd have to do that in any case) and it wouldn't require us to keep anything up-to-date -- except the documentation. I also know @ted-miller prefers alarm texts which include a hint on how to solve the issue that caused it to be raised, but 32 chars are just not enough for that in these cases, and we already have many alarms which don't do that. In addition, for many of these RCL(C) issues, users can't do anything anyway, except reach out to us. As an example: |
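For illustration, the "put the RCL(C) return code in the alarm text" approach could be sketched as below. This is only a sketch under assumptions: `format_rcl_alarm_text` is a hypothetical helper, the "RCL(C) error: N" wording is made up, and a plain `int` is used in place of `rcl_ret_t` so the snippet compiles without the ROS 2 headers. The 32-character limit is the one mentioned in this thread.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption from the thread: alarm text is limited to 32 characters. */
#define ALARM_TEXT_MAX_CHARS 32

/* Hypothetical helper: writes e.g. "RCL(C) error: 201" into `out`.
 * `rcl_ret` is a plain int here; the real code would pass the value
 * returned by the failing RCL(C) call.
 * Returns 0 on success, -1 if the text does not fit or formatting fails. */
int format_rcl_alarm_text(char *out, size_t out_size, int rcl_ret)
{
    if (out == NULL || out_size == 0) {
        return -1;
    }
    int n = snprintf(out, out_size, "RCL(C) error: %d", rcl_ret);
    if (n < 0 || (size_t)n >= out_size) {
        return -1; /* would have been truncated */
    }
    return 0;
}
```

With a buffer of `ALARM_TEXT_MAX_CHARS + 1` bytes, the fixed prefix takes 14 characters, which leaves room for any 32-bit return code. The numeric value would then be looked up in the documentation, so no shadow set of codes has to be kept in sync.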
That's not the ideal scenario. But you make a valid point regarding maintainability. I could live with this. |
I pushed another commit to the branch which removed the shadow codes. I created a function |
@ted-miller: we might want to consider reporting some of these RCL(C) initialisation errors, or defining some subcodes for them.
Especially the NODE, PUBLISHER, SUBSCRIPTION, etc. categories seem to be caused by things which could often be solved by users without needing to post on the tracker here.
Originally posted by @gavanderhoorn in #125 (reply in thread)