Handle errors from cluster_status #3735
base: main
Only had time for first commit so far
Use pcmk_unpack_scheduler_input instead.
… fails. This function can return a couple of error codes, most notably when called on input with a feature set that is newer than the latest supported one. In that case, the caller should return its own error instead of trying to continue on with an unpopulated scheduler object. This prevents a cascade of error messages.
Also, there's no need to do any error reporting. pcmk__config_err will have already called crm_err in this case.
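A minimal sketch of the caller pattern described above, under stated assumptions: `demo_unpack()` is a hypothetical stand-in for the unpacking step (cluster_status() / pcmk_unpack_scheduler_input() in this PR), the `DEMO_*` return codes are hypothetical, and the "supported" feature set 3.19.0 is an arbitrary placeholder, not Pacemaker's real value. The only point is that the caller returns early with its own error instead of carrying on with an unpopulated scheduler object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes standing in for the real standard return codes. */
enum demo_rc { DEMO_OK = 0, DEMO_ERR_TOO_NEW = 1 };

/* Hypothetical scheduler object; "populated" is set only on success. */
struct demo_scheduler {
    bool populated;
    int num_nodes;
};

/* Hypothetical newest feature set this build understands (major.minor.patch). */
static const int supported[3] = { 3, 19, 0 };

/* Return true if feature_set ("major[.minor[.patch]]") is newer than supported. */
static bool feature_set_too_new(const char *feature_set)
{
    int v[3] = { 0, 0, 0 };

    if (sscanf(feature_set, "%d.%d.%d", &v[0], &v[1], &v[2]) < 1) {
        return false;   /* unparsable: this sketch treats it as not too new */
    }
    for (int i = 0; i < 3; i++) {
        if (v[i] != supported[i]) {
            return v[i] > supported[i];
        }
    }
    return false;
}

/* Hypothetical stand-in for the unpacking step; leaves sched untouched on error. */
static enum demo_rc demo_unpack(struct demo_scheduler *sched, const char *feature_set)
{
    if (feature_set_too_new(feature_set)) {
        return DEMO_ERR_TOO_NEW;
    }
    sched->populated = true;
    sched->num_nodes = 2;   /* placeholder data */
    return DEMO_OK;
}

/* Caller pattern: bail out with our own error rather than continuing with an
 * unpopulated scheduler object, which would cascade into more errors. */
static enum demo_rc demo_tool_run(const char *feature_set, int *nodes_out)
{
    struct demo_scheduler sched = { false, 0 };
    enum demo_rc rc = demo_unpack(&sched, feature_set);

    if (rc != DEMO_OK) {
        /* No extra reporting: in the real code the config error path
         * (pcmk__config_err) has already logged the problem. */
        return rc;
    }
    assert(sched.populated);
    *nodes_out = sched.num_nodes;
    return DEMO_OK;
}
```

With these placeholders, `demo_tool_run("3.20.0", &n)` returns `DEMO_ERR_TOO_NEW` and leaves `n` untouched, while `"3.19.0"` succeeds.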
The error message is hidden and only gets displayed if -V is given on the command line. Adding config error/warning handlers will cause the error to be displayed regardless. This could have been implemented in a couple of ways, and there are tradeoffs here. I've chosen to duplicate what's happening in crm_verify, but instead of checking for verbosity (which is a global variable in that file), I'm checking out->is_quiet. This means that if you run `crm_simulate -Q`, you won't see the error message, but you will get an error return code. It also means that with `crm_simulate -Q -VVVV...`, you still won't see the error message. This may be a bug, but I'm not sure who would do that, and I also think these sorts of problems are pervasive in our command line tools.
Fix T521
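The handler approach can be sketched as follows. It is modeled loosely on the description above, not on the actual crm_simulate or crm_verify code: `struct demo_output` is a hypothetical, heavily simplified stand-in for the output object (just an `is_quiet` callback and a capture buffer), and the handler names are made up. It also formats with `vsnprintf()` into fixed buffers rather than the `vasprintf()` call in the patch, to stay within standard C.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the tool's output object. */
struct demo_output {
    bool quiet;                                 /* set by -Q in this sketch */
    bool (*is_quiet)(struct demo_output *out);
    char log[512];                              /* captures what would be shown */
};

static bool demo_is_quiet(struct demo_output *out)
{
    return out->quiet;
}

/* Shared body for both handlers: do nothing when quiet (verbosity is never
 * consulted), otherwise format the message and append "prefix: text\n". */
static void demo_config_msg(struct demo_output *out, const char *prefix,
                            const char *msg, va_list ap)
{
    char buf[256];
    size_t used;

    assert(out != NULL && out->is_quiet != NULL);
    if (out->is_quiet(out)) {
        return;
    }
    vsnprintf(buf, sizeof(buf), msg, ap);
    used = strlen(out->log);
    snprintf(out->log + used, sizeof(out->log) - used, "%s: %s\n", prefix, buf);
}

/* Hypothetical config error handler; ctx carries the output object. */
static void demo_config_error(void *ctx, const char *msg, ...)
{
    va_list ap;

    va_start(ap, msg);
    demo_config_msg(ctx, "error", msg, ap);
    va_end(ap);
}

/* Hypothetical config warning handler, identical apart from the prefix. */
static void demo_config_warning(void *ctx, const char *msg, ...)
{
    va_list ap;

    va_start(ap, msg);
    demo_config_msg(ctx, "warning", msg, ap);
    va_end(ap);
}
```

Because the quiet check comes first, a quiet output object drops the message no matter how many -V flags were given, which matches the `crm_simulate -Q -VVVV...` behavior noted above.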
This is just like the previous patch for crm_simulate, complete with all the same problems regarding -Q and -V.
This is just like the previous patch to crm_simulate. However, one additional problem here is that it relies on the deprecated -Q command line option. On the other hand, I think this is okay because we have a lot of work to do straightening out these sorts of options for all our command line tools, and this is just one more thing we'll have to deal with at that time.
This takes care of all callers of pcmk__output_cluster_status and pcmk__status. pcmk_status would also be affected, but at the moment there are no users of that function and anyway the config error handlers aren't public API.
The point of this is to allow it to return the value from unpack_cib, which is returning the value from cluster_status. This allows us to check whether that function hit the too-new feature set CIB condition.
…tions. This takes care of most callers; the ones in the daemons are unlikely to be a problem. This allows catching the too-new schema condition in various other tools and displaying an error message to the user. Note that a couple of other callers don't need to check the return value; I've added comments explaining why.
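The return-value plumbing in the last two commits can be pictured as a chain in which each layer passes the result upward instead of discarding it. Every function below is a hypothetical stand-in (`demo_cluster_status()` for cluster_status(), `demo_unpack_cib()` for unpack_cib(), and `demo_refresh_scheduler()` for the function the PR text refers to only as "it"), and the integer codes are placeholders rather than Pacemaker's standard return codes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical return codes; the real code uses its own standard codes. */
enum { DEMO_RC_OK = 0, DEMO_RC_TOO_NEW = 1 };

/* Stand-in for cluster_status(): fails when the CIB's feature set is newer
 * than this build supports. */
static int demo_cluster_status(int cib_too_new)
{
    return cib_too_new ? DEMO_RC_TOO_NEW : DEMO_RC_OK;
}

/* Stand-in for unpack_cib(): passes cluster_status()'s result through. */
static int demo_unpack_cib(int cib_too_new)
{
    return demo_cluster_status(cib_too_new);
}

/* Stand-in for the function the commit changes: it now returns
 * unpack_cib()'s result instead of dropping it. */
static int demo_refresh_scheduler(int cib_too_new)
{
    int rc = demo_unpack_cib(cib_too_new);

    if (rc != DEMO_RC_OK) {
        return rc;   /* don't continue with an unpopulated scheduler */
    }
    /* ... work that needs a populated scheduler would go here ... */
    return DEMO_RC_OK;
}

/* A tool-level caller: checks the result and tells the user. */
static int demo_tool(int cib_too_new)
{
    int rc = demo_refresh_scheduler(cib_too_new);

    assert(rc == DEMO_RC_OK || rc == DEMO_RC_TOO_NEW);
    if (rc == DEMO_RC_TOO_NEW) {
        fprintf(stderr, "error: CIB feature set is newer than supported\n");
    }
    return rc;
}
```

Callers that can safely ignore the result would discard it explicitly, for example with a `(void)` cast next to a comment explaining why, as the commit message describes.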
* Remove the leading function name from various messages. This was most commonly "unpack_resources".
* In the XML output format, move various messages from text that gets printed out into the XML output itself. This does end up with somewhat weird output: status="0" message="OK" followed by some error messages.
* Add a couple of warnings to crm_resource output.
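For the XML bullet, one way to picture moving messages into the XML output itself is to collect each message and emit it as an escaped element of the result document instead of printing it as free text. This is a sketch only: the `<messages>`, `<error>`, and `<warning>` element names and `struct demo_msg` are hypothetical, not Pacemaker's actual output schema, and a real tool would build the document with its XML library rather than by string concatenation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one collected config message. */
struct demo_msg {
    const char *level;   /* "error" or "warning"; used as the element name */
    const char *text;
};

/* Append s to the NUL-terminated buffer dst of capacity cap. When escape is
 * true, XML special characters are replaced with entities. Stops early
 * (leaving dst truncated but terminated) rather than overflowing. */
static void demo_append(char *dst, size_t cap, const char *s, bool escape)
{
    size_t len = strlen(dst);

    assert(len < cap);
    for (; *s != '\0'; s++) {
        char one[2] = { *s, '\0' };
        const char *rep = one;

        if (escape) {
            switch (*s) {
                case '&':  rep = "&amp;";  break;
                case '<':  rep = "&lt;";   break;
                case '>':  rep = "&gt;";   break;
                case '"':  rep = "&quot;"; break;
                case '\'': rep = "&apos;"; break;
                default:   break;
            }
        }

        size_t n = strlen(rep);

        if (len + n >= cap) {
            return;                      /* no room for rep plus the NUL */
        }
        memcpy(dst + len, rep, n + 1);   /* copies the terminating NUL too */
        len += n;
    }
}

/* Render messages as child elements of a hypothetical <messages> element. */
static void demo_messages_xml(char *dst, size_t cap,
                              const struct demo_msg *msgs, size_t count)
{
    assert(cap > 0);
    dst[0] = '\0';
    demo_append(dst, cap, "<messages>", false);
    for (size_t i = 0; i < count; i++) {
        demo_append(dst, cap, "<", false);
        demo_append(dst, cap, msgs[i].level, false);
        demo_append(dst, cap, ">", false);
        demo_append(dst, cap, msgs[i].text, true);
        demo_append(dst, cap, "</", false);
        demo_append(dst, cap, msgs[i].level, false);
        demo_append(dst, cap, ">", false);
    }
    demo_append(dst, cap, "</messages>", false);
}
```

Escaping matters here because messages such as "More than one node entry has name 'node1'" contain quote characters.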
07e8301 to b4f4b75
I'm leaning toward keeping the messages in crm_simulate output; they are issues that the user needs to know about.
```c
va_start(ap, msg);
pcmk__assert(vasprintf(&buf, msg, ap) > 0);
if (!out->is_quiet(out)) {
```
might as well just un-deprecate -Q
```
@@ -1730,6 +1757,7 @@ WARNING: Creating rsc_location constraint 'cli-ban-dummy-on-node1' with a score
=#=#=#= End test: Move a resource from its existing location - OK (0) =#=#=#=
* Passed: crm_resource - Move a resource from its existing location
=#=#=#= Begin test: Clear out constraints generated by --move =#=#=#=
warning: More than one node entry has name 'node1'
```
Is this message accurate?
Doesn't look like it, though I don't yet see what could be causing that.
```
@@ -1,3 +1,6 @@
error: Ignoring invalid node_state entry without id
warning: Ignoring failure timeout (10s) for rsc_pcmk-2 because it conflicts with on-fail=block
warning: Ignoring failure timeout (10s) for rsc_pcmk-4 because it conflicts with on-fail=block
```
Most of the errors/warnings in the test cases are things we really should fix in the test case :(
(some may be testing the error though)
I'm a little undecided on this patch at the moment; check out the changes to the regression test output in the last patch. I think some of that could be mitigated by not printing the warnings at all. The errors are a little trickier: we want to print them even when we're not verbose (that's the entire point of the related issue), but that means we get all of them.