-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kanister prometheus metrics: Controller Integration + ActionSet resolution counter exports #2247
Conversation
Thanks for submitting this pull request 🎉. The team will review it soon and get back to you. If you haven't already, please take a moment to review our project contributing guideline and Code of Conduct document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
left some minor comments.
9b3a423
to
2ac8dd8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I have approved, we can wait for @pavannd1 to have a look as well. |
0d23528
to
743d6ab
Compare
@@ -435,17 +448,20 @@ func (c *Controller) runAction(ctx context.Context, t *tomb.Tomb, as *crv1alpha1 | |||
c.logAndSuccessEvent(ctx, fmt.Sprintf("Executing action %s", action.Name), "Started Action", as) | |||
tp, err := param.New(ctx, c.clientset, c.dynClient, c.crClient, c.osClient, action) | |||
if err != nil { | |||
c.incrementActionSetResolutionCounterVec(ACTION_SET_COUNTER_VEC_LABEL_RES_FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remind me what the issue was with incrementing this in the caller of runAction
i.e., handleActionSet
inside the if
block on line 421?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pavannd1 There is no issue per se with respect to this PR. Considering that we do move the Inc()
calls to line 421 now, and we decide to pair the resolution
label dimension with another dimension, say action_type
in future PRs, we would have to move back this Inc()
from line 421 to the callee, because at line 421, we don't have the action_type
information. Hence, I thought that it would be better to maintain the Inc()
calls in the callee to avoid make an additional change when we add action_type
dimension.
I think this would apply to any other label dimension we choose to use with the resolution
dimension.
@@ -630,6 +666,7 @@ func (s *ControllerSuite) TestExecActionSet(c *C) { | |||
if !cancel { | |||
err = s.waitOnActionSetState(c, as, final) | |||
c.Assert(err, IsNil, Commentf("Failed case: %s", tc.name)) | |||
c.Assert(getCounterVecValue(s.ctrl.metrics.actionSetResolutionCounterVec, []string{tc.metricResolution}), Equals, oldValue+1, Commentf("Failed case: %s", tc.name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the cancel case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're currently maintaining only success
and failure
resolutions as per our discussions. Hence cancel
has essentially been ignored, and we will not be incrementing any values for the cancel case.
If we do decide to add cancel
in the future, then we will have to add a new Inc()
at the point where we figure out that the ActionSet
has been cancelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit confusing because TestExecActionSet
has been set up to "expect" a failure
in the cancel case, but we aren't asserting anything for now. Let me know if it's better to pass in blank resolution status during the unit test.
0ee8f78
to
f365d37
Compare
f365d37
to
023af87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Change Overview
Following changes made under the kanister prometheus metrics helm feature flag:
-> This PR exports ActionSet resolution counter called
kanister_action_set_resolutions_total
, with "success" and "failure" label values - based on the result of the ActionSet.-> This PR also integrates the Kanister-prometheus framework to the main Kanister controller interface. The integration involves the controller object taking in a prometheus registry and instantiating a metrics object that encapsulates all the Prometheus metric objects for the controller interface.
Pull request type
Please check the type of change your PR introduces:
Issues
Test Plan
TextExecActionSet
unit-test has been modified to ensure that "success" and "failure" counter increments are made successfully on the resolution of an ActionSet.New unit-test called
TestNilPrometheusRegistry
has been added to test the behavior of the controller when a nil prometheus registry is passed and ensure that it does not panic.Manual tests: Multiple action sets were created from the mysql blueprint example and the
/metrics
endpoint has been tested to see if the metrics are correctly exported.Details:
When the a new Kanister operator pod is deployed, the
kanister_action_set_resolutions_total
metric is set to 0 in the/metrics
endpoint:Following the
mysql-blueprints
example from theexamples
directory, 2 ActionSets were created which perform mysql backups, both of which succeeded:kanctl create actionset --action backup --namespace kanister --blueprint mysql-blueprint --statefulset mysql-test/mysql-release --profile mysql-test/<profile-name> --secrets mysql=mysql-test/mysql-release
Following this, the
/metrics
displays the following counter value: