-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add volume operation metrics to operation executor and PV controller #50036
Conversation
@kubernetes/sig-storage-pr-reviews |
a5a6a7d
to
54a43fe
Compare
Have changed it to simply wrap provisioner.Provision and deleter.Delete calls only. This way, we don't include apiserver delays and there is no decision to make about when the operation "actually" begins and ends. ty jsafrane |
/retest |
1 similar comment
/retest |
pkg/volume/util/metrics.go
Outdated
) | ||
|
||
var StorageOperationMetric = prometheus.NewHistogramVec( | ||
prometheus.HistogramOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the range of operations? Would seconds be too coarse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seconds is good imo. mounts and unmounts take tens of seconds. I suspect attach and detach take seconds, though I have not tested it.
pkg/volume/util/metrics.go
Outdated
package util | ||
|
||
import ( | ||
"github.com/prometheus/client_golang/prometheus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move long line to the end of import block.
@@ -124,6 +125,8 @@ type OperationExecutor interface { | |||
func NewOperationExecutor( | |||
operationGenerator OperationGenerator) OperationExecutor { | |||
|
|||
util.RegisterMetrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since registering same metrics multiple times is fatal error (will cause program to crash). How about we move the registration part to Init
hook of util/metrics
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sync.once
thing will keep us safe https://github.com/kubernetes/kubernetes/pull/50036/files#diff-7c5ec7e6067fba16973d1249a87653a8R42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. i missed that. I would prefer that - we do this via init
hook, because that wouldn't require sync.Once
trick. We will have to probably put metrics code in its own package for clarity in that case, since if a package contains multiple init
hooks their execution can be wonky.
Is there a reason - you didn't want to pull registration in that file itself? let me know, if I am missing something. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied others, it looks like a sort of unspoken convention to use sync to prevent multiple registration https://github.com/kubernetes/kubernetes/search?utf8=%E2%9C%93&q=sync.once+registermetrics&type=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The volume operation metrics in such a place that I think, there is no good one place to register them except the util/metrics.go
file itself. In many cases - we have well defined set of places where registered metrics will be used. Such as - we know that aws metrics will be used in AWS package after initialization.
We don't have such clear cut isolation in case of volume metrics. So while I understand the precedent of using sync.Once
- I do not think we need to bent backwards to use it here. We can get away by simply registering the metrics in util/metrics
package itself. It would be clean and DRY - wouldn't require sync.Once
. Is there a downside of registering the metrics in package itself that I am missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm okay, I don't see a big downside either way but will change it to init lest I be accused of cargo culting. it's clear (for now), that operation_executor and pv controller are sole users of these metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, PTAL at latest commit, I'll squash
@@ -648,16 +656,18 @@ func (oe *operationExecutor) VerifyVolumesAreAttachedPerNode( | |||
if err != nil { | |||
return err | |||
} | |||
|
|||
opCompleteFunc := util.OperationCompleteHook("n/a", "verify_volumes_are_attached_per_node") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be "<n/a>" right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean it should have the '<' and '>'? will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is exactly what I meant.
e99e70c
to
1bf652a
Compare
|
/assign @gnufied |
/lgtm |
@wongma7 can you change PR description, so as people reading this PR doesn't feel that we have some outstanding issues? |
I don't know the surrounding code base to have the full picture, but metrics wise lgtm. |
/approve |
/approve no-issue |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gnufied, jsafrane, wongma7 Associated issue requirement bypassed by: jsafrane The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Please update the operation names here :) |
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
Automatic merge from submit-queue. Update volume operations metric names as implemented @verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
This PR implements the proposal for high level volume metrics kubernetes/community#809
Special notes for your reviewer:
Differences from proposal:all resolved"verify_volume" is now "verify_volumes_are_attached" + "verify_volumes_are_attached_per_node" + "verify_controller_attached_volume." Which of them do we want?There is no "mount_device" metric because the MountVolume operation combines MountDevice and mount (plugin.Setup). Do we want to extract the mount_device metric or is it okay to keep mountvolume as one? For attachable volumes, MountDevice is the actual mount and Setup is a bindmount + setvolumeownership. For unattachable, mountDevice does not occur and Setup is an actual mount + setvolumeownership.PV controller metrics I did not implement following the proposal at all. I did not change goroutinemap nor scheduleOperation. Because provisionClaimOperation does not return an error, so it's impossible for the caller to know if there is actually a failure worth reporting. So I manually create a new metric inside the function according to some conditions.@gnufied
I have tested the operationexecutor metrics but not provision & delete. Sample:
Release note: