Add volume operation metrics to operation executor and PV controller #50036

Merged Aug 28, 2017 (1 commit)

wongma7
Contributor

@wongma7 wongma7 commented Aug 2, 2017

This PR implements the proposal for high level volume metrics kubernetes/community#809

Special notes for your reviewer:

Differences from proposal: all resolved

"verify_volume" is now "verify_volumes_are_attached" + "verify_volumes_are_attached_per_node" + "verify_controller_attached_volume." Which of them do we want?

There is no "mount_device" metric because the MountVolume operation combines MountDevice and mount (plugin.Setup). Do we want to extract a separate mount_device metric, or is it okay to keep MountVolume as one? For attachable volumes, MountDevice is the actual mount and Setup is a bind mount + setvolumeownership. For unattachable volumes, MountDevice does not occur and Setup is the actual mount + setvolumeownership.

I did not implement the PV controller metrics as the proposal describes. I changed neither goroutinemap nor scheduleOperation. Because provisionClaimOperation does not return an error, the caller cannot know whether there was a failure worth reporting, so I manually create a new metric inside the function according to some conditions.

@gnufied

I have tested the operationexecutor metrics but not provision & delete. Sample:
[screenshot of sample metrics output, taken 2017-08-02 15:01:08]

Release note:

Add error count and time-taken metrics for storage operations such as mount and attach, per-volume-plugin.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Aug 2, 2017
@wongma7
Contributor Author

wongma7 commented Aug 2, 2017

@kubernetes/sig-storage-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 2, 2017
@wongma7 wongma7 force-pushed the metrics branch 5 times, most recently from a5a6a7d to 54a43fe Compare August 2, 2017 20:42
@wongma7
Contributor Author

wongma7 commented Aug 3, 2017

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 3, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 3, 2017
@wongma7
Contributor Author

wongma7 commented Aug 3, 2017

I have changed it to wrap only the provisioner.Provision and deleter.Delete calls. This way, we don't include apiserver delays and there is no decision to make about when the operation "actually" begins and ends. Thanks, jsafrane.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 14, 2017
@wongma7
Contributor Author

wongma7 commented Aug 14, 2017

/retest

)

var StorageOperationMetric = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Contributor

What is the range of operation durations? Would seconds be too coarse?

Contributor Author

Seconds is good IMO. Mounts and unmounts take tens of seconds. I suspect attach and detach take seconds, though I have not tested it.

package util

import (
"github.com/prometheus/client_golang/prometheus"
Contributor

nit: move long line to the end of import block.

@@ -124,6 +125,8 @@ type OperationExecutor interface {
func NewOperationExecutor(
operationGenerator OperationGenerator) OperationExecutor {

util.RegisterMetrics()
Member

Since registering the same metrics multiple times is a fatal error (it will cause the program to crash), how about we move the registration to the init hook of util/metrics itself?

Contributor Author

Member

Okay, I missed that. I would prefer that we do this via an init hook, because that wouldn't require the sync.Once trick. We would probably have to put the metrics code in its own package for clarity in that case, since if a package contains multiple init hooks their execution can be wonky.

Is there a reason you didn't want to pull registration into that file itself? Let me know if I am missing something. :D

Contributor Author

I just copied others; it looks like a sort of unspoken convention to use sync.Once to prevent multiple registration: https://github.com/kubernetes/kubernetes/search?utf8=%E2%9C%93&q=sync.once+registermetrics&type=

Member

The volume operation metrics are used in such a way that I think there is no single good place to register them except the util/metrics.go file itself. In many cases we have a well-defined set of places where registered metrics will be used. For example, we know that AWS metrics will be used in the AWS package after initialization.

We don't have such clear-cut isolation in the case of volume metrics. So while I understand the precedent of using sync.Once, I do not think we need to bend over backwards to use it here. We can get away with simply registering the metrics in the util/metrics package itself. It would be clean and DRY, and wouldn't require sync.Once. Is there a downside to registering the metrics in the package itself that I am missing?

Contributor Author

Hmm okay, I don't see a big downside either way, but I will change it to init lest I be accused of cargo culting. It's clear (for now) that operation_executor and the PV controller are the sole users of these metrics.

Contributor Author

done, PTAL at latest commit, I'll squash
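The two registration styles discussed in this thread can be compared in a small stdlib-only sketch. The `registry` type is a hypothetical stand-in that panics on a duplicate name, mirroring the fatal behaviour described above. The metric names are placeholders, not the names used in the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical metrics registry that rejects duplicate names.
type registry struct {
	mu    sync.Mutex
	names map[string]bool
}

func newRegistry() *registry { return &registry{names: map[string]bool{}} }

// mustRegister panics if name has already been registered.
func (r *registry) mustRegister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names[name] {
		panic("duplicate metric registration: " + name)
	}
	r.names[name] = true
}

var (
	defaultRegistry = newRegistry()
	registerOnce    sync.Once
)

// Style 1: callers invoke registerMetrics themselves; sync.Once ensures the
// body runs a single time no matter how many callers there are.
func registerMetrics() {
	registerOnce.Do(func() {
		defaultRegistry.mustRegister("example_duration_seconds")
	})
}

// Style 2: the package registers its own metric in init, which the Go
// runtime runs once per program, so no guard is needed.
func init() {
	defaultRegistry.mustRegister("example_errors_total")
}

func main() {
	registerMetrics()
	registerMetrics() // harmless: the sync.Once body already ran
	fmt.Println(len(defaultRegistry.names))
}
```

With the init style, callers such as an executor constructor no longer need to remember to trigger registration at all.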

@@ -648,16 +656,18 @@ func (oe *operationExecutor) VerifyVolumesAreAttachedPerNode(
if err != nil {
return err
}

opCompleteFunc := util.OperationCompleteHook("n/a", "verify_volumes_are_attached_per_node")
Member

This will be "<n/a>", right?

Contributor Author

do you mean it should have the '<' and '>'? will fix

Member

Yes that is exactly what I meant.

@wongma7 wongma7 force-pushed the metrics branch 2 times, most recently from e99e70c to 1bf652a Compare August 23, 2017 14:57
@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Aug 23, 2017
@jsafrane
Member

provisionClaimOperation does not return an error because it was useless: the controller retries anyway, regardless of whether there was an error. It can return an error; nothing should break.

@wongma7
Contributor Author

wongma7 commented Aug 24, 2017

/assign @gnufied

@gnufied
Member

gnufied commented Aug 24, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2017
@gnufied
Member

gnufied commented Aug 24, 2017

@saad-ali @brancz can you guys have a look as well before we merge this?

@gnufied
Member

gnufied commented Aug 24, 2017

@wongma7 can you change the PR description so that people reading this PR don't feel that we have outstanding issues?

@wongma7
Contributor Author

wongma7 commented Aug 25, 2017

/assign @jsafrane
@jsafrane please /approve, thank you:)

@brancz
Member

brancz commented Aug 28, 2017

I don't know the surrounding code base well enough to have the full picture, but metrics-wise LGTM.

@jsafrane
Member

/approve

@jsafrane
Member

/approve no-issue

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, jsafrane, wongma7

Associated issue requirement bypassed by: jsafrane

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2017
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@verult
Contributor

verult commented Oct 20, 2017

k8s-github-robot pushed a commit to kubernetes/community that referenced this pull request Oct 20, 2017
Automatic merge from submit-queue.

Update volume operations metric names as implemented

@verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
justaugustus pushed a commit to justaugustus/enhancements that referenced this pull request Sep 3, 2018
Automatic merge from submit-queue.

Update volume operations metric names as implemented

@verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
MadhavJivrajani pushed a commit to kubernetes/design-proposals-archive that referenced this pull request Nov 30, 2021
Automatic merge from submit-queue.

Update volume operations metric names as implemented

@verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
MadhavJivrajani pushed a commit to MadhavJivrajani/design-proposals that referenced this pull request Dec 1, 2021
Automatic merge from submit-queue.

Update volume operations metric names as implemented

@verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)
MadhavJivrajani pushed a commit to kubernetes/design-proposals-archive that referenced this pull request Dec 1, 2021
Automatic merge from submit-queue.

Update volume operations metric names as implemented

@verult PTAL, thanks! kubernetes/kubernetes#50036 (comment)