Use workqueues for volume expansion #77883

Merged
merged 3 commits into from
May 17, 2019

Conversation

gnufied
Member

@gnufied gnufied commented May 14, 2019

Use workqueues for volume expansion. This PR reduces races between volume expansion and other controllers by requiring a resourceVersion match when patching PVCs.
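A rough, stdlib-only sketch of the workqueue pattern: event handlers only enqueue a PVC key (`namespace/name`), duplicate events for the same key collapse into one entry, and a key that is being processed is not handed to a second worker until `Done` is called. The `keyQueue` type and its methods are hypothetical and much simpler than client-go's `workqueue` package (no rate limiting, no shutdown handling).

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a hypothetical, minimal deduplicating queue loosely modeled
// on client-go workqueue semantics.
type keyQueue struct {
	mu         sync.Mutex
	cond       *sync.Cond
	queue      []string
	dirty      map[string]bool // keys waiting to be processed
	processing map[string]bool // keys currently handed to a worker
}

func newKeyQueue() *keyQueue {
	q := &keyQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add enqueues key unless it is already waiting.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if q.processing[key] {
		// Re-queued by Done once the current worker finishes.
		return
	}
	q.queue = append(q.queue, key)
	q.cond.Signal()
}

// Get blocks until a key is available and marks it as processing.
func (q *keyQueue) Get() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.queue) == 0 {
		q.cond.Wait()
	}
	key := q.queue[0]
	q.queue = q.queue[1:]
	delete(q.dirty, key)
	q.processing[key] = true
	return key
}

// Done marks key as finished; if it was re-added meanwhile, queue it again.
func (q *keyQueue) Done(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	delete(q.processing, key)
	if q.dirty[key] {
		q.queue = append(q.queue, key)
		q.cond.Signal()
	}
}

// Len reports how many keys are waiting.
func (q *keyQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.queue)
}

func main() {
	q := newKeyQueue()
	// Two update events for the same PVC collapse into one queued key.
	q.Add("default/data-pvc")
	q.Add("default/data-pvc")
	q.Add("default/logs-pvc")
	fmt.Println(q.Len())

	key := q.Get()
	// In a real controller, the worker would read the latest PVC for this
	// key and attempt the expansion here.
	q.Done(key)
	fmt.Println(key, q.Len())
}
```

Because handlers only store keys, a worker always acts on the current state of the object rather than on a possibly stale copy captured when the event fired.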

Fixes #71760 and #71470

This is a rebased and updated version of #75386

None

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 14, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 14, 2019
@gnufied gnufied changed the title {WIP} use workqueues for volume expansion Use workqueues for volume expansion May 14, 2019
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 14, 2019
@gnufied
Member Author

gnufied commented May 14, 2019

/kind bug
/priority important-soon

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 14, 2019
@gnufied
Member Author

gnufied commented May 14, 2019

/sig storage

@gnufied
Member Author

gnufied commented May 14, 2019

cc @kubernetes/sig-storage-misc @jsafrane

Add a test for patch creation
@gnufied
Member Author

gnufied commented May 14, 2019

/assign @jsafrane

@gnufied
Member Author

gnufied commented May 14, 2019

/retest

@jsafrane
Member

lgtm-ish, @msau42, WDYT?

@gnufied
Member Author

gnufied commented May 16, 2019

/retest

1 similar comment
@gnufied
Member Author

gnufied commented May 16, 2019

/retest

pkg/controller/volume/expand/expand_controller.go (outdated, resolved)
```go
}
expc.recorder.Event(pvc, eventType, events.ExternalExpanding, fmt.Sprintf("Ignoring the PVC: %v.", msg))
klog.V(3).Infof("Ignoring the PVC %q (uid: %q) : %v.", util.GetPersistentVolumeClaimQualifiedName(pvc), pvc.UID, msg)
```
Member

Should this be level 3? Isn't it a configuration problem if we can't find the plugin?

Member Author

Any PVC that should be resized externally (CSI) will hit this code, and it is not a configuration problem if the plugin can't be found. For example, the CSI plugin does not actually implement the Expandable interface; inside k8s it only implements the NodeExpandable interface.
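A simplified sketch of why such a PVC ends up on the "ignoring" path: the controller only resizes volumes whose plugin implements a controller-side expansion interface, and a plugin that implements only node-side expansion falls through to external handling. The interface names, method signatures, and plugin types below are hypothetical stand-ins, not the real definitions in `pkg/volume`.

```go
package main

import "fmt"

// Hypothetical, trimmed-down plugin interfaces; the real ones in
// pkg/volume carry many more methods.
type VolumePlugin interface {
	GetPluginName() string
}

type ExpandableVolumePlugin interface {
	VolumePlugin
	ExpandVolumeDevice(newSizeBytes int64) error
}

type NodeExpandableVolumePlugin interface {
	VolumePlugin
	NodeExpand(newSizeBytes int64) error
}

// csiLike only supports node-side expansion.
type csiLike struct{}

func (csiLike) GetPluginName() string { return "kubernetes.io/csi" }

func (csiLike) NodeExpand(newSizeBytes int64) error { return nil }

// inTreeLike (hypothetical) supports controller-side expansion.
type inTreeLike struct{}

func (inTreeLike) GetPluginName() string { return "example.com/in-tree" }

func (inTreeLike) ExpandVolumeDevice(newSizeBytes int64) error { return nil }

// decide reports what the expand controller would do with a PVC whose
// volume is backed by plugin p.
func decide(p VolumePlugin) string {
	if _, ok := p.(ExpandableVolumePlugin); ok {
		return "expand in controller"
	}
	// Not a configuration error: an external resizer is expected to
	// handle it, so the controller only records an event.
	return "ignore: resized externally"
}

func main() {
	fmt.Println(decide(csiLike{}))
	fmt.Println(decide(inTreeLike{}))
}
```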

Member

Do we need to log so high if we know it's an expected normal occurrence?

Member Author

@gnufied gnufied May 17, 2019

We could, but in most clusters I expect resizing to be relatively infrequent compared to other operations. Logging this at V(3) gives us an opportunity to verify that expand_controller did in fact process the PVC. I think of it the same way we log successful mount or attach events: even though those happen much more often than resize events, we still log them.

Member

It looks like Info logs in the operation generator use either the default info level or V(4). Can we follow the same convention?

Member Author

I moved this to plain Infof (the default info level).
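For context on the levels discussed in this thread: `klog.V(n).Infof` only emits when the process runs with `-v` set to at least n, while plain `klog.Infof` always emits at INFO severity. The stdlib-only sketch below mimics that gating with hypothetical `logger` and `verbose` types; it is not klog itself.

```go
package main

import (
	"fmt"
	"strings"
)

// logger is a hypothetical stand-in for klog's verbosity gating.
type logger struct {
	v   int // plays the role of the -v flag
	out strings.Builder
}

// verbose mimics the value returned by klog.V: it logs only when enabled.
type verbose struct {
	enabled bool
	l       *logger
}

// V returns a logger that is enabled only if level <= the configured -v.
func (l *logger) V(level int) verbose { return verbose{level <= l.v, l} }

func (vb verbose) Infof(format string, args ...interface{}) {
	if vb.enabled {
		vb.l.Infof(format, args...)
	}
}

// Infof always logs, like klog.Infof.
func (l *logger) Infof(format string, args ...interface{}) {
	fmt.Fprintf(&l.out, format+"\n", args...)
}

func main() {
	l := &logger{v: 2}
	l.V(3).Infof("Ignoring the PVC %q", "default/data") // dropped at -v=2
	l.Infof("Ignoring the PVC %q", "default/data")      // always emitted
	fmt.Print(l.out.String())
}
```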

pkg/volume/util/resize_util.go (outdated, resolved)
```go
return fmt.Errorf("Unexpected error marshaling new PV %q with error : %v", pvClone.Name, err)
}

patchBytes, err := strategicpatch.CreateTwoWayMergePatch(oldData, newData, pvClone)
```
Member

Is the resource version included in this patch?

Member Author

The PV update does not do a resource version check. I think that is fine: a resource version check matters when you don't want to patch a field unconditionally, but it is less important when your patch does not depend on the most recent version of the object.

In this case, getting the latest version of the PV object isn't going to make a difference. Also, since PVs are non-namespaced objects, users should not be editing them.
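To make the distinction concrete: including `metadata.resourceVersion` in a patch makes it a conditional write, because the API server rejects it with a 409 Conflict if the object has changed since that version; omitting it applies the patch unconditionally, as with the PV patch here. Below is a stdlib-only sketch of adding a resource version to an existing JSON patch body; `withResourceVersion` is a hypothetical helper and not necessarily how the merged PVC code does it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withResourceVersion (hypothetical) sets metadata.resourceVersion in a
// JSON merge or strategic-merge patch, so the API server applies the patch
// only if the object's current resourceVersion still matches rv.
func withResourceVersion(patch []byte, rv string) ([]byte, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(patch, &m); err != nil {
		return nil, fmt.Errorf("decoding patch: %w", err)
	}
	if m == nil { // patch was JSON null
		m = map[string]interface{}{}
	}
	meta, ok := m["metadata"].(map[string]interface{})
	if !ok {
		meta = map[string]interface{}{}
		m["metadata"] = meta
	}
	meta["resourceVersion"] = rv
	return json.Marshal(m)
}

func main() {
	patch := []byte(`{"status":{"capacity":{"storage":"20Gi"}}}`)
	out, err := withResourceVersion(patch, "12345")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Since `encoding/json` sorts map keys when marshaling, this prints `{"metadata":{"resourceVersion":"12345"},"status":{"capacity":{"storage":"20Gi"}}}`.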

@msau42
Copy link
Member

msau42 commented May 17, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 17, 2019
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit da4d6f5 into kubernetes:master May 17, 2019
```go
}

oldSize := oldPVC.Spec.Resources.Requests[v1.ResourceStorage]
newPVC, ok := new.(*v1.PersistentVolumeClaim)
```
Contributor

It seems this (including the check on line 133) should be done before obtaining oldSize.

```go
newSize := newPVC.Spec.Resources.Requests[v1.ResourceStorage]
if newSize.Cmp(oldSize) > 0 {
expc.enqueuePVC(new)
}
```
Contributor

Maybe add a log saying why the request is ignored.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Use workqueue model in expand_controller
7 participants