
Allow StorageProfile to use a specific VolumeSnapshotClass #2898

Merged

@arnongilboa (Collaborator) commented Sep 14, 2023

What this PR does / why we need it:
When there are several VolumeSnapshotClasses for the same provisioner but with different parameters, we want to allow a StorageProfile to choose the VolumeSnapshotClass for its StorageClass, so that we don't pick the wrong one.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes bz #2219774

Special notes for your reviewer:

Release note:

Allow StorageProfile to use a specific VolumeSnapshotClass

@kubevirt-bot added the release-note and dco-signoff: yes labels Sep 14, 2023
@akalenyu (Collaborator) left a comment

The PR looks good and makes sense according to the SIG Storage call,
just some questions.

/cc @aglitke
for some more eyes on the approach

@arnongilboa BTW, is the plan to backport this, or could we live with it only in main?

@@ -1803,7 +1803,7 @@ func ValidateSnapshotCloneProvisioners(ctx context.Context, c client.Client, sna
 }

 // GetSnapshotClassForSmartClone looks up the snapshot class based on the storage class
-func GetSnapshotClassForSmartClone(dvName string, targetPvcStorageClassName *string, log logr.Logger, client client.Client) (string, error) {
+func GetSnapshotClassForSmartClone(dvName string, targetPvcStorageClassName, snapshotClassName *string, log logr.Logger, client client.Client) (string, error) {
Collaborator:

Maybe we should define something more deterministic for the else case?
Right now we're just picking the first match, but maybe we could define an order like:

  • StorageProfile
  • Default cluster snap class
  • first match in sorted slice

@arnongilboa (Collaborator Author) Sep 19, 2023:

Sure, in the else case we can return the default cluster snap class (if there is one) instead of the first match.

Member:

Agreed, but you can only pick the default snap class if its provisioner matches. So the order is something like:

  • StorageProfile
  • If there is a single snap class matching the provisioner (easy case): use that snap class
  • If there are multiple matches and a default snap class with a matching provisioner: use the default cluster snap class
  • If there are multiple matches and no matching default snap class: choose the first match in the sorted slice
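The precedence above can be sketched as follows. This is a minimal, self-contained illustration, not the code from this PR: `snapClass` and `pickSnapshotClass` are hypothetical names, and the class list is assumed to have been fetched already. It also returns a reason string, which is the kind of detail the logging requested later in the review would need.

```go
package main

import (
	"fmt"
	"sort"
)

// snapClass is a hypothetical, trimmed-down stand-in for a VolumeSnapshotClass.
type snapClass struct {
	Name      string
	Driver    string // CSI driver; must match the storage class provisioner
	IsDefault bool   // true if the class carries the default-class annotation
}

// pickSnapshotClass returns the chosen class name and the reason for the choice.
// preferred is the optional class named in the StorageProfile. An empty name means
// nothing matched, and the caller would fall back to host-assisted clone.
func pickSnapshotClass(classes []snapClass, driver string, preferred *string) (string, string) {
	var candidates, defaults []string
	for _, c := range classes {
		if c.Driver != driver {
			continue // classes for other provisioners can never be used
		}
		if preferred != nil && c.Name == *preferred {
			return c.Name, "selected according to StorageProfile"
		}
		candidates = append(candidates, c.Name)
		if c.IsDefault {
			defaults = append(defaults, c.Name)
		}
	}
	switch {
	case len(candidates) == 0:
		return "", "no VolumeSnapshotClass matches the driver"
	case len(candidates) == 1:
		return candidates[0], "only VolumeSnapshotClass for the driver"
	case len(defaults) > 0:
		sort.Strings(defaults) // stay deterministic if several are marked default
		return defaults[0], "default VolumeSnapshotClass for the driver"
	default:
		sort.Strings(candidates)
		return candidates[0], fmt.Sprintf("first of %d matching VolumeSnapshotClasses", len(candidates))
	}
}

func main() {
	classes := []snapClass{
		{Name: "b-snap", Driver: "csi.example.com"},
		{Name: "a-snap", Driver: "csi.example.com"},
	}
	name, reason := pickSnapshotClass(classes, "csi.example.com", nil)
	fmt.Println(name, "-", reason)
}
```

Note that a StorageProfile class whose driver does not match is silently skipped here, so selection falls through to the default or first match; whether that case should instead be reported is a separate design choice.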

@arnongilboa (Collaborator Author):

@aglitke that's exactly what I implemented.

Collaborator:

Am I missing something, or does this need to change too?

// GetCompatibleVolumeSnapshotClass returns a VolumeSnapshotClass name that works for all PVCs
func GetCompatibleVolumeSnapshotClass(ctx context.Context, c client.Client, pvcs ...*corev1.PersistentVolumeClaim) (*string, error) {

@arnongilboa (Collaborator Author):

Yes, it should, and I guess the two functions should be refactored to reuse the same code.

Collaborator:

I can see how it's trickier in the populator case, since you probably don't want to start looking up storage profiles here.

@arnongilboa (Collaborator Author):

Take a look at the updated code. Iterating over the StorageProfiles related to the PVCs in the populator case doesn't sound that tricky, but I'm not sure it's a must now that we've added default snapshot class support. WDYT?
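For the populator case, iterating over the PVCs' StorageProfiles could look roughly like this. It is a hypothetical sketch: `pvcInfo`, `compatibleSnapshotClass`, and the `fallback` callback are invented for illustration, and treating conflicting StorageProfile choices as "no compatible class" is an assumption, not something decided in this thread.

```go
package main

import "fmt"

// pvcInfo is a hypothetical summary of what would be looked up per PVC.
type pvcInfo struct {
	Driver      string // CSI driver behind the PVC's storage class
	ProfileSnap string // snapshot class set in the StorageProfile, "" if unset
}

// compatibleSnapshotClass returns one snapshot class usable for all PVCs.
// fallback stands in for the default/first-match selection for a driver.
func compatibleSnapshotClass(pvcs []pvcInfo, fallback func(driver string) string) (string, bool) {
	if len(pvcs) == 0 {
		return "", false
	}
	driver := pvcs[0].Driver
	chosen := ""
	for _, p := range pvcs {
		if p.Driver != driver {
			return "", false // a single class can only serve one driver
		}
		if p.ProfileSnap == "" {
			continue // this PVC's StorageProfile expresses no preference
		}
		if chosen != "" && chosen != p.ProfileSnap {
			return "", false // StorageProfiles disagree (assumed: no single answer)
		}
		chosen = p.ProfileSnap
	}
	if chosen != "" {
		return chosen, true
	}
	name := fallback(driver)
	return name, name != ""
}

func main() {
	fallback := func(driver string) string { return driver + "-default" }
	pvcs := []pvcInfo{{Driver: "csi.a"}, {Driver: "csi.a", ProfileSnap: "fast"}}
	name, ok := compatibleSnapshotClass(pvcs, fallback)
	fmt.Println(name, ok)
}
```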

Collaborator:

I'm more concerned about the coupling between storage profiles and our populators.
@alromeros Did we ever mean for our set of populators to be used standalone?
Or is it fine for them to look up storage profiles?

Collaborator:

> @alromeros Did we ever mean for our set of populators to be used standalone?

Yeah... it depends on what we mean by standalone usage. We first merged the populators without DV support and made sure they were well tested and reliable as a standalone feature. However, they'll always depend on CDI controllers, so I don't know whether relying on storage profiles would go against that design decision.

@akalenyu (Collaborator) Sep 21, 2023:

> However they'll always be dependent on CDI

Yeah, that's what I thought as well; you still need things like the clone-controller for host-assisted clone, right? @mhenriks WDYT?

@alromeros (Collaborator) Sep 21, 2023:

Yeah, we plan to keep depending on the import, upload, and clone controllers. So I guess that if the main rationale behind populators was always to stop depending on DVs (and we are close to achieving that), there's no problem with resorting to storage profiles since they were never an issue to begin with.

pkg/controller/storageprofile-controller.go (resolved)
@kubevirt-bot (Contributor):

@akalenyu: GitHub didn't allow me to request PR reviews from the following users: the, approach, for, some, more, eyes, on.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.


}

logger.Info("Could not match snapshotter with storage class, falling back to host assisted clone")
return "", nil
}

// GetVolumeSnapshotClass looks up the snapshot class based on the driver and an optional specific name
func GetVolumeSnapshotClass(ctx context.Context, c client.Client, driver string, snapshotClassName *string) (*string, error) {
Member:

Let's make sure we log why we chose a particular snapshot class. This reminds me of the work we are doing to help people understand why we needed to fall back to host-assisted clone.

@arnongilboa (Collaborator Author):

Added

@aglitke (Member) left a comment

Approach looks good. I made some comments.

@@ -148,6 +145,23 @@ func GetStorageClassForClaim(ctx context.Context, c client.Client, pvc *corev1.P
return nil, nil
}

func getSnapshotClassForClaim(ctx context.Context, c client.Client, pvc *corev1.PersistentVolumeClaim) (*string, error) {
if pvc.Spec.StorageClassName == nil || *pvc.Spec.StorageClassName == "" {
return nil, nil
Member:

What about default storage classes? We have a function that looks up a PVC's storage class name, including the default storage class if one is set.
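A sketch of resolving a PVC's storage class name including the default, assuming the usual Kubernetes convention that a nil storageClassName means "use the default" while an explicit empty string means "no class". The `storageClass` type and `resolveStorageClassName` are hypothetical; only the `storageclass.kubernetes.io/is-default-class` annotation is the real Kubernetes one. The existing CDI helper mentioned above may differ in details, such as how it handles several defaults.

```go
package main

import "fmt"

// annDefaultStorageClass is the standard Kubernetes annotation marking the default StorageClass.
const annDefaultStorageClass = "storageclass.kubernetes.io/is-default-class"

// storageClass is a hypothetical minimal stand-in for a StorageClass object.
type storageClass struct {
	Name        string
	Annotations map[string]string
}

// resolveStorageClassName returns the storage class a PVC effectively uses.
// nil means "use the cluster default"; an explicit "" means "no storage class".
func resolveStorageClassName(pvcClass *string, classes []storageClass) (string, bool) {
	if pvcClass != nil {
		return *pvcClass, *pvcClass != ""
	}
	for _, sc := range classes {
		if sc.Annotations[annDefaultStorageClass] == "true" {
			return sc.Name, true // simplification: the first default found wins
		}
	}
	return "", false
}

func main() {
	classes := []storageClass{
		{Name: "fast"},
		{Name: "standard", Annotations: map[string]string{annDefaultStorageClass: "true"}},
	}
	name, ok := resolveStorageClassName(nil, classes)
	fmt.Println(name, ok)
}
```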

@arnongilboa (Collaborator Author):

Fixed

return nil, err
}
if vsc.Driver == driver {
logger.Info("VolumeSnapshotClass selected according to StorageProfile", "name", vsc.Name)
Member:

Besides logging this in the CDI deployment log, can we emit an event on the target PVC (and maybe even put an annotation on it, so we can report it in a DataVolume condition) stating which snapshot class we picked and why?

@arnongilboa (Collaborator Author):

Choosing a specific VolumeSnapshotClass is simple, good-path logic and a rare use case, so I think logging it is more than enough (unlike the fallback to host-assisted clone, where we added a PVC event and annotation). @aglitke WDYT?

Member:

Since this decision is based on something other than the PVC/DV spec, I think we should err on the side of over-communicating it. I agree with @awels.

Member:

I guess the problem here is that we don't have the PVC so it is difficult to communicate this information.

Member:

But we could pass in the name of the PVC, and use that to then look up the PVC and emit the event.
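Passing the target PVC's name down is enough to emit an event on it. The sketch below is hypothetical throughout: `eventRecorder` is only loosely modeled on client-go's `record.EventRecorder`, the object is reduced to a "namespace/name" string, and the `VolumeSnapshotClassSelected` reason string is illustrative, not the one CDI uses.

```go
package main

import "fmt"

// eventRecorder is a hypothetical interface loosely modeled on client-go's
// record.EventRecorder.Event; the object is reduced to a "namespace/name" string.
type eventRecorder interface {
	Event(object, eventType, reason, message string)
}

type recordedEvent struct{ Object, Type, Reason, Message string }

// memRecorder keeps events in memory in place of a real API-backed recorder.
type memRecorder struct{ events []recordedEvent }

func (r *memRecorder) Event(object, eventType, reason, message string) {
	r.events = append(r.events, recordedEvent{object, eventType, reason, message})
}

// reportSnapshotClassChoice surfaces the selection on the target PVC, given only
// the namespace and name that the caller passes down.
func reportSnapshotClassChoice(rec eventRecorder, namespace, pvcName, class, why string) {
	if pvcName == "" {
		return // no target PVC known: logging is all we can do
	}
	msg := fmt.Sprintf("VolumeSnapshotClass %q selected: %s", class, why)
	rec.Event(namespace+"/"+pvcName, "Normal", "VolumeSnapshotClassSelected", msg)
}

func main() {
	rec := &memRecorder{}
	reportSnapshotClassChoice(rec, "default", "target-pvc", "fast-snap", "set in StorageProfile")
	fmt.Printf("%+v\n", rec.events)
}
```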

for _, vsc := range vscList.Items {
if vsc.Driver == driver {
if vsc.Annotations[AnnDefaultSnapshotClass] == "true" {
logger.Info("Default VolumeSnapshotClass selected", "name", vsc.Name)
Member:

Same here regarding emitting an event.


if len(candidates) > 0 {
sort.Strings(candidates)
logger.Info("First VolumeSnapshotClass selected", "name", candidates[0], "candidates", len(candidates))
Member:

Same here regarding emitting an event.

@akalenyu (Collaborator) left a comment

So apparently I lied, and default-snapshot-class is a per-provisioner setting!
I don't think it's clear from the k8s docs, but I stumbled across the code:

https://github.com/kubernetes-csi/external-snapshotter/blob/ff71329d8cc08dca53194f8d4568ed53e69ecb82/pkg/common-controller/snapshot_controller.go#L1362-L1374

I guess this makes the situation solvable without any additions?
@arnongilboa @aglitke
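As we read the linked external-snapshotter code, it only considers classes whose driver matches the snapshot's driver when looking for the one annotated `snapshot.storage.kubernetes.io/is-default-class`, and rejects the snapshot if zero or several such defaults exist. A simplified sketch of that rule follows; it is not a copy of the upstream code, and the types and function name are hypothetical.

```go
package main

import "fmt"

// annDefaultSnapshotClass is the annotation that marks a default VolumeSnapshotClass.
const annDefaultSnapshotClass = "snapshot.storage.kubernetes.io/is-default-class"

// volumeSnapshotClass is a hypothetical minimal stand-in for the real object.
type volumeSnapshotClass struct {
	Name        string
	Driver      string
	Annotations map[string]string
}

// defaultSnapshotClassForDriver only considers classes of the given driver,
// so each driver can have its own default.
func defaultSnapshotClassForDriver(classes []volumeSnapshotClass, driver string) (string, error) {
	var defaults []string
	for _, c := range classes {
		if c.Driver == driver && c.Annotations[annDefaultSnapshotClass] == "true" {
			defaults = append(defaults, c.Name)
		}
	}
	switch len(defaults) {
	case 0:
		return "", fmt.Errorf("no default snapshot class for driver %q", driver)
	case 1:
		return defaults[0], nil
	default:
		return "", fmt.Errorf("%d default snapshot classes found for driver %q", len(defaults), driver)
	}
}

func main() {
	def := map[string]string{annDefaultSnapshotClass: "true"}
	classes := []volumeSnapshotClass{
		{Name: "a-default", Driver: "a.csi.example.com", Annotations: def},
		{Name: "b-default", Driver: "b.csi.example.com", Annotations: def},
	}
	for _, d := range []string{"a.csi.example.com", "b.csi.example.com"} {
		name, err := defaultSnapshotClassForDriver(classes, d)
		fmt.Println(d, name, err)
	}
}
```

This still allows only one default per driver at a time, which is why it doesn't cover using two classes of the same driver in parallel.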

@arnongilboa (Collaborator Author):

@akalenyu but the whole idea (see the BZ) was to allow different VolumeSnapshotClasses to be used with the same provisioner, so how does a per-provisioner default help here?

@akalenyu (Collaborator) commented Sep 26, 2023:

Right, we would still need changes. But we could avoid the StorageProfile field and just add support for respecting the default snap class, which users can always change as needed.

@arnongilboa (Collaborator Author):

But the user wants to use two different VolumeSnapshotClasses with the same provisioner in parallel, so patching the default to the desired value on every PVC creation seems like an ugly hack.

@akalenyu (Collaborator):

Yep, in that case we still need a StorageProfile field; it's too ugly to keep patching them.

@awels (Member) left a comment

Looks good, except for the issue about emitting an event.

@kubevirt-bot added the needs-rebase label Oct 27, 2023
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
in GetCompatibleVolumeSnapshotClass

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@kubevirt-bot removed the needs-rebase label Nov 5, 2023
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@arnongilboa (Collaborator Author):

/retest

@kubevirt-bot added the lgtm label Nov 6, 2023
@awels (Member) commented Nov 6, 2023:

/test pull-containerized-data-importer-e2e-ceph-wffc

@akalenyu (Collaborator) left a comment

Looks good; I'm mainly concerned about the pointer to a loop variable, which is a recipe for trouble.

installerLabels: installerLabels,
}

storageProfileController, err := controller.New(
"storageprofile-controller",
dataImportControllerName,
Collaborator:

dataImportControllerName?

@arnongilboa (Collaborator Author):

Good catch :)

if vsc.Driver == driver {
if vsc.Annotations[AnnDefaultSnapshotClass] == "true" {
logEvent(MessageDefaultVolumeSnapshotClassSelected, vsc.Name)
return &vsc.Name, nil
Collaborator:

Pointers to loop variables are usually an endless path of pain and sadness: https://go.dev/blog/loopvar-preview
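The concern is taking `&vsc.Name` inside `for _, vsc := range ...`. Before Go 1.22 the loop variable is a single variable reused across iterations, so a pointer to it that outlives the iteration sees later values; returning immediately happens to be safe, but storing the pointer is not. A minimal sketch (the `vsc` type and both functions are hypothetical) of two version-independent alternatives:

```go
package main

import "fmt"

// vsc is a hypothetical minimal stand-in for a VolumeSnapshotClass.
type vsc struct{ Name, Driver string }

// namesForDriver returns pointers to the Name of every matching class.
// &items[i].Name points into the slice element itself, so each pointer is
// distinct on any Go version. Appending &c.Name from `for _, c := range items`
// would, before Go 1.22, make every pointer alias the one loop variable c,
// so all of them would show the value from the last iteration.
func namesForDriver(items []vsc, driver string) []*string {
	var out []*string
	for i := range items {
		if items[i].Driver == driver {
			out = append(out, &items[i].Name)
		}
	}
	return out
}

// firstNameForDriver shows the other common fix: copy the value into a
// fresh local variable and return a pointer to that copy.
func firstNameForDriver(items []vsc, driver string) *string {
	for _, c := range items {
		if c.Driver == driver {
			name := c.Name // new variable per match; safe to take its address
			return &name
		}
	}
	return nil
}

func main() {
	items := []vsc{{Name: "a", Driver: "d1"}, {Name: "b", Driver: "d2"}, {Name: "c", Driver: "d1"}}
	for _, p := range namesForDriver(items, "d1") {
		fmt.Println(*p)
	}
}
```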

@arnongilboa (Collaborator Author):

fixed.

pkg/controller/common/util.go (two threads resolved)
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@kubevirt-bot removed the lgtm label Nov 7, 2023
@akalenyu (Collaborator) commented Nov 7, 2023:

/approve
/test pull-cdi-apidocs
/cc @awels

@kubevirt-bot (Contributor):

@akalenyu: GitHub didn't allow me to request PR reviews from the following users: for, lgtm.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.


@kubevirt-bot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akalenyu

@kubevirt-bot added the approved label Nov 7, 2023
@awels (Member) commented Nov 7, 2023:

/lgtm

@kubevirt-bot added the lgtm label Nov 7, 2023
@akalenyu (Collaborator) commented Nov 7, 2023:

/test pull-containerized-data-importer-e2e-hpp-previous

@arnongilboa (Collaborator Author):

/retest

@arnongilboa (Collaborator Author):

/test pull-containerized-data-importer-e2e-hpp-latest

@arnongilboa (Collaborator Author):

/test pull-containerized-data-importer-e2e-ceph-wffc

@awels (Member) commented Nov 8, 2023:

/cherrypick release-v1.58

@kubevirt-bot (Contributor):

@awels: once the present PR merges, I will cherry-pick it on top of release-v1.58 in a new PR and assign it to you.


@akalenyu (Collaborator) commented Nov 9, 2023:

/retest

@akalenyu (Collaborator) commented Nov 9, 2023:

/test pull-containerized-data-importer-e2e-ceph

@kubevirt-bot (Contributor):

@arnongilboa: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
--- | --- | --- | ---
pull-containerized-data-importer-e2e-ceph | 5386a44 | unknown | /test pull-containerized-data-importer-e2e-ceph
pull-containerized-data-importer-e2e-hpp-latest | 5386a44 | true | /test pull-containerized-data-importer-e2e-hpp-latest


@akalenyu (Collaborator) commented Nov 9, 2023:

/test pull-containerized-data-importer-e2e-hpp-latest

@kubevirt-bot merged commit 1a04ba9 into kubevirt:main Nov 9, 2023
19 of 21 checks passed
@kubevirt-bot (Contributor):

@awels: new pull request created: #2974

Labels: approved, dco-signoff: yes, lgtm, release-note, size/L

6 participants