-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design to add support for Multiple VolumeSnapshotClasses in CSI Plugin #5774
Conversation
74abf2b
to
dc7e6db
Compare
Codecov Report
@@ Coverage Diff @@
## main #5774 +/- ##
==========================================
- Coverage 40.67% 39.96% -0.71%
==========================================
Files 243 256 +13
Lines 21035 23037 +2002
==========================================
+ Hits 8555 9206 +651
- Misses 11857 13139 +1282
- Partials 623 692 +69 see 77 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
For now, just a couple high level comments. I think we should not consider the explicit csi-specific fields on the Backup plugin. As long as the CSI plugin is implemented as an external plugin, we really don't want to embed specific knowledge of the plugin's internals in the core velero APIs. Of the other two approaches, the annotation one is the most straightforward. If we go that way, we may want to allow annotations on both PVCs and Backup/Schedule CRs. That way a user can set one policy for the backup as a whole, allowing overrides per PVC. If we only do one of the two, probably the Backup level is most useful. The "generic property bag" is the most interesting option, as it would allow other plugins to also add their own configuration parameters without having to rely on annotations or external config maps. It would be more work, but probably worthwhile if we can identify other existing plugin use cases that would also benefit from this. But the annotation approach is also perfectly acceptable, as there is precedent in using annotations for plugin configuration in other plugins. |
@anshulahuja98 |
"I think we can use the annotation on PVC and backup way, or the ConfigMap way. Velero can use some specific label or annotation to find out the correct ConfigMap." @blackpiglet what are your thoughts on introducing a generic Plugin parameter field in the backup/schedule CRD? the issue with annotating configmaps is that they always end up remaining a global setting and hard to get determinism for what is to be used for each backup Introducing another generic field in the CRD can help bring in that determinism and be extensible to other plugins where similar disambiguation of global settings is required. Annotations are not a very friendly way from a regular customer POV. I believe we can't even use velero CLI for such scenarios. We discussed this in the last Community meeting - there seemed to be general consensus over the generic plugin param approach |
@anshulahuja98 |
@blackpiglet I completely understand your point of the amount of inputs to be very high and being created as a separte configmap. keeping the advanced configuration separate from CRD I also agree with. The point I am trying to drive here is just that, the reference to that configmap instead of through annotations, we pass it in a generic input field. For example: apiVersion: velero.io/v1
kind: Backup
metadata:
name: backup-1
spec:
pluginInputs:
- name: velero.io/advanced-filter
- properties:
- key: advanceFilterConfigMapReference
- value: configMapNamespace/configMapName In above example, the advance filter code will know which value to look for in the pluginInputs field, whicch key to search on to get the configmap reference. Similarly if let's say the VolumeSNapshotClassInputs increase in size we can pass reference to configmap. apiVersion: velero.io/v1
kind: Backup
metadata:
name: backup-1
spec:
pluginInputs:
- name : velero.io/csi
- properties:
- key: snapshotClassMappingConfigMapReference
- value: configMapNamespace/configMapName This can further be extended to pass in StorageClass mapping configmap Plugin inputs is a generic field which can maps a "PluginName" to a map[string]string which can hold plugin specific params. Now this can be specific parameters in case the payload is small. BUt in advanced cases we can just pass references to other objects such as configmaps/secrets |
@anshulahuja98 |
I agree with your point @blackpiglet , we can do both annotations and CRD way. WIll keep CRD route to be P0, and annotations P1, should not take too much effort to light up annotations as well. We anyways need annotations if we wish to support VSC support at per PVC level. For overrides of global settings. Thank you for your comments and initial review! |
Meanwhile are there any thoughts on how to expose the input of such "plugin inputs" through the CLI? |
CC @reasonerjt / @qiuming-best for review. |
Have updated the PR with the final design based on inputs. |
Please add a changelog file to this PR, and fix the typo to make the Github action check to pass. For the "plugin inputs" setting part, the Velero team had some discussion. Since there is no procedure for the Velero CRD update, we don't prefer to modify the Velero CRDs in v1.11. Because, if the CRD is changed, after the Velero version upgrade, the Velero CRD is not changed accordingly, it will complain that fails to find the updated CRD in the cluster. |
@blackpiglet Major version should be enough to push through a CR update, I see that has happened often in the past. Is there anything specific about 1.11 due to which we don't wish to change them? Also does this mean that PRs such as #5838 also blocked for 1.11 since they have CRD changes? For steps forward do you wish for this design to be checked in as is and worked upon post 1.11. Or come up with alternative proposal? |
While the new resource filtering mentioned above (with CRD changes) won't make it into 1.11, there will be CRD changes for the new async BIA/RIA plugin operations, some of which are already merged on main, so I don't think that "we don't want CRD changes for 1.11" would really apply here. For z streams (1.10.1, etc.), we definitely want to minimize, hopefully completely eliminate CRD changes, but for releases like 1.8, 1.9, 1.10, 1.11, etc. some CRD changes are always expected. The main concern is making sure that the changes are backwards-compatible, especially with regard to existing backups/PVBs, etc. |
Right. Although it's better to limit the CRD change, I also found it hard to avoid CRD change, so the CRD change is not a blocker of the PR. For the |
@shubham-pampattiwar can you please review this PR since you have more CSI related context. |
Signed-off-by: Anshul Ahuja <anshulahuja@microsoft.com>
9c75929
to
7bf5b50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anshulahuja98 Thanks for the design, added a few comments.
### Annotations overrides on PVC for CSI Plugin | ||
The user can annotate the PVCs with VolumeSnapshotClass name. This will override whatever the user has passed in pluginInputs for that driver. | ||
|
||
- If annotation is not present or VolumeSnapshotClass referred is not present in cluster OR if the specified VSC does not have the same CSI driver as the PVC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the proposed behaviour
CLI Example | ||
|
||
```bash | ||
velero backup create my-backup --plugin-inputs velero.io/csi:csi.cloud.disk.driver=csi-diskdriver-snapclass,csi.cloud.file.driver=csi-filedriver-snapclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a note so that we update docs on how we are using the delimiters for the pluginsInputs
in velero CLI
|
||
### Using Plugin Inputs for CSI Plugin | ||
The user can specify the VolumeSnapshotClass to use for a particular driver and backup using the plugin inputs. The CSI plugin will use the VolumeSnapshotClass specified in the plugin inputs. If the VolumeSnapshotClass is not specified for a driver, the CSI plugin will use the default VolumeSnapshotClass for the driver fetched using labels through older route. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the main downside of this approach would be that we would be increasing the size of the backup CR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What all concerns emerge from this? Have we seen any issues earlier due to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not from the CR size, but the Restore CRD size was big before, and in some low-bandwidth environments, Velero installation fails due to API-server connection timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see..
We could consider reducing the comments/ description per CRD field as well in those cases.
name: backup-1 | ||
spec: | ||
pluginInputs: | ||
- name: velero.io/csi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder would it make sense to add another spec called type
(plugintype) along the level of name
spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I have used name for the plugin Name
Isn't plugin type same as plugin name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the code, do you mean PluginKind (ObjectStore/ VolumeSnapshotter etc)
Would make the customer experience a bit more complicated is what I feel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shubham-pampattiwar I'm thinking that we're dealing with limited types here, so instead of a type field, lets have a single spec field for backup item actions. If we need to support this for volume snapshotters (and we may not, since there's not really a lot of active development there) we can add a separate field for that later. Velero already calls BIA and VolumeSnapshotter plugins in completely different places, so it probably makes more sense to separate the configuration fields too.
name: backup-1 | ||
spec: | ||
pluginInputs: | ||
- name: velero.io/csi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an implementation detail, but we should add a backup validation as well when pluginInputs are specified like they should only be accepted only if the corresponding plugin is registered with the velero controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will check feasibility of such check in code and add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code, the plugins today seem to be queryable using the PluginKind (List for ObjectStore/ VolumeSnapshotter etc).
Querying purely based on name does not seem to be available.
most likely because the same plugin name may hold for snapshotter and object store.
So for adding this validation, we might need to take PluginKind as input. Which is slightly a bad end user experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shubham-pampattiwar @anshulahuja98 Actually, I'm not 100% sure we need that validation. Should backups with plugin inputs fail if the plugin isn't registered? In any case, if you want this validation you'll need PluginKind. clientmgmt.Manager
does have methods for retrieving registered plugins by name, but there is no grand unified list of plugins. Each PluginKind has a separate list of registered plugins. However, since we're talking about plgins for a backup or for a restore) here, we should already know the plugin kind. We're only dealing with BackupItemAction plugin inputs here. We may want to eventually support volumesnapshotter inputs too, but maybe that's a separate spec section. Call this biaPluginInputs
instead of pluginInputs
(and riaPluginInputs
on restore), and then we already know the plugin kind. It would be BackupItemAction
in the context of velero 1.10, but since this won't be implemented until post-1.11, it would be BackupItemActionV2
Basically, validation could call pluginManager.GetBackupItemActionV2(input.Name)
and if it errors out, then the plugin isn't registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to evaluate if we really need validation
We can stage it if needed.
Signed-off-by: Anshul Ahuja <anshulahuja@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the design/use-case looks sane to me but I would like to know thoughts from other maintainers as well ?
@anshulahuja98 There might be some intersection with the ongoing design regarding Volume Policies (#5773)
I am just spit-balling here but can we not integrate your use case with the Volume Policies design ? like the Condition
would be the CSI driver and Action.Type
could be Override
and Action.Parameters
could consist of the VolumeSnapshotClass that you want to use for those volumes.
(cc @blackpiglet @Lyndon-Li @sseago @reasonerjt @qiuming-best )
I understand where you are coming from on the intersection. The volumefilter approach seems very involved for a customer to setup as compared to having dedicated entries for CSI VSCs. However I feel that the VOlume Policy Design could benefit from the pluginInputs field that I am proposing. THey could leverage the same. |
name: backup-1 | ||
spec: | ||
pluginInputs: | ||
- name: velero.io/csi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's right thing to do to expose plugin name to the end user, that couples the interface to the implementation.
For example, if we choose to move CSI plugin into the main backup flow (not likely to happen though), it will be confusing if we continuing letting user select the vsclass via plugin input.
I would rather add a CSI
field into the spec of backup and let the user choose it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the point of "couples the interface to the implementation.", even today customer has to understand which plugins to enable to add the appropriate plugin containers to the deployment. User is already in some sense expected to be aware of which component is providing which functionality.
If CSI is merged with current codebase, we would have to anyways educated brownfield customers to migrate from using the CSI plugin containers in their deployments. This change is much bigger than just honoring / not honoring the values in the pluginInputs.
The overall purpose of this spec is to introduce pluginInputs which can be leverage by other plugins without having to bloat up the backup CR, this approach is mostly plugin agnostic. CSI related functionalities are just a part of the requirement.
For CSI you might agree on backup spec, but I don't think for features such as storageclass mapping this approach would make sense.
Any suggestions as to how those scenarios can be handled?
In the alternatives consdiered below you can check other approaches we evaluated.
Introducing CSI specific fields in backup CR turned to be strong no from discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall purpose of this spec is to introduce pluginInputs which can be leverage by other plugins without having to bloat up the backup CR
I agree we don't want to bloat up the backup CR, that's why we want a balance between customizability and complexity. At this moment, I'm still not quite convinced we should provide a general framework so that users can customize the behavior of any plugin.
For CSI you might agree on backup spec, but I don't think for features such as storageclass mapping this approach would make sense.
Any suggestions as to how those scenarios can be handled?
As for storageclass mapping
, if you are talking about modifying the resources before they are restored, user can write RIA, if we are not happy with RIA, I think the JSON-substitution proposal may be helpful, though it may not have to be an RIA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I agree we don't want to bloat up the backup CR, that's why we want a balance between customizability and complexity. At this moment, I'm still not quite convinced we should provide a general framework so that users can customize the behavior of any plugin."
Users are anyways required to do this through annotations for certain plugins - the main issue is that often these are often set as global behaviour impacting all backup/restores. Additionally annotations are not a user friendly approach to this.
For storage class mapping the existing RIA is dependent on global behavior. It fetches configmap on basis of certain label / annotation. The same issue we are facing in VolumeSnapshotClass selection.
Expecting the end user to write plugins for each scenario, understand velero code, maintaining the RIA code seems like a very big overhead.
Signed-off-by: Anshul Ahuja <anshulahuja@microsoft.com>
metadata: | ||
name: backup-1 | ||
annotations: | ||
velero.io/csi-volumesnapshot-class/csi.cloud.disk.driver: csi-diskdriver-snapclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be a mismatch between the driver name in the annotation and the one in the actual vsclass, in that case the plugin will also fallback to the default one, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is correct.
Refer section Alternatives 2. **Through generic property bag in the velero contracts**: in the design doc for more details on the pluginInputs field. | ||
|
||
|
||
## Alternatives Considered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is to extend the resource policy:
https://github.com/vmware-tanzu/velero/blob/main/design/handle-backup-of-volumes-by-resources-filters.md
So it matches certain PVCs and pass the vsclass as parameters of action csi-snapshot
But to avoid confusion I think annotations may be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe CSI snapshotting is a very common use case as compared to resource filters for volumes which are more specific in nature of their purpose.
And hence from a better customer experience POV, I'll prefer an annotations approach.
For performing basic snapshotting operations, user should not have to create and manage configmaps.
|
||
To query the annotations on a backup: "velero.io/csi-volumesnapshot-class/'driver name'" - where driver names comes from the PVC's driver. | ||
|
||
2. **Support VolumeSnapshotClass selection at PVC level** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this implies velero should always use the same vsclass to take snapshot of PVCxxx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, consider this an override behavior for a PVC.
In case user has configured VSC at backup level, this will override the VSC for the particular PVC.
I am not sure if you this comment something else?
@reasonerjt can you please add this to the 1.12 milestone. |
Coming a bit late to the party. I would really appreciate it if someone can summarize the change planned for 1.12 as far as this feature is concerned. Has it been decided to implement annotations at both backup and PVC level? I did try to go through all the comments but it is not easy to process as there are so many of them. Just for reference, I would like to highlight few points:
|
|
The PR basically looks good to me. Need other reviewers' opinions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Thank you for approving the PR folks |
Is it testable in v1.12 RC1? |
yes @achyut-deshpande you can use v0.6.0-rc.1 release of CSI plugin to test out this functionality. Please use it in conjunction with 1.12 RC1 of core velero. |
Thanks, however This annotation format in backup-cr is not working The Backup "backup3" is invalid: metadata.annotations: Invalid value: "velero.io/csi-volumesnapshot-class/csi.cloud.disk.driver": a qualified name must consist of alphanumeric characters, '-', '_' or '.' |
Hi @achyut-deshpande we fixed this issue later
use an "_" instead of the "/" |
I checked CSI section of Velero docs and it still mentions single volume snapshot class. Is there a docs PR for this feature? |
@draghuram missed the docs PR due to time crunch. |
CSI README was already updated with basic docs, just need the bug fix |
Thank you for contributing to Velero!
Please add a summary of your change
Does your change fix a particular issue?
Proposal for #5750
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.