Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for cdi populators #2776

Merged
merged 2 commits into from
Jul 11, 2023

Conversation

ShellyKa13
Copy link
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
The documentation already takes into account all the code that should get in for the cdi populators i.e the integration with the datavolume clone and populators #2750 , supporting Immediate Binding annotation #2765 , supporting VDDK and Imagio sources #2767

Release note:

Add documentation for cdi populators

Signed-off-by: Shelly Kagan <skagan@redhat.com>
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 27, 2023
CDI volume populators are CDIs implementation of populating PVCs by importing/uploading/cloning data utilizing the new `dataSourceRef` field. New controllers and custom resources for each population method were introduced.

**Values of the new API?**
* Native synchronization with Kubernetes - this is kubernetes way of populating PVCs. Once PVC is bound we know it is populated (So far PVC was bound the moment the datavolume created it and the population progress was monitored via the datavolume)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the PVC was bound the moment the datavolume created it, and the population progress was monitored via the datavolume

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before introducing populators, the PVC...

* Can use one population definition for multiple PVCs - create 1 CR defining population source and use it for any PVC.
* Better compatibility with existing backup solutions. Using PVC alone should solve all backup issues. Using datavolumes with populators solves most, for example Metro DR and [Gitops](https://www.redhat.com/en/topics/devops/what-is-gitops#:~:text=GitOps%20uses%20Git%20repositories%20as,set%20for%20the%20application%20framework.) - datavolume manifest will be applied, the datavolume will create the PVC that will bind immediately to the PV waiting for it.
* Better compatibility with integration with [Kubevirt](https://github.com/kubevirt/kubevirt) and existing backup solutions, using VMs with PVCs using populators and VMs with datavolumetemplates.
* Integration with [Kubevirt](https://github.com/kubevirt/kubevirt) with WFFC storage class is simpler not requiring [doppelganger pod](https://github.com/kubevirt/kubevirt/blob/main/docs/localstorage-disks.md#the-problem) for the start of the VM.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expand WFFC into WaitForFirstConsumer(WFFC) so it is clear what the acronym means

Once the temporary PVC population is done, the PV will be rebound to the original PVC completing the population process.

#### Upload
For upload need to follow the same guidelines as describe in the [upload doc](upload.md) but instead of creating a data volume you can create VolumeUploadSource CR and PVC similar to the import examples in this doc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would give a full example of upload as well, since you are already doing it for import and clone.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree


**Values of the new API?**
* Native synchronization with Kubernetes - this is kubernetes way of populating PVCs. Once PVC is bound we know it is populated (So far PVC was bound the moment the datavolume created it and the population progress was monitored via the datavolume)
* Use PVCs directly and get them populated without datavolumes mitigation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mitigation -> involvement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually like the word I intended to use, which apparently is not mitigation but mediation

* Use PVCs directly and get them populated without datavolumes mitigation.
* Can use one population definition for multiple PVCs - create 1 CR defining population source and use it for any PVC.
* Better compatibility with existing backup solutions. Using PVC alone should solve all backup issues. Using datavolumes with populators solves most, for example Metro DR and [Gitops](https://www.redhat.com/en/topics/devops/what-is-gitops#:~:text=GitOps%20uses%20Git%20repositories%20as,set%20for%20the%20application%20framework.) - datavolume manifest will be applied, the datavolume will create the PVC that will bind immediately to the PV waiting for it.
* Better compatibility with integration with [Kubevirt](https://github.com/kubevirt/kubevirt) and existing backup solutions, using VMs with PVCs using populators and VMs with datavolumetemplates.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compatibility with -> and?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not exactly right but I will rephrase

Copy link
Collaborator

@alromeros alromeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this, looks good! Some comments.

For upload need to follow the same guidelines as describe in the [upload doc](upload.md) but instead of creating a data volume you can create VolumeUploadSource CR and PVC similar to the import examples in this doc.

#### Clone
Same for clone very similar API:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sound odd, maybe Example of a PVC that will be populated by clone-populator:

Once the temporary PVC population is done, the PV will be rebound to the original PVC completing the population process.

#### Upload
For upload need to follow the same guidelines as describe in the [upload doc](upload.md) but instead of creating a data volume you can create VolumeUploadSource CR and PVC similar to the import examples in this doc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

What are [populators](https://kubernetes.io/blog/2022/05/16/volume-populators-beta/)

## Introduction
CDI volume populators are CDIs implementation of populating PVCs by importing/uploading/cloning data utilizing the new `dataSourceRef` field. New controllers and custom resources for each population method were introduced.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if this is incorrect but my spanish-talking brain tells me to change populating for a noun in implementation of populating PVCs.

Maybe CDI volume populators are CDI's implementation for the population of PVCs by... sounds good too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me it sounds fine :) I prefer leave it like this unless someone says its wrong

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "CDI volume populators are CDI's implementation of the existing import, upload, and clone operations using the new dataSourceRef field of the PVC."


The integration of datavolumes and CDI populators is seamless. You can create the datavolumes the same way you always have.
If the used storage class is CSI storage then the datavolume population will occur via the CDI populators with the end result as always has been of a populated PVC. You will be able to notice that the created PVC will stay pending until the population process completes.
For more information of using datavolumes for population check the [datavolume doc](datavolumes.md)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a strict list of requirements for populators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after all of the PRs I stated in the description there will be no requirements other then the CSI storage which I mentioned.

## Introduction
CDI volume populators are CDIs implementation of populating PVCs by importing/uploading/cloning data utilizing the new `dataSourceRef` field. New controllers and custom resources for each population method were introduced.

**Values of the new API?**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ?? I would rename this to something like Motivation or Benefits of using populators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I started with why using the new API and changed it. I'll remove the ? and maybe change values to benefits

### Using populators with PVCs
User can create a CR and PVCs specifying the CR in the `DataSourceRef` field and those will be handled by the matching populator controller.

#### Import
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be more specific with each example: We should add an example of a valid CRD and a valid PVC for each populator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have it for every example except upload which Ill add

What are [populators](https://kubernetes.io/blog/2022/05/16/volume-populators-beta/)

## Introduction
CDI volume populators are CDIs implementation of populating PVCs by importing/uploading/cloning data utilizing the new `dataSourceRef` field. New controllers and custom resources for each population method were introduced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "CDI volume populators are CDI's implementation of the existing import, upload, and clone operations using the new dataSourceRef field of the PVC."


**Benefits of the new API**
* Native synchronization with Kubernetes - this is Kubernetes way of populating PVCs. Once PVC is bound we know it is populated (Before introducing populators, the PVC was bound the moment the datavolume created it, and the population progress was monitored via the datavolume)
* Use PVCs directly and get them populated without datavolumes mediation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now possible to use PVCs directly and have them populated without the need for DataVolumes.

**Benefits of the new API**
* Native synchronization with Kubernetes - this is Kubernetes way of populating PVCs. Once PVC is bound we know it is populated (Before introducing populators, the PVC was bound the moment the datavolume created it, and the population progress was monitored via the datavolume)
* Use PVCs directly and get them populated without datavolumes mediation.
* Can use one population definition for multiple PVCs - create 1 CR defining population source and use it for any PVC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to mention that this is limited to a single namespace until https://kubernetes.io/blog/2023/01/02/cross-namespace-data-sources-alpha/ goes beta and is incorporated into CDI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhenriks didn't you solve that for now by using the tokens or I miss understood?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross namespace only works with datavolume integration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to mention this limitation since currently alpha

* Native synchronization with Kubernetes - this is Kubernetes way of populating PVCs. Once PVC is bound we know it is populated (Before introducing populators, the PVC was bound the moment the datavolume created it, and the population progress was monitored via the datavolume)
* Use PVCs directly and get them populated without datavolumes mediation.
* Can use one population definition for multiple PVCs - create 1 CR defining population source and use it for any PVC.
* Better compatibility with existing backup solutions. Using PVC alone should solve all backup issues. Using datavolumes with populators solves most, for example Metro DR and [Gitops](https://www.redhat.com/en/topics/devops/what-is-gitops#:~:text=GitOps%20uses%20Git%20repositories%20as,set%20for%20the%20application%20framework.) - datavolume manifest will be applied, the datavolume will create the PVC that will bind immediately to the PV waiting for it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backup and disaster recovery solutions

* Use PVCs directly and get them populated without datavolumes mediation.
* Can use one population definition for multiple PVCs - create 1 CR defining population source and use it for any PVC.
* Better compatibility with existing backup solutions. Using PVC alone should solve all backup issues. Using datavolumes with populators solves most, for example Metro DR and [Gitops](https://www.redhat.com/en/topics/devops/what-is-gitops#:~:text=GitOps%20uses%20Git%20repositories%20as,set%20for%20the%20application%20framework.) - datavolume manifest will be applied, the datavolume will create the PVC that will bind immediately to the PV waiting for it.
* Better compatibility with existing backup solutions and [Kubevirt](https://github.com/kubevirt/kubevirt) VMs with PVCs using populators and VMs with datavolumetemplates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the above point? I'm not clear what this is saying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that the above point is only for using pvcs and datavolume without VMs, the second is VM using those PVCs and using datavolumetemplates.. maybe I can just continue the previous point with the mention of the VMs

* Can use one population definition for multiple PVCs - create 1 CR defining population source and use it for any PVC.
* Better compatibility with existing backup solutions. Using PVC alone should solve all backup issues. Using datavolumes with populators solves most, for example Metro DR and [Gitops](https://www.redhat.com/en/topics/devops/what-is-gitops#:~:text=GitOps%20uses%20Git%20repositories%20as,set%20for%20the%20application%20framework.) - datavolume manifest will be applied, the datavolume will create the PVC that will bind immediately to the PV waiting for it.
* Better compatibility with existing backup solutions and [Kubevirt](https://github.com/kubevirt/kubevirt) VMs with PVCs using populators and VMs with datavolumetemplates.
* Integration with [Kubevirt](https://github.com/kubevirt/kubevirt) with WaitForFirstConsumer(WFFC) storage class is simpler not requiring [doppelganger pod](https://github.com/kubevirt/kubevirt/blob/main/docs/localstorage-disks.md#the-problem) for the start of the VM.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is simpler not requiring -> is simpler and does not require a

requests:
storage: 10Gi
```
After creating the VolumeUploadSource and PVC, you can start the upload to the pvc as describe in the [upload doc](upload.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive my ignorance but does the user target PVC prime when uploading or the PVC they created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the user can upload to the target pvc as always, we take care of the rest under the covers.

### Using populators with DataVolumes

The integration of datavolumes and CDI populators is seamless. You can create the datavolumes the same way you always have.
If the used storage class is CSI storage then the datavolume population will occur via the CDI populators with the end result as always has been of a populated PVC. You will be able to notice that the created PVC will stay pending until the population process completes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the DataVolume targets a storage class that uses a CSI provisioner CDI will automatically use the new populators method. The behavior will be the same as always but with the following key differences. <different Pending DV status, and PVC will not become bound until is is populated>.

Signed-off-by: Shelly Kagan <skagan@redhat.com>
@alromeros
Copy link
Collaborator

Looks good to me! That said I think we should wait for another lgtm, preferably from some US folk
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 10, 2023
@mhenriks
Copy link
Member

/approve

looks good now we need something for https://github.com/kubevirt/user-guide

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mhenriks

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2023
@mhenriks
Copy link
Member

/retest-required

@ShellyKa13
Copy link
Contributor Author

/retest

@kubevirt-bot kubevirt-bot merged commit e23ab12 into kubevirt:main Jul 11, 2023
16 checks passed
@ShellyKa13
Copy link
Contributor Author

/cherrypick release-v1.57

@kubevirt-bot
Copy link
Contributor

@ShellyKa13: new pull request created: #2812

In response to this:

/cherrypick release-v1.57

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants