Add documentation for cdi populators #2776

Merged
merged 2 commits into from
Jul 11, 2023
99 changes: 99 additions & 0 deletions doc/cdi-populators.md
@@ -0,0 +1,99 @@
# CDI Volume Populators

## Recommended knowledge

What are [populators](https://kubernetes.io/blog/2022/05/16/volume-populators-beta/)

## Introduction
CDI volume populators are CDI's implementation of populating PVCs by importing, uploading, or cloning data using the new `dataSourceRef` field. New controllers and custom resources were introduced for each population method.
Collaborator

Don't know if this is incorrect but my spanish-talking brain tells me to change populating for a noun in implementation of populating PVCs.

Maybe CDI volume populators are CDI's implementation for the population of PVCs by... sounds good too.

Contributor Author

for me it sounds fine :) I prefer leave it like this unless someone says its wrong

Member

Or "CDI volume populators are CDI's implementation of the existing import, upload, and clone operations using the new dataSourceRef field of the PVC."


**Values of the new API?**
Collaborator

Why ?? I would rename this to something like Motivation or Benefits of using populators.

Contributor Author

:) I started with why using the new API and changed it. I'll remove the ? and maybe change values to benefits

* Native synchronization with Kubernetes - this is the Kubernetes way of populating PVCs. Once a PVC is bound we know it is populated (so far, the PVC was bound the moment the datavolume created it, and the population progress was monitored via the datavolume).
Member

Before the PVC was bound the moment the datavolume created it, and the population progress was monitored via the datavolume

Collaborator

Before introducing populators, the PVC...

* Use PVCs directly and get them populated without datavolumes mitigation.
Collaborator

maybe mitigation -> involvement?

Contributor Author

I actually like the word I intended to use, which apparently is not mitigation but mediation

* One population definition can be used for multiple PVCs - create one CR defining the population source and use it for any PVC.
Member

It may make sense to mention that this is limited to a single namespace until https://kubernetes.io/blog/2023/01/02/cross-namespace-data-sources-alpha/ goes beta and is incorporated into CDI.

Contributor Author

@mhenriks didn't you solve that for now by using the tokens, or did I misunderstand?

Member

Cross namespace only works with datavolume integration

Member

I don't think we have to mention this limitation since currently alpha

* Better compatibility with existing backup solutions. Using PVCs alone should solve all backup issues. Using datavolumes with populators solves most, for example Metro DR and [GitOps](https://www.redhat.com/en/topics/devops/what-is-gitops#:~:text=GitOps%20uses%20Git%20repositories%20as,set%20for%20the%20application%20framework.) - the datavolume manifest will be applied, and the datavolume will create the PVC, which will bind immediately to the PV waiting for it.
Member

backup and disaster recovery solutions

* Better compatibility with integration with [Kubevirt](https://github.com/kubevirt/kubevirt) and existing backup solutions, using VMs with PVCs using populators and VMs with datavolumetemplates.
Collaborator

compatibility with -> and?

Contributor Author

its not exactly right but I will rephrase

* Integration with [Kubevirt](https://github.com/kubevirt/kubevirt) with a WFFC storage class is simpler, as it does not require a [doppelganger pod](https://github.com/kubevirt/kubevirt/blob/main/docs/localstorage-disks.md#the-problem) to start the VM.
Member

Maybe expand WFFC into WaitForFirstConsumer(WFFC) so it is clear what the acronym means



## Using the populators

We introduced a new controller for each population method. Each controller supports a matching CRD that provides the information needed for the population.
Example of a VolumeImportSource CR:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: VolumeImportSource
metadata:
  name: my-import-source
spec:
  source:
    http:
      url: "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img"
```

### Using populators with PVCs
Users can create a CR and PVCs that specify the CR in the `dataSourceRef` field; those PVCs will be handled by the matching populator controller.

#### Import
Collaborator

I think we should be more specific with each example: We should add an example of a valid CRD and a valid PVC for each populator.

Contributor Author

I have it for every example except upload which Ill add

Example of a PVC that will be handled by the import populator:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: my-import-source
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

The PVC above will trigger a reconcile of the import populator controller.
The controller will create a matching temporary PVC with the appropriate annotations, which will get bound and populated.
Once the population of the temporary PVC is done, the PV will be rebound to the original PVC, completing the population process.
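
As a minimal sketch of reusing one population definition for several PVCs, a second PVC in the same namespace could reference the same `my-import-source` CR shown earlier. The PVC name `my-second-pvc` is hypothetical and only used for illustration; the remaining fields mirror the import example.

```yaml
# Sketch only: a second PVC that reuses the existing VolumeImportSource.
# "my-second-pvc" is a hypothetical name, not part of the original doc.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-second-pvc
spec:
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: my-import-source   # same CR as the first PVC
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Each PVC would presumably go through its own population flow, so both should end up bound only once their data is in place.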

#### Upload
For upload, follow the same guidelines as described in the [upload doc](upload.md), but instead of creating a datavolume, create a VolumeUploadSource CR and a PVC similar to the import examples in this doc.
Member

I would give a full example of upload as well, since you are already doing it for import and clone.

Collaborator

Agree
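
One possible shape for the upload example requested here, as a sketch rather than the author's final text: a VolumeUploadSource CR plus a PVC that references it. The names `my-upload-source` and `my-upload-pvc` are hypothetical, and the `contentType` field is an assumption based on CDI's v1beta1 API that should be checked against the CRD definition.

```yaml
# Sketch only: names are hypothetical; spec.contentType is assumed.
apiVersion: cdi.kubevirt.io/v1beta1
kind: VolumeUploadSource
metadata:
  name: my-upload-source
spec:
  contentType: kubevirt
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-upload-pvc
spec:
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeUploadSource
    name: my-upload-source
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

The upload itself would then follow the steps in the upload doc, targeting this PVC instead of a datavolume.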


#### Clone
Same for clone very similar API:
Collaborator

This sound odd, maybe Example of a PVC that will be populated by clone-populator:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: VolumeCloneSource
metadata:
  name: my-clone-source
spec:
  source:
    kind: PersistentVolumeClaim
    name: my-pvc-source
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-cloned-pvc
spec:
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeCloneSource
    name: my-clone-source
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

```

### Using populators with DataVolumes

The integration of datavolumes and CDI populators is seamless. You can create datavolumes the same way you always have.
If the storage class uses a CSI provisioner, the datavolume is populated via the CDI populators, and the end result is, as always, a populated PVC. You will notice that the created PVC stays pending until the population process completes.
Member

If the DataVolume targets a storage class that uses a CSI provisioner, CDI will automatically use the new populators method. The behavior will be the same as always but with the following key differences. <different Pending DV status, and PVC will not become bound until it is populated>.

For more information on using datavolumes for population, see the [datavolume doc](datavolumes.md)
Collaborator

Maybe add a strict list of requirements for populators?

Contributor Author

after all of the PRs I stated in the description there will be no requirements other than the CSI storage which I mentioned.


> NOTE: Datavolumes and the PVCs they create will be marked with "usePopulator" Annotation to indicate the population is done via CDI populators
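
For illustration, an ordinary datavolume manifest such as the sketch below would be expected to go through the populators when its storage class is backed by a CSI provisioner. The datavolume name `my-dv` and the storage class name `my-csi-storage-class` are hypothetical; nothing populator-specific is set by the user.

```yaml
# Sketch only: a regular DataVolume; the names are hypothetical.
# No populator-specific fields are needed in the manifest.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: my-dv
spec:
  source:
    http:
      url: "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img"
  storage:
    storageClassName: my-csi-storage-class  # assumed to use a CSI provisioner
    resources:
      requests:
        storage: 10Gi
```

With this manifest, the PVC created by the datavolume should stay pending until the population completes, as described above.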