Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VolumeGroupReplication support #1472

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ELENAGER
Copy link
Member

This version for VolumeGroupReplication support introduces code for separation of PVCs, created as part of consistency group. On every VRG reconcile we are iterating over all PVCs, protected by volume replication, and marking those, who are part of consistency group. Afterwards, during VR reconciliation, we will create VR for every PVC, that is not part of the consistency group, and VGR for all PVCs, that are using the same storage id.

@ELENAGER ELENAGER force-pushed the elena_tests branch 5 times, most recently from edae774 to ee1b472 Compare July 1, 2024 09:10
Copy link
Member

@ShyamsundarR ShyamsundarR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also created this for some common scheme to process CGs across volsync and volrep, that relates to some comments here: #1483

controllers/volumereplicationgroup_controller.go Outdated Show resolved Hide resolved
controllers/volumereplicationgroup_controller.go Outdated Show resolved Hide resolved
controllers/volumereplicationgroup_controller.go Outdated Show resolved Hide resolved
@@ -62,7 +64,7 @@ func (v *VRGInstance) reconcileVolRepsAsPrimary() {
}

// If VR did not reach primary state, it is fine to still upload the PV and continue processing
requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, log)
requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, pvc, log)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point I am thinking we should process a group here, and if the group is processed then check the PVCs reported by the group as protected and check it against the PVCs that we decide should belong to this group.

Assume we have a list of groups, with PVCs that belong to the group. We need to process the group only once as Primary, and future PVCs can be validated as part of the group status (i.e already protected by the group) or not.

@ELENAGER ELENAGER force-pushed the elena_tests branch 5 times, most recently from ff8fcf1 to 45dd82a Compare July 10, 2024 16:23
@ELENAGER ELENAGER force-pushed the elena_tests branch 10 times, most recently from 2f4eee7 to 2854dc7 Compare July 22, 2024 14:36
Copy link
Member

@ShyamsundarR ShyamsundarR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to review vrg_volrep.go file. Other changes are reviewed, nothing major, mostly integration points that may need some additional handling.

internal/controller/volumereplicationgroup_controller.go Outdated Show resolved Hide resolved
@@ -548,6 +556,10 @@ func (v *VRGInstance) processVRG() ctrl.Result {
return v.invalid(err, "Failed to process list of PVCs to protect", true)
}

if err := v.updatePVCListForCG(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(future) At some point for clarity, we potentially should remove the optimizations for PVC grouping into either lists and let the separate handle it. As currently we look at various conditions and optimize out calling the separate function. This is to help code redability and not functionality as such.

return nil
}

for idx := range v.volRepPVCs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becomes common for both Volsync and Volrep, i.e PVC labeling. @BenamarMk @youhangwang (FYI)

Copy link
Member

@BenamarMk BenamarMk Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Currently, VolSync is not using this function yet (addConsistencyGroupLabel). The temporary solution in the VolSync-related CG code involves the user labeling the PVCs. We liked the idea of having users label the PVCs as it allows for multiple CGs per storageId. This is still missing from @youhangwang's PR.
I think we should implement the following for both VolSync and VolRep:

  1. Provide a common label CG name, such as ramendr.openshift.io/cg-label for a lack of a better name.
  2. If the user labels the PVC with that label, then the PVC label value for CG should be the combination of ramendr.openshift.io/consistency-group: storageId + cg-label.

Copy link
Member

@youhangwang youhangwang Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm waiting for this pr merged, then I can rebase my pr to adopt this func. or maybe I can copy this func into my pr to handle volcyncPVCs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. Provide a common label CG name, such as `ramendr.openshift.io/cg-label` for a lack of a better name.

2. If the user labels the PVC with that label, then the PVC label value for CG should be the combination of `ramendr.openshift.io/consistency-group: storageId + cg-label`.

Ack! if it is required that PVCs be separated into CGs of their own, even as they are selected by a common pvcSelector provided to Ramen, this label can help create sub-cg groups as desired.

Would it be safe to assume that this would be in the future, IOW not part of this PR or the initial CG PRs. The initial set would just group all PVCs to a cg, when possible, giving no further granularity to the app owner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@@ -668,10 +680,18 @@ func (v *VRGInstance) updatePVCList() error {
return nil
}

if err := v.updateReplicationClassList(); err != nil {
v.log.Error(err, "Failed to get VolumeReplicationClass list")
if rmnutil.IsCGEnabled(v.instance.GetAnnotations()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenamarMk if the CG annotation is added to an already created and reconciled DRPC/VRG, will it cause issues here. I.e now the VRG reconciliation would shift to CG based processing and potentially leave stale non-CG resources (at least in the Volrep case).

IOW, CG enabling is during initial reconcile from DRPC and should not be enabled later, is that somehow prevented overall from DRPC creating the VRG?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShyamsundarR Yes, that would violate the CG requirement. Existing workloads are not permitted to be converted to use CG retroactively. If a user inadvertently adds the CG annotation to an already created and reconciled DRPC/VRG, it could lead to inconsistent states, given that PVCs has to be labeled to use CG as well.

CG enabling should occur only during the initial reconciliation of DRPC and should not be enabled afterward. This is not currently prevented by the DRPC when creating the VRG, so if a user adds the CG annotation later, the user would eventually encounter CG errors.

@@ -763,6 +858,17 @@ func (v *VRGInstance) separatePVCsUsingStorageClassProvisioner(pvcList *corev1.P
}
}

if !replicationClassMatchFound {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CG is enabled, and SC has the replicationID label, then we should error out if we do not find a matching VGRClass, and not default the PVC to Volsync.

This either gets added here or with the PR #1487

@@ -6,6 +6,8 @@ apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
labels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar change would be needed to the CephFS storageClass from file test/addons/rook-cephfs/kustomization.yaml ?

Copy link
Member

@ShyamsundarR ShyamsundarR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateVRStatus should ensure the desired PVC is part of the VGR group status, so that we know it is protected. Currently if a new PVC is added to the group we will report that as protected based on the existing VGR output.

@@ -67,7 +68,7 @@ func (v *VRGInstance) reconcileVolRepsAsPrimary() {
}

// If VR did not reach primary state, it is fine to still upload the PV and continue processing
requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, log)
requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, pvc, log)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A group request will protect all PVCs in the group, and so when we invoke protection for a single PVC here, without ensuring we are done with prior steps like pvcUnprotectVolRepIfDeleted, preparePVCForVRProtection we may introduce races that may get in the way of protection.

While the code to create/update VR or VGR is common, I think we may need to do certain operations to the group here before we proceed with a VGR for the group.

I am in parallel thinking how best to achieve this with the existing code as well.

@@ -67,7 +68,7 @@ func (v *VRGInstance) reconcileVolRepsAsPrimary() {
}

// If VR did not reach primary state, it is fine to still upload the PV and continue processing
requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, log)
requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, pvc, log)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pvcUnprotectVolRepIfDeleted should remove the CG label from the PVC, such that this PVC is no longer protected as part of the group.

Currently if the function above is traced (and required flags enabled) it would delete the entire VGR and not remove the PVC from the VGR.

While the feature of PVC deletion is still not the default (as there are other DRPC changes that are required), we should be aware of this problem if this is enabled.

return "", fmt.Errorf("missing storageID for PVC %s/%s", pvc.GetNamespace(), pvc.GetName())
}

vgrName := storageID + "-vgr"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple workloads independently DR protected in the same namespace. These can have PVCs from the same SC with the same StorageID. The name will then conflict.

I would suggest, munging this with the VRG name as well. We cannot have 2 VRGs with the same name in the same namespace, so using that as part of the name maybe an option. It does create a possibility that we will have a longer name, so any shortening here may assist in that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed VGR name to {vrg}-{storageID}, @ShyamsundarR what is we will remove -drpc from vrg name? It will shorten the whole name a little

@ELENAGER ELENAGER force-pushed the elena_tests branch 3 times, most recently from fc759d3 to bebc6ae Compare July 25, 2024 10:21
@ShyamsundarR
Copy link
Member

jFYI added a tracker issue for any pending items that would not be addressed in this PR w.r.t CG support in Ramen, #1501

@ShyamsundarR
Copy link
Member

@ELENAGER I suggest we break this up into 3 PRs (at present). First with the changes to internal/controller/volumereplicationgroup_controller.go that includes labelling and can be consumed by the volsync code as well. The next with the GetName/Namespace/Labels/Generation etc. changes to internal/controller/vrg_volrep.go. The last where internal/controller/vrg_volrep.go treats PVCs as needing a VR or a VGR as the case maybe.

That way we can get the first one in and be reused by volsync, as we work through the rest of the changes.

Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
@ELENAGER ELENAGER force-pushed the elena_tests branch 2 times, most recently from c7afc7e to 2796ba1 Compare August 20, 2024 12:29
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants