
use schedule instead of VolumeReplicationClass #121

Conversation

raghavendrabhat (Collaborator)

The VolumeReplicationClass field in VolumeReplicationGroup is
nothing but a string. Instead, use a schedule, which the
VolumeReplicationGroup then combines with the storage driver of
each PVC (obtained from its StorageClass) to determine the
appropriate VolumeReplicationClass to use.

TODO:

HUB/DRPC (Previously AVR):

  • Modify DRPolicy to include a schedule along with the
    DR-prepared clusters.
  • Make DRPC (previously AVR) refer to the DRPolicy schedule
    while creating the VolumeReplicationGroup and use that as
    the schedule for the VolumeReplicationGroup.

Managed Cluster:

  • While reconciling the VolumeReplicationGroup, for each PVC
    obtained via label selection, do the following (a Go sketch
    of this selection loop follows the commit message below):
    • storageDriver = StorageDriver(StorageClass(pvc))
    • Get all the VolumeReplicationClasses in the cluster
    • For each VolumeReplicationClass (chosen via label selector):
      if (replicationDriver == storageDriver and
          replicationSchedule == vrg.Spec.Schedule)
        chosenReplicationClass = VolumeReplicationClass

Signed-off-by: Raghavendra M <raghavendra@redhat.com>
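To make the managed-cluster selection step above concrete, here is a minimal Go sketch of that loop, matching both the provisioner and the schedule. The standalone signature, the csi-addons import path, and the "schedule" parameter key are assumptions for illustration, not the code in this PR.

```go
package controllers

import (
	"fmt"

	volrep "github.com/csi-addons/volume-replication-operator/api/v1alpha1"
	storagev1 "k8s.io/api/storage/v1"
)

// selectVolumeReplicationClass returns the name of the VolumeReplicationClass
// whose provisioner matches the PVC's storage driver (taken from its
// StorageClass) and whose "schedule" parameter matches the VRG schedule.
func selectVolumeReplicationClass(storageClass *storagev1.StorageClass,
	schedule string, classes []volrep.VolumeReplicationClass) (string, error) {
	for _, replicationClass := range classes {
		// The replication class must be served by the same driver that
		// provisions the PVC's StorageClass.
		if replicationClass.Spec.Provisioner != storageClass.Provisioner {
			continue
		}

		// It must also carry a matching schedule parameter.
		if replicationClass.Spec.Parameters["schedule"] == schedule {
			return replicationClass.Name, nil
		}
	}

	return "", fmt.Errorf("no VolumeReplicationClass matches driver %s and schedule %s",
		storageClass.Provisioner, schedule)
}
```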

@@ -233,21 +234,36 @@ func (r *DRPlacementControlReconciler) Reconcile(ctx context.Context, req ctrl.R
return ctrl.Result{}, err
}

schedule, err := r.getSchedule(ctx, drpc)
Member:
DRPolicy object will be saved in the DRPCInstance. I think you should use it (not yet merged); that way, you will not have to fetch the DRPolicy multiple times: once for the Schedule and again for the ReplicationClassSelector.

Member:
Ack! Also, as it stands this is being fetched too soon. BUT if DRPolicy is fetched and stored in drpc (which is almost there in #120), then these functions can be removed.
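A sketch of the suggested direction, assuming the DRPolicy is cached on the DRPC instance; the type, the field names, and the rmn import path are illustrative only:

```go
package controllers

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rmn "github.com/ramendr/ramen/api/v1alpha1"
)

// drpcInstanceSketch caches the DRPolicy fetched once during Reconcile, so
// both the schedule and the replication-class selector are read from it
// without additional API calls.
type drpcInstanceSketch struct {
	drPolicy *rmn.DRPolicy
}

func (d *drpcInstanceSketch) schedule() *string {
	return d.drPolicy.Spec.Schedule
}

func (d *drpcInstanceSketch) replicationClassSelector() metav1.LabelSelector {
	return d.drPolicy.Spec.ReplicationClassSelector
}
```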


*replicationClassSelector = clusterPeers.Spec.ReplicationClassSelector

return replicationClassSelector, nil
@BenamarMk (Member), Jul 19, 2021:
Why don't you just return a copy of clusterPeers.Spec.ReplicationClassSelector instead of creating an instance on line 473 and then assigning to it on line 481?
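A sketch of what that could look like, returning the selector by value; the signature and the rmn import path are assumptions:

```go
package controllers

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rmn "github.com/ramendr/ramen/api/v1alpha1"
)

// replicationClassSelectorFor returns the DRPolicy's selector by value, so no
// local instance or pointer assignment is needed at the call site.
func replicationClassSelectorFor(drPolicy *rmn.DRPolicy) metav1.LabelSelector {
	return drPolicy.Spec.ReplicationClassSelector
}
```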

@@ -438,6 +520,8 @@ type DRPCInstance struct {
userPlacementRule *plrv1.PlacementRule
drpcPlacementRule *plrv1.PlacementRule
mwu rmnutil.MWUtil
schedule *string
replclassselect *metav1.LabelSelector
Member:
Why isn't this variable following the camelCase convention?

Member:
(Future) Just a note here, the DRPCInstance is becoming huge. We should refactor it.

@@ -1109,6 +1147,11 @@ func (v *VRGInstance) updateVR(volRep *volrep.VolumeReplication,

// createVR creates a VolumeReplication CR with a PVC as its data source.
func (v *VRGInstance) createVR(vrNamespacedName types.NamespacedName, state volrep.ReplicationState) error {
volumeReplicationClass, err := v.selectVolumeReplicationClass(vrNamespacedName)
if err != nil {
return fmt.Errorf("failed to find the appropriate VolumeReplicationClass (%s)", v.instance.Name)
Member:
Isn't the error returned here lost? Shouldn't it be included in the message?
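One possible shape of the fix, wrapping the underlying error with %w so it is preserved (a suggestion sketch, not the merged code):

```go
if err != nil {
	return fmt.Errorf("failed to find the appropriate VolumeReplicationClass for %s: %w",
		v.instance.Name, err)
}
```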

if err := v.reconciler.Get(v.ctx, types.NamespacedName{Name: *scName}, storageClass); err != nil {
v.log.Info(fmt.Sprintf("Failed to get the storageclass %s", *scName))

return className, fmt.Errorf("failed to get the storageclass with name %s",
Member:
Same here, the err information is lost

continue
}

schedule, found := replicationClass.Spec.Parameters["schedule"]
Member:
"schedule" is used in another placed. Please define as global const.

schedule, found := replicationClass.Spec.Parameters["schedule"]
if !found {
// As of now, the provisioner matches. Use this as the best guess
// until a replicationClass with schedule match is found. i.e.
Member:
What if you never find a schedule match but the provisioner matched? Is the className assignment here still valid?

Member:
Ack! If we choose this, we will exit with a non-empty className that does not match the schedule. Let's match both the driver name and the schedule to select the volReplClass.

@@ -414,7 +496,7 @@ func (r *DRPlacementControlReconciler) addClusterPeersToPlacementRule(ctx contex
}

if len(clusterPeers.Spec.ClusterNames) == 0 {
return fmt.Errorf("invalid DRPolicy configuration. Name %s", clusterPeers.Name)
return fmt.Errorf("invalid DRPolicy configuration. No clusters given. Name %s", clusterPeers.Name)
Member:
(nit) Error strings should be a single line with no leading capital, please; they get included in other logs/error messages, hence the standard.
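Following that convention, the message might look like this (wording is illustrative):

```go
return fmt.Errorf("invalid DRPolicy configuration, no clusters given (name: %s)", clusterPeers.Name)
```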

func (v *VRGInstance) updateReplicationClassList() error {
labelSelector := v.instance.Spec.ReplicationClassSelector

v.log.Info("Fetching PersistentVolumeClaims", "labeled", labels.Set(labelSelector.MatchLabels))
Member:
(nit) message should read "Fetching VolumeReplicationClass" rather than PVC.
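A sketch of the function with the corrected log message, assuming VRGInstance fields such as reconciler, replClassList, and log, plus k8s.io/apimachinery/pkg/labels and sigs.k8s.io/controller-runtime/pkg/client for the helpers:

```go
func (v *VRGInstance) updateReplicationClassList() error {
	labelSelector := v.instance.Spec.ReplicationClassSelector

	v.log.Info("Fetching VolumeReplicationClasses", "labeled", labels.Set(labelSelector.MatchLabels))

	listOptions := []client.ListOption{
		client.MatchingLabels(labelSelector.MatchLabels),
	}

	if err := v.reconciler.List(v.ctx, v.replClassList, listOptions...); err != nil {
		v.log.Error(err, "Failed to list VolumeReplicationClasses")

		return fmt.Errorf("failed to list VolumeReplicationClasses: %w", err)
	}

	return nil
}
```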

}

pvc := &corev1.PersistentVolumeClaim{}
if err := v.reconciler.Get(v.ctx, namespacedName, pvc); err != nil {
Member:
We already listed the PVCs and should not be "getting" them again. Reduce API calls and reuse the fetched PVCs here instead (or in a higher-level function, passing the required replClass down to lower functions, from an abstraction POV).
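A sketch of the suggested refactor, iterating the already-listed PVCs and passing each one down instead of re-fetching it; the field and helper names are illustrative:

```go
for idx := range v.pvcList.Items {
	pvc := &v.pvcList.Items[idx]

	// Reuse the listed PVC; no extra Get against the API server.
	replClassName, err := v.selectVolumeReplicationClass(pvc)
	if err != nil {
		return fmt.Errorf("failed to select VolumeReplicationClass for %s/%s: %w",
			pvc.Namespace, pvc.Name, err)
	}

	// ...create or update the VolumeReplication for this PVC using replClassName...
	_ = replClassName // placeholder; real code would consume replClassName here
}
```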

@@ -385,6 +403,26 @@ func (v *VRGInstance) updatePVCList() error {
return nil
}

func (v *VRGInstance) updateReplicationClassList() error {
labelSelector := v.instance.Spec.ReplicationClassSelector
Member:
(test) Does an empty label selector return all classes here?
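For reference, with the usual conversion an empty (zero-value) label selector matches everything, while a nil selector matches nothing; the actual behaviour still depends on how updateReplicationClassList builds its list options. A minimal check:

```go
package controllers

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// emptySelectorMatchesAll returns true: an empty LabelSelector converts to the
// match-everything selector (labels.Everything()), so all classes are returned.
func emptySelectorMatchesAll() bool {
	sel, err := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{})

	return err == nil && sel.Empty()
}
```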

@@ -322,6 +327,19 @@ func (v *VRGInstance) processVRG() (ctrl.Result, error) {
return ctrl.Result{Requeue: true}, nil
}

if err := v.updateReplicationClassList(); err != nil {
Member:
(optimize) Can this be postponed until we need it the first time? IOW, make the call only when we need it, so that we are done with other checks (like ready for primary/secondary) first.
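One way to defer it, fetching the class list lazily on first use; the guard field, helper name, and list type are assumptions:

```go
func (v *VRGInstance) replicationClasses() (*volrep.VolumeReplicationClassList, error) {
	if !v.vrcUpdated {
		if err := v.updateReplicationClassList(); err != nil {
			return nil, fmt.Errorf("failed to update VolumeReplicationClass list: %w", err)
		}

		v.vrcUpdated = true
	}

	return v.replClassList, nil
}
```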

@@ -228,6 +229,8 @@ func filterPVC(mgr manager.Manager, pvc *corev1.PersistentVolumeClaim, log logr.
// +kubebuilder:rbac:groups=ramendr.openshift.io,resources=volumereplicationgroups/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=ramendr.openshift.io,resources=volumereplicationgroups/finalizers,verbs=update
// +kubebuilder:rbac:groups=replication.storage.openshift.io,resources=volumereplications,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=replication.storage.openshift.io,resources=volumereplicationclasses,verbs=get;list;watch;update;patch
Member:
(heads up) The file ramen/config/dr_cluster/rbac/role.yaml also has to be updated when roles change, as it is a partial copy of ramen/config/rbac/role.yaml to suit VRG reconciler needs (or ramen-dr-cluster needs)

// can be used to schedule replication to occur at regular, time-based
// intervals.
//+kubebuilder:validation:Pattern=`^(\d+|\*)(/\d+)?(\s+(\d+|\*)(/\d+)?){4}$`
Schedule *string `json:"schedule"`
Member:
ramen/config/samples/ramendr_v1alpha1_drpolicy.yaml needs to be updated.
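For reference, a cron-style five-field value such as `*/15 * * * *` satisfies the validation pattern above (assuming cron semantics are what the schedule intends).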

// that are scanned to select an appropriate VolumeReplicationClass
// for the VolumeReplication resource.
//+optional
ReplicationClassSelector metav1.LabelSelector `json:"replicationClassSelector,omitempty"`
Member:
ramen/config/samples/ramendr_v1alpha1_volumereplicationgroup.yaml needs to be updated

@@ -185,8 +187,8 @@ func updateManagedClusterViewWithVRG(mcv *fndv2.ManagedClusterView, replicationS
TypeMeta: metav1.TypeMeta{Kind: "VolumeReplicationGroup", APIVersion: "ramendr.openshift.io/v1alpha1"},
ObjectMeta: metav1.ObjectMeta{Name: DRPCName, Namespace: DRPCNamespaceName},
Spec: rmn.VolumeReplicationGroupSpec{
VolumeReplicationClass: "volume-rep-class",
Collaborator:
ocm-minikube-ramen.sh creates this object, but it seems it is no longer required, so pull request #166 removes its creation and deletion.

@raghavendrabhat (Collaborator, Author):

#182 has been submitted; it has the changes rebased on top of #120 and the review comments addressed.
