Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter with default VRC annotation if multiple VRCs detected #1476

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

Rakshith-R
Copy link
Contributor

@Rakshith-R Rakshith-R commented Jun 27, 2024

This commit modifies selectVolumeReplicationClass() to
filter VRCs list with default VRC annotation
replication.storage.openshift.io/volume-replication-class: true
if multiple VRCs are matched.
This makes sure we select the default VRC when multiple VRCs of
same schedule and provisioner but a special default annotated VRC
is present in the cluster.

@Rakshith-R Rakshith-R force-pushed the filterDefaultVRC branch 4 times, most recently from b8201ef to ce21847 Compare June 28, 2024 10:51
Copy link
Member

@ShyamsundarR ShyamsundarR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some initial comments, even though it is marked as a draft.

matchingReplicationClassList = append(matchingReplicationClassList, replicationClass)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can deal with 0 or 1 found VRClasses here and call filterDefaultVRC if not.

provisioner, v.instance.Spec.Async.SchedulingInterval))

return nil, fmt.Errorf("no VolumeReplicationClass found to match provisioner and schedule")
case 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len could also be > 1 if there are multiple classes marked as the default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have multiple vrcs with default annotation, we should fail in line 1324.

controllers/vrg_volrep.go Outdated Show resolved Hide resolved
test/addons/rbd-mirror/start Outdated Show resolved Hide resolved
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
name: vrc-sample
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is duplicate of existing VRClass and hence will overwrite the default (based on order of apply called).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be addressed, else when used it will cause unwanted troubleshooting cycles to figure out the names are in conflict.

Copy link
Member

@nirs nirs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Rakshith-R!

@@ -6,6 +6,8 @@ apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
name: vrc-sample
annotation:
replication.storage.openshift.io/is-default-class: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true is yaml true, but annotations are map[string]string. Even if this works, better use "true" to make it more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rakshith-R can we use the string "true" here as stated?

@@ -268,4 +277,7 @@ wait_until_pool_mirroring_is_healthy(cluster2)
deploy_vrc_sample(cluster1)
deploy_vrc_sample(cluster2)

deploy_vrc_flatten_sample(cluster1)
deploy_vrc_flatten_sample(cluster2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to group all the resources we deploy in a kustomzation and apply all of them in one command. Then we don't need to helpers to apply the yamsl.

test/addons/rbd-mirror/start Outdated Show resolved Hide resolved
provisioner, v.instance.Spec.Async.SchedulingInterval))

return nil, fmt.Errorf("no VolumeReplicationClass found to match provisioner and schedule")
case 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have multiple vrcs with default annotation, we should fail in line 1324.

@@ -28,6 +28,11 @@ import (
rmnutil "github.com/ramendr/ramen/controllers/util"
)

const (
// defaultVRCAnnotationKey is the default annotation key for VolumeReplicationClass
defaultVRCAnnotationKey = "replication.storage.opensds.io/volume-replication-class"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on changes to vrc-sample.yaml, this should be:

"replication.storage.openshift.io/is-default-class"

Should we use a label instead of annotation? This make is possible to list the default vrcs - we don't need this now, but we may want to do this in the future.


return nil, fmt.Errorf("failed to get the storageclass of pvc %s (%w)",
namespacedName, err)
return nil, fmt.Errorf("failed to get the storageclass of pvc %s (%w)", namespacedName, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like unrelated formatting changes, it would be better not to modify anything which is not needed by the actual change.

}

// filterDefaultVRC filters the VRC list to return VRCs with default annotation
// if the list contains moret then one VRC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: moret

@@ -1271,17 +1276,52 @@ func (v *VRGInstance) selectVolumeReplicationClass(

// ReplicationClass that matches both VRG schedule and pvc provisioner
if schedulingInterval == v.instance.Spec.Async.SchedulingInterval {
v.log.Info(fmt.Sprintf("Found VolumeReplicationClass that matches provisioner and schedule %s/%s",
storageClass.Provisioner, v.instance.Spec.Async.SchedulingInterval))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the log makes sense, but we can do this in another commit. Lets focus on adding filtering multiple vrcs using the default annotation in this change.

}
} else {
defaultReplicationClassList = replicationClassList
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the name, I expect filterDefaultVRC to accept list with more than one vrc, and return a list with only the matching vrcs based the default annotation. The caller can do the handling (0, 1, >1). This will keep this function simpler, and we will cal it only when we find multiple matching vrcs.

The flow can be:

 matchingVRCs :=  v.selectVolumeReplicationClass()

 if len(matchingVRCs) > 1 {
     matchingVRCs = filterDefaultVRC(matchingVRCs)
 }

Now we can succeed if we have have list of one vrc, or fail in all other cases.

}
} else {
defaultReplicationClassList = replicationClassList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the default, so maybe rename the list to filteredVRCs?

@Rakshith-R Rakshith-R force-pushed the filterDefaultVRC branch 5 times, most recently from d2ea8ea to 1a888da Compare July 3, 2024 10:05
@Rakshith-R
Copy link
Contributor Author

(related) We can add the check for v.vrcUpdated inside the function updateReplicationClassList and fix the invocation from this function as well updatePVCList to reduce cognitive complexity of this function.

This just helped me a lot just slipping past the function line count limit.

The pr is now ready for review.
Added only env tests for now.

@Rakshith-R Rakshith-R marked this pull request as ready for review July 3, 2024 10:21
Copy link
Member

@nirs nirs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at the filtering part, but see also comments from previous version that are not addressed by the current version.

}
}

v.log.Info(fmt.Sprintf("No VolumeReplicationClass found to match provisioner and schedule %s/%s",
filteredVRCs := v.filterDefaultVRC(matchingReplicationClassList)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have more than one vrc and none have the default annotation, this call will return empty list and we will fail with the "No vrc found" instead of "multiple vrcs found".

It will help if filterVRCs() will be smarter about selecting the default annotation, and return the right error when having no default annotation or multiple vrcs with default annotation.

Also it would be easier to follow if we first handle the zero case and then handle the > 1 case, and finish with the expected case. It is easier to handle errors correctly when handling errors first.

Something like:

if len == 0:
    fail

if len > 1:
    select the default or fail

return found vrc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's by design.
Filtering with annotation will be triggered when there's more one matching VRC.
We are logging a info msg about filtering in the call. It's implied that more than one VRC was found but none had default annotation. Ramen anyway falls back to volsync when it doesn't find a VRC so It does not exactly bubble out the error persistently to anyone..

Hence, we can again encounter case of 0 VRCs being found.

That's why the 0 check is at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here when used does not default to volsync and updates a status condition with the error. Hence being specific here helps the user understand the issue than looking at logs.

For example the createVR path would generate an event if an error is returned from here.

) []*volrep.VolumeReplicationClass {
if len(replicationClassList) <= 1 {
return replicationClassList
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be clean to call this only when there are more than 1, so this check is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be clean to call this only when there are more than 1, so this check is not needed.

I think this logic can indeed be part of the filterVRC() function itself.
The selectVRC() cannot get any bigger.

}
}

return filteredVRCs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can report the right error:

  • multiple vrcs, none marked as default
  • multiple vrcs, more than one marked as default

This will help to simplify the caller, and very simple and clear here - the only function dealing with multiple vrcs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered in another comment above.

#1476 (comment)

@Rakshith-R
Copy link
Contributor Author

I only looked at the filtering part, but see also comments from previous version that are not addressed by the current version.

#1476 (comment)
#1476 (comment)

I think these should answer most of the comments which were given towards the prev(draft) version.
Changes in [test/addons/rbd-mirror/start file are no longer part of this pr.

@Rakshith-R Rakshith-R requested a review from nirs July 3, 2024 11:40
@Rakshith-R
Copy link
Contributor Author

/cc @ShyamsundarR

Copy link
Member

@ShyamsundarR ShyamsundarR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested in a drenv setup, with an additional VRClass with the default annotation set?

Suggested patch to address the need for filterDefaultVRC to only deal with filtering when there are multiple VRCs detected:

diff --git a/internal/controller/vrg_volrep.go b/internal/controller/vrg_volrep.go
index 9df9b737..0528bb07 100644
--- a/internal/controller/vrg_volrep.go
+++ b/internal/controller/vrg_volrep.go
@@ -1275,36 +1275,27 @@ func (v *VRGInstance) selectVolumeReplicationClass(
                }
        }
 
-       filteredVRCs := v.filterDefaultVRC(matchingReplicationClassList)
-
-       switch len(filteredVRCs) {
+       switch len(matchingReplicationClassList) {
        case 0:
-               v.log.Info(fmt.Sprintf("No VolumeReplicationClass found to match provisioner and schedule %s/%s",
+               v.log.Info(fmt.Sprintf("No VolumeReplicationClass found that match provisioner and schedule %s/%s",
                        storageClass.Provisioner, v.instance.Spec.Async.SchedulingInterval))
 
-               return nil, fmt.Errorf("no VolumeReplicationClass found to match provisioner and schedule")
+               return nil, fmt.Errorf("no VolumeReplicationClass found that match provisioner and schedule")
        case 1:
                v.log.Info(fmt.Sprintf("Found VolumeReplicationClass that matches provisioner and schedule %s/%s",
                        storageClass.Provisioner, v.instance.Spec.Async.SchedulingInterval))
 
-               return filteredVRCs[0], nil
+               return matchingReplicationClassList[0], nil
        }
 
-       v.log.Info(fmt.Sprintf("Found multiple VolumeReplicationClass that matches provisioner and schedule %s/%s",
-               storageClass.Provisioner, v.instance.Spec.Async.SchedulingInterval))
-
-       return nil, fmt.Errorf("multiple VolumeReplicationClass found to match provisioner and schedule")
+       return v.filterDefaultVRC(matchingReplicationClassList)
 }
 
-// filterDefaultVRC filters the VRC list to return VRCs with default annotation
+// filterDefaultVRC filters the VRC list to return VRC with default annotation
 // if the list contains more than one VRC.
 func (v *VRGInstance) filterDefaultVRC(
        replicationClassList []*volrep.VolumeReplicationClass,
-) []*volrep.VolumeReplicationClass {
-       if len(replicationClassList) <= 1 {
-               return replicationClassList
-       }
-
+) (*volrep.VolumeReplicationClass, error) {
        v.log.Info("Found multiple matching VolumeReplicationClasses, filtering with default annotation")
 
        filteredVRCs := []*volrep.VolumeReplicationClass{}
@@ -1317,7 +1308,19 @@ func (v *VRGInstance) filterDefaultVRC(
                }
        }
 
-       return filteredVRCs
+       switch len(filteredVRCs) {
+       case 0:
+               v.log.Info(fmt.Sprintf("Multiple VolumeReplicationClass found, with no default annotation (%s/%s)",
+                       replicationClassList[0].Spec.Provisioner, v.instance.Spec.Async.SchedulingInterval))
+
+               return nil, fmt.Errorf("multiple VolumeReplicationClass found, with no default annotation, %s",
+                       defaultVRCAnnotationKey)
+       case 1:
+               return filteredVRCs[0], nil
+       }
+
+       return nil, fmt.Errorf("multiple VolumeReplicationClass found with default annotation, %s",
+               defaultVRCAnnotationKey)
 }
 
 // getStorageClass inspects the PVCs being protected by this VRG instance for the passed in namespacedName, and

@@ -28,6 +28,11 @@ import (
rmnutil "github.com/ramendr/ramen/internal/controller/util"
)

const (
// defaultVRCAnnotationKey is the default annotation key for VolumeReplicationClass
defaultVRCAnnotationKey = "replication.storage.openshift.io/volume-replication-class"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(future) It would be useful if this was imported fro csi-addons, such that it does not have to be redefined here.

apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
name: vrc-sample
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be addressed, else when used it will cause unwanted troubleshooting cycles to figure out the names are in conflict.

@@ -6,6 +6,8 @@ apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
name: vrc-sample
annotation:
replication.storage.openshift.io/is-default-class: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rakshith-R can we use the string "true" here as stated?

}
}

v.log.Info(fmt.Sprintf("No VolumeReplicationClass found to match provisioner and schedule %s/%s",
filteredVRCs := v.filterDefaultVRC(matchingReplicationClassList)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here when used does not default to volsync and updates a status condition with the error. Hence being specific here helps the user understand the issue than looking at logs.

For example the createVR path would generate an event if an error is returned from here.

This commit modifies selectVolumeReplicationClass() to
filter VRCs list with default VRC annotation
`replication.storage.opensds.io/volume-replication-class: true`
if multiple VRCs are matched.
This makes sure we select the default VRC when multiple VRCs of
same schedule and provisioner but a special default annotated VRC
is present in the cluster.

Signed-off-by: Rakshith R <rar@redhat.com>
@Rakshith-R
Copy link
Contributor Author

Rakshith-R commented Jul 11, 2024

I only looked at the filtering part, but see also comments from previous version that are not addressed by the current version.

#1476 (comment) #1476 (comment)

I think these should answer most of the comments which were given towards the prev(draft) version. Changes in [test/addons/rbd-mirror/start file are no longer part of this pr.

Sorry, i had forgotten to remove the files from test/addons/rbd-mirror/start

I tested it on ODF cluster with changes ramen image and multiple VRCs.
It picks up the annotated one as expected and fails in other negative cases.

2024-07-11T06:40:55.246Z	INFO	controllers.VolumeReplicationGroup.vrginstance	controller/vrg_volrep.go:1299	Found multiple matching VolumeReplicationClasses, filtering with default annotation	{"VolumeReplicationGroup": {"name":"app-busybox-rbd-3-push-placement-drpc","namespace":"app-busybox-rbd-3-push"}, "rid": "18739824-3b09-42d5-a915-797e183e0fdb", "State": "primary"}
2024-07-11T06:40:55.246Z	INFO	controllers.VolumeReplicationGroup.vrginstance	controller/vrg_volrep.go:1313	Multiple VolumeReplicationClass found, with no default annotation (openshift-storage.rbd.csi.ceph.com/5m)	{"VolumeReplicationGroup": {"name":"app-busybox-rbd-3-push-placement-drpc","namespace":"app-busybox-rbd-3-push"}, "rid": "18739824-3b09-42d5-a915-797e183e0fdb", "State": "primary"}
...

Copy link
Member

@nirs nirs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some detail we can improve, but looks good enough for now.

@ShyamsundarR ShyamsundarR merged commit 97f1457 into RamenDR:main Jul 17, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants