Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LastSyncTime to volume replication status #232

Merged
merged 4 commits into from
Sep 13, 2022

Conversation

yati1998
Copy link
Contributor

@yati1998 yati1998 commented Sep 7, 2022

This PR adds lastsynctime to volumereplication status and updates the reconcile logic to get the LastSynctime

@yati1998 yati1998 changed the title Volrep last Add LastSyncTime to volume replication status Sep 7, 2022
@mergify mergify bot added the vendor Pull requests that update vendored dependencies label Sep 7, 2022
@@ -125,6 +125,8 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}
// remove the prefix keys in volume replication class parameters
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters)
schedulingTime := parameters["schedulingInterval"]
scheduleTime, _ := time.ParseDuration(schedulingTime)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

describe the schedulingInterval parameter somewhere

message GetVolumeReplicationInfoResponse{
// Holds the last sync time.
// This field is REQUIRED.
int64 last_sync_time = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not implemented,
with this change, we can remove some code for conversion in other functions too.

Comment on lines 128 to 129
schedulingTime := parameters["schedulingInterval"]
scheduleTime, _ := time.ParseDuration(schedulingTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is optional and maybe empty ?

do we need a default schedule time ?

cc @Madhu-1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be empty and it's optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -363,13 +365,30 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}

instance.Status.LastCompletionTime = getCurrentTime()
var last_sync_time int64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use _ in variable names

@@ -363,13 +365,30 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}

instance.Status.LastCompletionTime = getCurrentTime()
var last_sync_time int64
var requeueForInfo bool
if instance.Spec.ReplicationState == replicationv1alpha1.Primary {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required only for primary state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to resync only when it is in primary state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary end has the status, that it reads from the secondary, and so we use that to update the lastSyncTime.

In cases where we demoted a Primary and are waiting for both Secondaries to be in sync, the timestamp is still useful. Tells us how far back the sync is, before either image can be gracefully promoted.

The demoted snapshot sync time is not critical at present, it would also just show that the lastSyncTime is in the past as compared to the demoted snapshot, and ideally once the demoted snapshot is synced the timestamps would be the same. So not sure if it is useful as such.

Needs more thought on how to use that information, hence sticking to when image is primary at one end is useful.

@@ -125,6 +125,8 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}
// remove the prefix keys in volume replication class parameters
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters)
schedulingTime := parameters["schedulingInterval"]
scheduleTime, _ := time.ParseDuration(schedulingTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whhy we are sending parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to update the last sync time after every interval and that is, scheduling interval and hence, it's required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from VolumeReplication CR schedulingInterval is optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, thanks for notifying, will check on that and update the PR.

@@ -651,3 +698,9 @@ func getCurrentTime() *metav1.Time {

return &metav1NowTime
}

func getTime(time time.Time) *metav1.Time {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a function for this one.

@@ -125,6 +125,14 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}
// remove the prefix keys in volume replication class parameters
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters)
schedulingTime := parameters["schedulingInterval"]
if schedulingTime == "" {
schedulingTime = "1m"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define it as a constant ?
and write elaborate how it will be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the schedule is not set for this VR, then the pool level schedule (if available) should be used for the reconcile requeue. If that is not available then we can either NOT requeue or requeue for a duration till the last sync time is not empty (as post that there would be no further syncs).

There is a possibility that a pool level schedule is added later, once we make the above not to requeue decision, but that would either require a restart of the plugin (so that all VRs are reconciled), or reconciling based on added pool level schedules, or waiting for the periodic reconcile (24h) at which time this will be picked up.

We possibly want to track the above for the future, for now if there is no schedule we can default to a something (1m/5m), as the current DR stack comes with a schedule and should not break the functionality.

message GetVolumeReplicationInfoResponse{
// Holds the last sync time.
// This field is REQUIRED.
int64 last_sync_time = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not implemented,
with this change, we can remove some code for conversion in other functions too.

Comment on lines 657 to 665
return 0, resp.Error
}

infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse)
if !ok {
err := fmt.Errorf("received response of unexpected type")
vr.logger.Error(err, "unable to parse response")

return 0, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 0, resp.Error
}
infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse)
if !ok {
err := fmt.Errorf("received response of unexpected type")
vr.logger.Error(err, "unable to parse response")
return 0, err
return nil, resp.Error
}
infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse)
if !ok {
err := fmt.Errorf("received response of unexpected type")
vr.logger.Error(err, "unable to parse response")
return nil, err

metav1Time := metav1.NewTime(lastSyncTime1)
instance.Status.LastSyncTime = &metav1Time
requeueForInfo = true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary newline

Comment on lines 133 to 136
scheduleTime, err := time.ParseDuration(schedulingTime)
if err != nil {
logger.Error(err, "failed to parse time: %v", schedulingTime)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at these lines again.

Comment on lines 129 to 133
schedulingTime := parameters["schedulingInterval"]
if schedulingTime == "" {
schedulingTime = "1m"
}
scheduleTime, err := time.ParseDuration(schedulingTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use rawScheduleTime instead schedulingTime,
it is confusing.

@@ -125,6 +125,14 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}
// remove the prefix keys in volume replication class parameters
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters)
schedulingTime := parameters["schedulingInterval"]
if schedulingTime == "" {
schedulingTime = "1m"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not done.


var requeueForInfo bool
if instance.Spec.ReplicationState == replicationv1alpha1.Primary {
lastSyncTime, err := r.getVolumeReplicationInfo(vr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call this info instead of lastSyncTime

@@ -614,6 +644,30 @@ func (r *VolumeReplicationReconciler) enableReplication(vr *volumeReplicationIns
return nil
}

// getVolumeReplicationInfo gets volume replication info.
func (r *VolumeReplicationReconciler) getVolumeReplicationInfo(vr *volumeReplicationInstance) (*timestamppb.Timestamp, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not return an info kind of object? return something like what volumeReplication.GetInfo() does, or rename the function to return a LastSyncTimestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for now it just returns last sync time, but this name is given just with the thought of future implementation, we may like to return other informations along with the last sync time, hence don't want it to be specific.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state is confusing, so you will have to decide what this function does. It can be renamed once it gets extended, no need to prepare function names for things in the future. Leave a comment with the possibility if you think it is useful.

if err != nil {
logger.Error(err, "Failed to get volume replication info")
}
lastSyncTime1 := lastSyncTime.AsTime()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this can not fail, just place it in metav1Time := metav1.NewTime(lastSyncTime.AsTime()).

metav1Time sounds more like a package name, similar to metav1. This is just a variable, so why not call it time?

@@ -363,13 +372,34 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}

instance.Status.LastCompletionTime = getCurrentTime()

var requeueForInfo bool
if instance.Spec.ReplicationState == replicationv1alpha1.Primary {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1 If an image is in the split-brain state, is there a TS that is usable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK no during split-brain no sync will happen this is not useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In split-brain, I do not assume one of the images is marked Primary? If that is the case, the lastSyncTime should be the time of the last successful sync, maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are updating it in every interval, I guess if we reach split brain case, it would already have that, cc @ShyamsundarR

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question really is what does RBD report, and then what should the proto report. In the case of split-brain we should report not-synced time (which IMHO is the "0" timestamp), or if available the last time it was synced...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or be done with an exponential backoff.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the image is in replaying state with RBD we get the additional description, if stopped or split-brain the additional data is not available, and we will hence default to the "0" time stamp. We should be fine with that.


var requeueForInfo bool
if instance.Spec.ReplicationState == replicationv1alpha1.Primary {
lastSyncTime, err := r.getVolumeReplicationInfo(vr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up!

The last sync time can be "0" or empty, i.e it was never synced even once, we need to determine how that value would be reflected. One option is to set it to the lower bound of UTC epoch (I am a bit confused on what Timestamp holds, UTC epoch (since 1970), or since 0001-01-01T00:00:00Z)

@yati1998 yati1998 force-pushed the volrep-last branch 2 times, most recently from e9ad9e1 to bccc4ef Compare September 12, 2022 05:05
@yati1998 yati1998 marked this pull request as ready for review September 12, 2022 05:23
@mergify mergify bot requested a review from Yuggupta27 September 12, 2022 05:24
Comment on lines 369 to 381
rawScheduleTime := parameters["schedulingInterval"]
if rawScheduleTime == "" {
rawScheduleTime = "1m"
}
scheduleTime, err := time.ParseDuration(rawScheduleTime)
if err != nil {
logger.Error(err, "failed to parse time: %v", rawScheduleTime)
return ctrl.Result{}, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rawScheduleTime := parameters["schedulingInterval"]
if rawScheduleTime == "" {
rawScheduleTime = "1m"
}
scheduleTime, err := time.ParseDuration(rawScheduleTime)
if err != nil {
logger.Error(err, "failed to parse time: %v", rawScheduleTime)
return ctrl.Result{}, err
}
const defaultScheduleTime = time.Minute
func getScheduleTime(parameters map[string]string) (time.Duration) {
rawScheduleTime := parameters["schedulingInterval"]
if rawScheduleTime == "" {
return defaultScheduleTime
}
scheduleTime, err := time.ParseDuration(rawScheduleTime)
if err != nil {
logger.Error(err, "failed to parse time: %v", rawScheduleTime)
return defaultScheduleTime
}
return scheduleTime
}

Create a constant and a function like above and use it .

@yati1998 ^

@yati1998 yati1998 force-pushed the volrep-last branch 2 times, most recently from ca61302 to e4bc1d9 Compare September 12, 2022 08:39
Copy link

@ShyamsundarR ShyamsundarR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly LGTM! The logical intention seems to be covered and correct.

err = r.updateReplicationStatus(instance, logger, getReplicationState(instance), msg)
if err != nil {
return ctrl.Result{}, err
}

logger.Info(msg)

scheduleTime := getScheduleTime(parameters, logger)
if requeueForInfo {
logger.Info("volume is primary, requeuing to get volume replication info")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging is not much helpful. Remove it on add the next queue interval time. and also move line 388 inside the if condition

@@ -614,6 +651,30 @@ func (r *VolumeReplicationReconciler) enableReplication(vr *volumeReplicationIns
return nil
}

// getVolumeReplicationInfo gets volume replication info.
func (r *VolumeReplicationReconciler) getLastSyncTime(vr *volumeReplicationInstance) (*timestamppb.Timestamp, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should not be doing this. Currently, we have only one parameter in the GetInfo this looks fine, If we add more parameters we need to write a new function for it. My suggestion is to return a complete response infoResponse and use it in caller

@@ -45,6 +45,8 @@ type VolumeReplication interface {
// ResyncVolume RPC call to resync the volume.
ResyncVolume(volumeID, replicationID string, force bool, secretName, secretNamespace string, parameters map[string]string) (*proto.
ResyncVolumeResponse, error)
// GetVolumeReplicationInfo RPC call to get volume replication info.
GetVolumeReplicationInfo(volumeID, replicationID string, secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GetVolumeReplicationInfo(volumeID, replicationID string, secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error)
GetVolumeReplicationInfo(volumeID, replicationID , secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error)```

Comment on lines 150 to 151
func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID string,
secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID string,
secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) {
func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID,
secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) {

}

lastsynctime := resp.GetLastSyncTime()
if lastsynctime == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when this can be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case there is no local snapshot timestamp and the csi driver return nil.

@@ -51,6 +52,7 @@ const (
pvcDataSource = "PersistentVolumeClaim"
volumeReplicationClass = "VolumeReplicationClass"
volumeReplication = "VolumeReplication"
defaultScheduleTime = time.Minute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every minute sounds very often as a default, how about hourly? Consider that there can be many volumes, and the LastSyncTime may not be critical information. In environments with hundred (to name a random number) volume, users should not run into high resource consumption because of the constant reconciling.

For environments that require more strict updates on syncing, the parameter can be used to change it.

@@ -363,13 +372,34 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
}

instance.Status.LastCompletionTime = getCurrentTime()

var requeueForInfo bool
if instance.Spec.ReplicationState == replicationv1alpha1.Primary {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or be done with an exponential backoff.

}
lastSyncTime := info.GetLastSyncTime()

lastsynctime := metav1.NewTime(lastSyncTime.AsTime())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very confusing naming, there is lastSyncTime and lastsynctime. If you really need both, make sure to call them differently (maybe ts or timestamp and lastSyncTime or just t).

This commit adds lastsynctime to volume
replicationstatus and updates the crds.

Signed-off-by: yati1998 <ypadia@redhat.com>
This commit creates internal grpc to get volume
replication information.

Signed-off-by: yati1998 <ypadia@redhat.com>
this commit updates the vendor to get latest
spec.

Signed-off-by: yati1998 <ypadia@redhat.com>
@yati1998 yati1998 force-pushed the volrep-last branch 2 times, most recently from 42af0cf to f0c7f15 Compare September 13, 2022 07:44
@yati1998
Copy link
Contributor Author

@nixpanic @ShyamsundarR @Rakshith-R @Madhu-1 done with all the changes here as well.

Comment on lines 386 to 388
if instance.Spec.ReplicationState == replicationv1alpha1.Secondary {
instance.Status.LastSyncTime = nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go at line 382,
otherwise there will be no update to status.

Copy link
Collaborator

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the documentation for the new parameter needs to be added. Otherwise it looks good to me.

This commit adds reconcile logic to update the
last sync time.

Signed-off-by: yati1998 <ypadia@redhat.com>
@Madhu-1
Copy link
Member

Madhu-1 commented Sep 13, 2022

If the RPC is not implemented, this will be a breaking change. we need to address it later

Copy link
Member

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yati1998 ,can you open an issue to track handling NotImplemented Error ?
We can set lastSyncTime to nil if getvolrepInfo returns this error.
We'll need to fix this before another release of csi-addons operator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
vendor Pull requests that update vendored dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants