Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: remove grpc and csi-addons import from rbd/replication.go #3884

Merged
merged 1 commit into from
Jun 22, 2023
Merged

rbd: remove grpc and csi-addons import from rbd/replication.go #3884

merged 1 commit into from
Jun 22, 2023

Conversation

riya-singhal31
Copy link
Contributor

this commit removes grpc import from replication.go and replaced it with usual errors

updates - #3314

@mergify mergify bot added the component/rbd Issues related to RBD label Jun 5, 2023
// This is a most likely a transient condition and may be corrected
// by retrying with a backoff. Note that it is not always safe to retry
// non-idempotent operations.
ErrUnavailable = errors.New("service is currently unavailable")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrUnavailable could be ErrUnavailableService for better readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

// ErrFailedPrecondition is returned when operation is rejected because the system is not in a state
// required for the operation's execution.
ErrFailedPrecondition = errors.New("operation is rejected because the system is not in a state required for the operation's execution.")
// ErrUnavailable is returned when the service is currently unavailable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check the indentation.

Copy link
Contributor

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked issue does not require this change.
Can you provide the reason for making this change ?
grpc codes & status are used for a reason.

@riya-singhal31
Copy link
Contributor Author

The linked issue does not require this change. Can you provide the reason for making this change ? grpc codes & status are used for a reason.

This is the update for - https://github.com/ceph/ceph-csi/pull/3608/files#r1092953730
This commit involves cleanup of grpc package and will be adding another commit to remove csi-addons.

@Rakshith-R
Copy link
Contributor

The linked issue does not require this change. Can you provide the reason for making this change ? grpc codes & status are used for a reason.

This is the update for - https://github.com/ceph/ceph-csi/pull/3608/files#r1092953730 This commit involves cleanup of grpc package and will be adding another commit to remove csi-addons.

The main devel branch should get only complete prs going from one stable state to another.
Please mark the pr ready for review (draft state otherwise) until all commits are in.
Make sure the functions(in csiaddons package) which in turn handle these errors definitely need to return the correct corresponding grpc status and codes.

Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start for cleaning up, thanks!

You'll need to check the callers of the functions, and map the new errors to some gRPC codes. The CSI protocol expects a gRPC code+error to be returned from the upper API functions. Have a look at the CSI spec to see what code is suitable for the error in the particular API (CreateVolume or DeleteVolume may return a different error to its caller for the same internal failure).

@riya-singhal31 riya-singhal31 changed the title rbd: remove grpc import from rbd/replication.go rbd: remove grpc and csi-addons import from rbd/replication.go Jun 13, 2023
@riya-singhal31
Copy link
Contributor Author

The linked issue does not require this change. Can you provide the reason for making this change ? grpc codes & status are used for a reason.

This is the update for - https://github.com/ceph/ceph-csi/pull/3608/files#r1092953730 This commit involves cleanup of grpc package and will be adding another commit to remove csi-addons.

The main devel branch should get only complete prs going from one stable state to another. Please mark the pr ready for review (draft state otherwise) until all commits are in. Make sure the functions(in csiaddons package) which in turn handle these errors definitely need to return the correct corresponding grpc status and codes.

Yes, I have used the similar error definitions as the corresponding grpc status and codes.

return corerbd.DisableVolumeReplication(rbdVol, mirroringInfo, force)
err = rbdVol.DisableVolumeReplication(mirroringInfo, force)
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not return err directly, but a gRPC error with a suitable code. DisableVolumeReplication() did not seem to have done that in all cases, that was not correct. With this refactoring it becomes clear, and you should address that.

)

func (rv *rbdVolume) ResyncVol(localStatus librbd.SiteMirrorImageStatus, force bool) error {
if resyncRequired(localStatus) {
// If the force option is not set return the error message to retry
// with Force option.
if !force {
return status.Errorf(codes.FailedPrecondition,
"image is in %q state, description (%s). Force resync to recover volume",
fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not look righ, you may want to assign it to err and return that?

internal/rbd/replication.go:32:14: unusedresult: result of fmt.Errorf call not used (govet)
			fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume",
			          ^

@riya-singhal31
Copy link
Contributor Author

This now removes the replication service from internal/rbd and returns gRPC code+error in internal/csi-addons/rbd.
Thanks, PTAL.
@nixpanic @Rakshith-R

Comment on lines 48 to 57
// ErrFailedPrecondition is returned when operation is rejected because the system is not in a state
// required for the operation's execution.
ErrFailedPrecondition = errors.New("system is not in a state required for the operation's execution")
// ErrServiceUnavailable is returned when the service is currently unavailable.
// This is a most likely a transient condition and may be corrected
// by retrying with a backoff. Note that it is not always safe to retry
// non-idempotent operations.
ErrServiceUnavailable = errors.New("service is currently unavailable")
// ErrAborted is returned when the operation is aborted.
ErrAborted = errors.New("operation got aborted")
// ErrInvalidArgument is returned when the client specified an invalid argument.
ErrInvalidArgument = errors.New("invalid arguments provided")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these error messages are used?

Comment on lines 333 to 332
switch {
case strings.Contains(err.Error(), "failed to get local state"):
return nil, status.Error(codes.Internal, err.Error())
case strings.Contains(err.Error(), "secondary image status is"):
return nil, status.Error(codes.InvalidArgument, err.Error())
case strings.Contains(err.Error(), "failed to disable image mirroring"):
return nil, status.Error(codes.Internal, err.Error())
case strings.Contains(err.Error(), "failed to get mirroring info of image"):
return nil, status.Error(codes.Internal, err.Error())
case strings.Contains(err.Error(), "in disabling state"):
return nil, status.Error(codes.Aborted, err.Error())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should define an error and return it and use errors.IS/As for comparison instead of string comparison.

return status.Errorf(codes.FailedPrecondition,
"image is in %q state, description (%s). Force resync to recover volume",
localStatus.State, localStatus.Description)
fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume", localStatus.State, localStatus.Description)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fmt.Errorf(....) statements return an error, and a lot of them are not used/returned.

You can probably do something like this which will be better:

return fmt.Errorf("%w: image is in %q state, description (%s). Force resync to recover volume", ErrFailedPrecondition, localStatus.State, localStatus.Description)

or

err = fmt.Errorf("%w: image is in %q state, description (%s). Force resync to recover volume", ErrFailedPrecondition, localStatus.State, localStatus.Description)
return err

By using %w and pasing ErrFailedPrecondition for the formatting, you get the error message in the error string, and the type of the error will still be ErrFailedPrecondition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Niels, I have already done this here (in 2nd commit),
return fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume: %w",

}
err := rv.resyncImage()
if err != nil {
return status.Error(codes.Internal, err.Error())
fmt.Errorf("failed to resync image")
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type of error is returned here? Maybe it makes sense to provide a comment at the top of the function that describes the different errors? The caller of the function can then easily figure out what errors need to be caught and handled differently.

return &replication.DisableVolumeReplicationResponse{}, nil
}
switch {
case strings.Contains(err.Error(), "failed to get local state"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking for contents of the string, use errors.Is() as function to check if an error is of a particular type.

switch {
case errors.Is(err, ErrSnapNotFound):
// as the snapshot is not present, create new snapshot,clone and
// delete the temporary snapshot
err = createRBDClone(ctx, tempClone, rv, snap)
if err != nil {
return false, err
}
return true, nil
case errors.Is(err, ErrImageNotFound):
// as the temp clone does not exist,check snapshot exists on parent volume
// snapshot name is same as temporary clone image
snap.RbdImageName = tempClone.RbdImageName
err = parentVol.checkSnapExists(snap)
if err == nil {
// the temp clone exists, delete it lets reserve a new ID and
// create new resources for a cleaner approach
err = parentVol.deleteSnapshot(ctx, snap)
}
if errors.Is(err, ErrSnapNotFound) {
return false, nil
}
return false, err
default:
// any error other than the above return error
return false, err
}

@riya-singhal31
Copy link
Contributor Author

riya-singhal31 commented Jun 15, 2023

Thanks for the comments, have addressed all the suggested modification, requesting you to PTAL,
@Madhu-1 @nixpanic

Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can squash both commits into one, as the 2nd almost redoes everything from the 1st.

Applying only the 1st will likely not result in a functional ceph-csi executable. Only if the commits are in such an order and independent enough so that applying one at the time, run tests, apply the next, run tests, keeps working as it should, commits can be split. (That is the best way, but sometimes difficult to get right. It is required for git blame which can be used for bug hunting.)

return nil, status.Error(codes.Internal, err.Error())
case errors.Is(err, corerbd.ErrAborted):
return nil, status.Error(codes.Aborted, err.Error())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a default case, otherwise uncaught errors will get ignored

internal/csi-addons/rbd/replication.go Outdated Show resolved Hide resolved
@riya-singhal31
Copy link
Contributor Author

You can squash both commits into one, as the 2nd almost redoes everything from the 1st.

Applying only the 1st will likely not result in a functional ceph-csi executable. Only if the commits are in such an order and independent enough so that applying one at the time, run tests, apply the next, run tests, keeps working as it should, commits can be split. (That is the best way, but sometimes difficult to get right. It is required for git blame which can be used for bug hunting.)

Updated, thanks.

@riya-singhal31
Copy link
Contributor Author

Update: To reduce the cyclomatic complexity, changed the switch statements to storing the values in map.

@@ -649,6 +665,17 @@ func (rs *ReplicationServer) ResyncVolume(ctx context.Context,
return resp, nil
}

func getError(err error, errorStatusMap map[error]codes.Code) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New helper functions like this require a unit-test to prevent them from breaking in the future. Please add one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the unit test, thank you!

@@ -627,9 +639,13 @@ func (rs *ReplicationServer) ResyncVolume(ctx context.Context,

err = rbdVol.ResyncVol(localStatus, req.Force)
if err != nil {
log.ErrorLog(ctx, err.Error())
errorStatusMap := map[error]codes.Code{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a signle errorStatusMap internally to the getError() function. You use it twice, and there is no overlap in the keys.

nixpanic
nixpanic previously approved these changes Jun 19, 2023
@@ -326,7 +326,12 @@ func (rs *ReplicationServer) DisableVolumeReplication(ctx context.Context,
case librbd.MirrorImageDisabling:
return nil, status.Errorf(codes.Aborted, "%s is in disabling state", volumeID)
case librbd.MirrorImageEnabled:
return corerbd.DisableVolumeReplication(rbdVol, mirroringInfo, force)
err = rbdVol.DisableVolumeReplication(mirroringInfo, force)
if err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the usual check is for err != nil, and have the success path continue after the if-statement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Niels, updated. PTAL

internal/rbd/errors.go Outdated Show resolved Hide resolved
}
if mirroringInfo.State == librbd.MirrorImageDisabling {
return nil, status.Errorf(codes.Aborted, "%s is in disabling state", rbdVol.VolID)
return fmt.Errorf("%w: %s is in disabling state", ErrAborted, rv.VolID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("%w: %s is in disabling state", ErrAborted, rv.VolID)
return fmt.Errorf("%w: %q is in disabling state", ErrAborted, rv.VolID)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still pending.

internal/rbd/replication.go Outdated Show resolved Hide resolved
internal/rbd/replication.go Show resolved Hide resolved
internal/rbd/replication.go Outdated Show resolved Hide resolved
internal/csi-addons/rbd/replication_test.go Show resolved Hide resolved
internal/csi-addons/rbd/replication.go Outdated Show resolved Hide resolved
nixpanic
nixpanic previously approved these changes Jun 21, 2023
Copy link
Contributor

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment on lines 673 to 675
if err == nil {
return status.Error(codes.OK, codes.OK.String())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the first check, move this to line 656

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@mergify mergify bot dismissed nixpanic’s stale review June 22, 2023 08:14

Pull request has been modified.

}
}

// Handle any other error nol nil error not listed in the map
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Handle any other error nol nil error not listed in the map
// Handle any other error non nil error not listed in the map

@nixpanic nixpanic requested a review from Madhu-1 June 22, 2023 08:37
@Rakshith-R
Copy link
Contributor

@Mergifyio queue

@mergify
Copy link
Contributor

mergify bot commented Jun 22, 2023

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at cdaa926

this commit removes grpc import from replication.go
and replaced it with usual errors and passed gRPC
responses in csi-addons

Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Jun 22, 2023
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.25

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.25

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.25

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Jun 22, 2023
@mergify mergify bot merged commit cdaa926 into ceph:devel Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rbd Issues related to RBD
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants