rbd: remove grpc and csi-addons import from rbd/replication.go #3884
internal/rbd/errors.go
Outdated
// This is a most likely a transient condition and may be corrected | ||
// by retrying with a backoff. Note that it is not always safe to retry | ||
// non-idempotent operations. | ||
ErrUnavailable = errors.New("service is currently unavailable") |
ErrUnavailable could be ErrUnavailableService for better readability.
Updated, thanks!
internal/rbd/errors.go
Outdated
// ErrFailedPrecondition is returned when operation is rejected because the system is not in a state | ||
// required for the operation's execution. | ||
ErrFailedPrecondition = errors.New("operation is rejected because the system is not in a state required for the operation's execution.") | ||
// ErrUnavailable is returned when the service is currently unavailable. |
please check the indentation.
The linked issue does not require this change.
Can you provide the reason for making this change ?
grpc codes & status are used for a reason.
This is the update for - https://github.com/ceph/ceph-csi/pull/3608/files#r1092953730 |
The main devel branch should get only complete prs going from one stable state to another. |
This is a good start for cleaning up, thanks!
You'll need to check the callers of the functions, and map the new errors to some gRPC codes. The CSI protocol expects a gRPC code+error to be returned from the upper API functions. Have a look at the CSI spec to see what code is suitable for the error in the particular API (CreateVolume or DeleteVolume may return a different error to its caller for the same internal failure).
Yes, I have used the similar error definitions as the corresponding grpc status and codes. |
return corerbd.DisableVolumeReplication(rbdVol, mirroringInfo, force) | ||
err = rbdVol.DisableVolumeReplication(mirroringInfo, force) | ||
if err != nil { | ||
return nil, err |
This should not return err directly, but a gRPC error with a suitable code. DisableVolumeReplication() did not seem to do that in all cases, which was not correct. This refactoring makes that clear, and you should address it.
internal/rbd/replication.go
Outdated
) | ||
|
||
func (rv *rbdVolume) ResyncVol(localStatus librbd.SiteMirrorImageStatus, force bool) error { | ||
if resyncRequired(localStatus) { | ||
// If the force option is not set return the error message to retry | ||
// with Force option. | ||
if !force { | ||
return status.Errorf(codes.FailedPrecondition, | ||
"image is in %q state, description (%s). Force resync to recover volume", | ||
fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume", |
This does not look right, you may want to assign it to err and return that?
internal/rbd/replication.go:32:14: unusedresult: result of fmt.Errorf call not used (govet)
fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume",
^
This now removes the replication service from internal/rbd and returns gRPC code+error in internal/csi-addons/rbd. |
internal/rbd/errors.go
Outdated
// ErrFailedPrecondition is returned when operation is rejected because the system is not in a state | ||
// required for the operation's execution. | ||
ErrFailedPrecondition = errors.New("system is not in a state required for the operation's execution") | ||
// ErrServiceUnavailable is returned when the service is currently unavailable. | ||
// This is a most likely a transient condition and may be corrected | ||
// by retrying with a backoff. Note that it is not always safe to retry | ||
// non-idempotent operations. | ||
ErrServiceUnavailable = errors.New("service is currently unavailable") | ||
// ErrAborted is returned when the operation is aborted. | ||
ErrAborted = errors.New("operation got aborted") | ||
// ErrInvalidArgument is returned when the client specified an invalid argument. | ||
ErrInvalidArgument = errors.New("invalid arguments provided") |
Where are these error messages used?
switch { | ||
case strings.Contains(err.Error(), "failed to get local state"): | ||
return nil, status.Error(codes.Internal, err.Error()) | ||
case strings.Contains(err.Error(), "secondary image status is"): | ||
return nil, status.Error(codes.InvalidArgument, err.Error()) | ||
case strings.Contains(err.Error(), "failed to disable image mirroring"): | ||
return nil, status.Error(codes.Internal, err.Error()) | ||
case strings.Contains(err.Error(), "failed to get mirroring info of image"): | ||
return nil, status.Error(codes.Internal, err.Error()) | ||
case strings.Contains(err.Error(), "in disabling state"): | ||
return nil, status.Error(codes.Aborted, err.Error()) | ||
} |
You should define an error, return it, and use errors.Is/As for comparison instead of string comparison.
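To illustrate why the string comparison is fragile, here is a minimal sketch. The checkSecondary function, its state values and the reworded message are hypothetical and not taken from the PR; only ErrInvalidArgument mirrors a name from the diff. Once the message text changes slightly, the strings.Contains() check silently stops matching, while errors.Is() keeps working.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidArgument mirrors the sentinel error proposed in this PR.
var ErrInvalidArgument = errors.New("invalid arguments provided")

// checkSecondary is a hypothetical function. Its message was reworded from
// "status" to "state", which is enough to break a string-based check.
func checkSecondary(state string) error {
	if state != "healthy" {
		return fmt.Errorf("%w: secondary image state is %q", ErrInvalidArgument, state)
	}
	return nil
}

// matchesByString is the fragile approach: it depends on the exact wording.
func matchesByString(err error) bool {
	return err != nil && strings.Contains(err.Error(), "secondary image status is")
}

// matchesByIs checks the wrapped sentinel, independent of the message text.
func matchesByIs(err error) bool {
	return errors.Is(err, ErrInvalidArgument)
}

func main() {
	err := checkSecondary("degraded")
	fmt.Println(matchesByString(err)) // false: the wording changed
	fmt.Println(matchesByIs(err))     // true: the sentinel is still wrapped
}
```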
internal/rbd/replication.go
Outdated
return status.Errorf(codes.FailedPrecondition, | ||
"image is in %q state, description (%s). Force resync to recover volume", | ||
localStatus.State, localStatus.Description) | ||
fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume", localStatus.State, localStatus.Description) |
These fmt.Errorf(....) statements return an error, and a lot of them are not used/returned.
You can probably do something like this which will be better:
return fmt.Errorf("%w: image is in %q state, description (%s). Force resync to recover volume", ErrFailedPrecondition, localStatus.State, localStatus.Description)
or
err = fmt.Errorf("%w: image is in %q state, description (%s). Force resync to recover volume", ErrFailedPrecondition, localStatus.State, localStatus.Description)
return err
By using %w and passing ErrFailedPrecondition for the formatting, you get the error message in the error string, and the returned error still matches ErrFailedPrecondition when checked with errors.Is().
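A small, self-contained sketch of the %w suggestion follows. The resyncError and isFailedPrecondition helpers and the sample state and description values are made up for illustration; the sentinel and the message format are the ones quoted in this review.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFailedPrecondition mirrors the sentinel error from internal/rbd/errors.go.
var ErrFailedPrecondition = errors.New("system is not in a state required for the operation's execution")

// resyncError is a hypothetical helper that wraps the sentinel with %w so the
// caller gets both a descriptive message and a matchable error.
func resyncError(state, description string) error {
	return fmt.Errorf("%w: image is in %q state, description (%s). Force resync to recover volume",
		ErrFailedPrecondition, state, description)
}

// isFailedPrecondition reports whether err wraps ErrFailedPrecondition.
func isFailedPrecondition(err error) bool {
	return errors.Is(err, ErrFailedPrecondition)
}

func main() {
	err := resyncError("error", "split-brain detected")
	fmt.Println(err)                       // sentinel text followed by the details
	fmt.Println(isFailedPrecondition(err)) // true
}
```

Note that the wrapped error is a new value, not the sentinel itself, so a plain == comparison would fail; errors.Is() walks the wrap chain.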
Yes Niels, I have already done this here (in 2nd commit),
return fmt.Errorf("image is in %q state, description (%s). Force resync to recover volume: %w",
internal/rbd/replication.go
Outdated
} | ||
err := rv.resyncImage() | ||
if err != nil { | ||
return status.Error(codes.Internal, err.Error()) | ||
fmt.Errorf("failed to resync image") | ||
return err |
What type of error is returned here? Maybe it makes sense to provide a comment at the top of the function that describes the different errors? The caller of the function can then easily figure out what errors need to be caught and handled differently.
return &replication.DisableVolumeReplicationResponse{}, nil | ||
} | ||
switch { | ||
case strings.Contains(err.Error(), "failed to get local state"): |
Instead of checking the contents of the string, use errors.Is() to check if an error is of a particular type.
ceph-csi/internal/rbd/clone.go
Lines 58 to 88 in 40888f0
switch { | |
case errors.Is(err, ErrSnapNotFound): | |
// as the snapshot is not present, create new snapshot,clone and | |
// delete the temporary snapshot | |
err = createRBDClone(ctx, tempClone, rv, snap) | |
if err != nil { | |
return false, err | |
} | |
return true, nil | |
case errors.Is(err, ErrImageNotFound): | |
// as the temp clone does not exist,check snapshot exists on parent volume | |
// snapshot name is same as temporary clone image | |
snap.RbdImageName = tempClone.RbdImageName | |
err = parentVol.checkSnapExists(snap) | |
if err == nil { | |
// the temp clone exists, delete it lets reserve a new ID and | |
// create new resources for a cleaner approach | |
err = parentVol.deleteSnapshot(ctx, snap) | |
} | |
if errors.Is(err, ErrSnapNotFound) { | |
return false, nil | |
} | |
return false, err | |
default: | |
// any error other than the above return error | |
return false, err | |
} |
You can squash both commits into one, as the 2nd almost redoes everything from the 1st.
Applying only the 1st will likely not result in a functional ceph-csi executable. Only if the commits are in such an order and independent enough so that applying one at the time, run tests, apply the next, run tests, keeps working as it should, commits can be split. (That is the best way, but sometimes difficult to get right. It is required for git blame, which can be used for bug hunting.)
return nil, status.Error(codes.Internal, err.Error()) | ||
case errors.Is(err, corerbd.ErrAborted): | ||
return nil, status.Error(codes.Aborted, err.Error()) | ||
} |
Please add a default case, otherwise uncaught errors will get ignored.
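A sketch of what such a switch with a default case could look like. To keep it runnable with only the standard library, the code type below is a local stand-in for google.golang.org/grpc/codes, and toCode and wrapAborted are hypothetical helpers. Falling back to Internal for unlisted errors is an assumption, not something this review prescribes.

```go
package main

import (
	"errors"
	"fmt"
)

// code is a local stand-in for the gRPC status codes (an assumption made so
// that this sketch has no external dependencies).
type code string

const (
	codeAborted         code = "Aborted"
	codeInvalidArgument code = "InvalidArgument"
	codeInternal        code = "Internal"
)

// Sentinel errors mirroring the ones defined in this PR.
var (
	ErrAborted         = errors.New("operation got aborted")
	ErrInvalidArgument = errors.New("invalid arguments provided")
)

// wrapAborted is a hypothetical helper producing a wrapped ErrAborted.
func wrapAborted(volID string) error {
	return fmt.Errorf("%w: %q is in disabling state", ErrAborted, volID)
}

// toCode maps an internal error to a status code.
func toCode(err error) code {
	switch {
	case errors.Is(err, ErrAborted):
		return codeAborted
	case errors.Is(err, ErrInvalidArgument):
		return codeInvalidArgument
	default:
		// Without this case, unlisted errors would fall through unreported.
		return codeInternal
	}
}

func main() {
	fmt.Println(toCode(wrapAborted("vol-1")))             // Aborted
	fmt.Println(toCode(errors.New("unexpected failure"))) // Internal
}
```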
Updated, thanks. |
Update: To reduce the cyclomatic complexity, the switch statements were replaced with a map that stores the error-to-code mapping.
@@ -649,6 +665,17 @@ func (rs *ReplicationServer) ResyncVolume(ctx context.Context, | |||
return resp, nil | |||
} | |||
|
|||
func getError(err error, errorStatusMap map[error]codes.Code) error { |
New helper functions like this require a unit-test to prevent them from breaking in the future. Please add one.
Added the unit test, thank you!
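For readers wondering what a test for such a helper might cover, here is a table-driven sketch. It is not the test that was added in the PR. lookupCode is a hypothetical, simplified variant of getError() that returns a code instead of a gRPC status error, and the code type is a local stand-in for the gRPC codes. In the repository this would live in a _test.go file and use the testing package; here the table is run from a plain function so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// code is a local stand-in for gRPC status codes (assumption for this sketch).
type code string

const (
	codeOK       code = "OK"
	codeAborted  code = "Aborted"
	codeInternal code = "Internal"
)

// ErrAborted mirrors the sentinel error defined in this PR.
var ErrAborted = errors.New("operation got aborted")

// lookupCode is a hypothetical, simplified variant of getError().
func lookupCode(err error, errorStatusMap map[error]code) code {
	if err == nil {
		return codeOK
	}
	// A direct map lookup would miss wrapped errors, so compare with errors.Is.
	for target, c := range errorStatusMap {
		if errors.Is(err, target) {
			return c
		}
	}
	// Any other non-nil error not listed in the map.
	return codeInternal
}

// runCases executes the table and returns the names of failing cases.
func runCases() []string {
	statusMap := map[error]code{ErrAborted: codeAborted}
	cases := []struct {
		name string
		err  error
		want code
	}{
		{"nil error", nil, codeOK},
		{"sentinel", ErrAborted, codeAborted},
		{"wrapped sentinel", fmt.Errorf("%w: vol is in disabling state", ErrAborted), codeAborted},
		{"unlisted error", errors.New("boom"), codeInternal},
	}
	var failed []string
	for _, tc := range cases {
		if got := lookupCode(tc.err, statusMap); got != tc.want {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(runCases()))
}
```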
@@ -627,9 +639,13 @@ func (rs *ReplicationServer) ResyncVolume(ctx context.Context, | |||
|
|||
err = rbdVol.ResyncVol(localStatus, req.Force) | |||
if err != nil { | |||
log.ErrorLog(ctx, err.Error()) | |||
errorStatusMap := map[error]codes.Code{ |
There could be a single errorStatusMap internal to the getError() function. You use it twice, and there is no overlap in the keys.
@@ -326,7 +326,12 @@ func (rs *ReplicationServer) DisableVolumeReplication(ctx context.Context, | |||
case librbd.MirrorImageDisabling: | |||
return nil, status.Errorf(codes.Aborted, "%s is in disabling state", volumeID) | |||
case librbd.MirrorImageEnabled: | |||
return corerbd.DisableVolumeReplication(rbdVol, mirroringInfo, force) | |||
err = rbdVol.DisableVolumeReplication(mirroringInfo, force) | |||
if err == nil { |
nit: the usual check is for err != nil, and have the success path continue after the if-statement
Thanks Niels, updated. PTAL
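The pattern from the nit above, as a minimal sketch. Both disable and handle are hypothetical stand-ins, not the functions from the PR: the error path returns early, and the success path continues unindented after the if-statement.

```go
package main

import (
	"errors"
	"fmt"
)

// disable is a hypothetical stand-in for a call such as
// rbdVol.DisableVolumeReplication(); it fails on request.
func disable(fail bool) error {
	if fail {
		return errors.New("failed to disable image mirroring")
	}
	return nil
}

// handle shows the usual shape: check err != nil first and return early.
func handle(fail bool) (string, error) {
	err := disable(fail)
	if err != nil {
		// Error path: add context and leave the function.
		return "", fmt.Errorf("disable volume replication: %w", err)
	}

	// Success path continues here, at the base indentation level.
	return "replication disabled", nil
}

func main() {
	fmt.Println(handle(false))
	fmt.Println(handle(true))
}
```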
internal/rbd/replication.go
Outdated
} | ||
if mirroringInfo.State == librbd.MirrorImageDisabling { | ||
return nil, status.Errorf(codes.Aborted, "%s is in disabling state", rbdVol.VolID) | ||
return fmt.Errorf("%w: %s is in disabling state", ErrAborted, rv.VolID) |
return fmt.Errorf("%w: %s is in disabling state", ErrAborted, rv.VolID) | |
return fmt.Errorf("%w: %q is in disabling state", ErrAborted, rv.VolID) |
This is still pending.
lgtm
if err == nil { | ||
return status.Error(codes.OK, codes.OK.String()) | ||
} |
This should be the first check, move this to line 656
Done, thanks!
} | ||
} | ||
|
||
// Handle any other error nol nil error not listed in the map |
// Handle any other error nol nil error not listed in the map | |
// Handle any other error non nil error not listed in the map |
@Mergifyio queue |
✅ The pull request has been merged automatically at cdaa926
this commit removes grpc import from replication.go and replaced it with usual errors and passed gRPC responses in csi-addons Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
/test ci/centos/k8s-e2e-external-storage/1.25 |
/test ci/centos/k8s-e2e-external-storage/1.26 |
/test ci/centos/k8s-e2e-external-storage/1.27 |
/test ci/centos/mini-e2e-helm/k8s-1.25 |
/test ci/centos/mini-e2e-helm/k8s-1.26 |
/test ci/centos/mini-e2e-helm/k8s-1.27 |
/test ci/centos/mini-e2e/k8s-1.25 |
/test ci/centos/mini-e2e/k8s-1.26 |
/test ci/centos/mini-e2e/k8s-1.27 |
/test ci/centos/upgrade-tests-cephfs |
/test ci/centos/upgrade-tests-rbd |
this commit removes grpc import from replication.go and replaced it with usual errors
updates - #3314