From b7f4f7a8906185ecdb070e5ee7c560a14fede2e0 Mon Sep 17 00:00:00 2001
From: Madhu Rajanna
Date: Fri, 28 Oct 2022 11:10:57 +0200
Subject: [PATCH] replication: define error conditions for GetVolumeReplicationInfo

In some cases the driver might not be able to return the replication
details for a period of time. In that case it can return well-known
error codes, based on which the CO can retry GetVolumeReplicationInfo
with exponential backoff to get the required details.

Signed-off-by: Madhu Rajanna
---
 replication/README.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/replication/README.md b/replication/README.md
index 01e0422..f4fa7f1 100644
--- a/replication/README.md
+++ b/replication/README.md
@@ -350,3 +350,15 @@ message GetVolumeReplicationInfoResponse {
   .google.protobuf.Timestamp last_sync_time = 1;
 }
 ```
+
+#### Error Scheme
+
+| Condition | gRPC Code | Description | Recovery Behavior |
+| --------- | --------- | ----------- | ----------------- |
+| Missing required field | 3 INVALID_ARGUMENT | Indicates that a required field is missing from the request. | Caller MUST fix the request by adding the missing required field before retrying. |
+| Volume does not exist or volume replication details not found | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist or that replication details are not available at the moment. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
+| Volume is not replicated or image is not promoted | 9 FAILED_PRECONDITION | Indicates that the volume information corresponding to the specified `volume_id` could not be retrieved due to a failed precondition (for example, replication is not enabled or the image is not in the primary state). | Caller SHOULD ensure that replication is enabled and the image is promoted. |
+| Operation pending for volume | 10 ABORTED | Indicates that there is already an operation pending for the specified `volume_id`. In general the Cluster Orchestrator (CO) is responsible for ensuring that there is no more than one call "in-flight" per `volume_id` at a given time. However, in some circumstances, the CO MAY lose state (for example when the CO crashes and restarts), and MAY issue multiple calls simultaneously for the same `volume_id`. The Plugin SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified `volume_id`, and then retry with exponential back off. |
+| Call not implemented | 12 UNIMPLEMENTED | The invoked RPC is not implemented by the Plugin or disabled in the Plugin's current mode of operation. | Caller MUST NOT retry. |
+| Not authenticated | 16 UNAUTHENTICATED | The invoked RPC does not carry secrets that are valid for authentication. | Caller SHALL either fix the secrets provided in the RPC, or otherwise regalvanize said secrets such that they will pass authentication by the Plugin for the attempted RPC, after which point the caller MAY retry the attempted RPC. |
+| Error is unknown | 2 UNKNOWN | Indicates that an unknown error occurred. | Caller MUST study the logs before retrying. |
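
For reviewers, here is a minimal sketch of how a CO might act on the table above. It is not part of the patch or of any existing client. The names `Code`, `RpcError`, `backoff_delays`, and `get_info_with_retry` are hypothetical, as are the base delay, factor, cap, and retry count. It assumes that only `NOT_FOUND` and `ABORTED` are retried automatically, since those are the two rows that mention exponential back off. The precondition checks the table asks for (verifying the `volume_id`, ensuring no other call is pending) are left to the caller.

```python
import time
from enum import IntEnum


class Code(IntEnum):
    """gRPC status codes listed in the error scheme table."""
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5
    FAILED_PRECONDITION = 9
    ABORTED = 10
    UNIMPLEMENTED = 12
    UNAUTHENTICATED = 16


# Assumption: only the rows that mention exponential back off are retried
# automatically; every other code is surfaced to the caller unchanged.
BACKOFF_CODES = frozenset({Code.NOT_FOUND, Code.ABORTED})


class RpcError(Exception):
    """Hypothetical stand-in for an error carrying a gRPC status code."""

    def __init__(self, code):
        super().__init__(code.name)
        self.code = code


def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=5):
    """Yield `retries` delays in seconds, growing by `factor` up to `cap`."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def get_info_with_retry(call, sleep=time.sleep, retries=5):
    """Invoke `call`, retrying with exponential back off on retryable codes.

    `call` is any zero-argument callable that issues the RPC and raises
    RpcError on failure. The last error is re-raised once retries run out.
    """
    delays = backoff_delays(retries=retries)
    while True:
        try:
            return call()
        except RpcError as err:
            if err.code not in BACKOFF_CODES:
                raise  # e.g. UNIMPLEMENTED: caller MUST NOT retry
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted
            sleep(delay)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production client would likely add jitter to the delays.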