minor changes #1

Merged 2 commits on Feb 11, 2017

A6.md: 56 changes (29 additions, 27 deletions)
@@ -46,7 +46,7 @@ gRPC client library will automatically retry failed RPCs according to a policy s

Currently, gRPC does not retry failed RPCs. All failed RPCs are immediately returned to the application layer by the gRPC client library.

-Many teams have implemented their own retry logic wrapped around gRPC like [Veneer Toolkit](https://github.com/googleapis/toolkit), and [Cloud Bigtable](https://github.com/GoogleCloudPlatform/cloud-bigtable-client).
+Many teams have implemented their own retry logic wrapped around gRPC like [Veneer Toolkit](https://github.com/googleapis/toolkit) and [Cloud Bigtable](https://github.com/GoogleCloudPlatform/cloud-bigtable-client).

## Proposal

@@ -94,17 +94,17 @@ The retry policy may specify the following parameters for exponential backoff:

```
'exponential_backoff' {
-'initial_backoff': 1000ms,
-'max_backoff': 5000ms,
-'multiplier': 3
+'initial_backoff_ms': 1000,
+'max_backoff_ms': 5000,
+'multiplier': 3
}
```

A value called `current_backoff` is initially set to `initial_backoff`. After every failed RPC, `current_backoff` is set to `min(current_backoff * multiplier, max_backoff)`. A failed RPC is retried after a delay of *x*, where *x* is defined as `random(0, current_backoff)`.
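
As a rough Python sketch of the backoff computation described above (illustrative only; the function name and the exact placement of the first delay are assumptions, not part of the proposal):

```
import random

def retry_delays_ms(initial_backoff_ms, max_backoff_ms, multiplier, max_retry_attempts):
    """Yield a randomized delay in milliseconds before each retry attempt.

    current_backoff starts at initial_backoff and, after each failed attempt,
    is multiplied and capped at max_backoff; the actual delay is drawn
    uniformly from [0, current_backoff].
    """
    current_backoff = initial_backoff_ms
    for _ in range(max_retry_attempts):
        yield random.uniform(0, current_backoff)
        current_backoff = min(current_backoff * multiplier, max_backoff_ms)

# With the example configuration above: the first delay is drawn from
# [0, 1000], the second from [0, 3000], and the third from [0, 5000].
delays = list(retry_delays_ms(1000, 5000, 3, 3))
```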

##### Retryable Status Codes

-When gRPC receives a non-OK response status from a server, this status is checked against the set of retryable status codes to determine if a retry attempt should be made. The default set of retryable status codes contains only `UNAVAILABLE`, but this can be explicitly set in the retry policy as follows:
+When gRPC receives a non-OK response status from a server, this status is checked against the set of retryable status codes to determine if a retry attempt should be made.

```
'retryable_status_codes': {UNAVAILABLE}
```
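
For illustration, the retry decision described above could be sketched as follows in Python (the helper name and the status-code strings are stand-ins, not gRPC API):

```
# Default per the proposal: only UNAVAILABLE is retryable.
RETRYABLE_STATUS_CODES = {'UNAVAILABLE'}

def should_retry(status_code, attempts_made, max_retry_attempts):
    """Retry only non-OK statuses that appear in the configured retryable
    set, and only while the attempt budget has not been exhausted."""
    if status_code == 'OK':
        return False
    return status_code in RETRYABLE_STATUS_CODES and attempts_made < max_retry_attempts

# e.g. should_retry('UNAVAILABLE', attempts_made=1, max_retry_attempts=3) -> True
#      should_retry('INVALID_ARGUMENT', attempts_made=1, max_retry_attempts=3) -> False
```
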
@@ -119,10 +119,10 @@ Hedging enables aggressively sending multiple copies of a single request without
Hedged requests are configured with the following parameters:

```
-'hedging': {
-'max_requests' : 3,
-'hedging_delay' : 500ms,
-'non_fatal_status_codes': {UNAVAILABLE, INTERNAL, ABORTED}
+'hedging_policy': {
+'max_requests' : 3,
+'hedging_delay_ms' : 500,
+'non_fatal_status_codes': {UNAVAILABLE, INTERNAL, ABORTED}
}
```

@@ -136,7 +136,7 @@ If a non-fatal status code is received from a hedged request, then the next hedg

If all instances of a hedged RPC fail, there are no additional retry attempts. Essentially, hedging can be seen as retrying the original RPC before a failure is even received.

-If server pushback is received in response to a hedged request, no further hedged requests should be issued for the call.
+If server pushback that specifies not to retry is received in response to a hedged request, no further hedged requests should be issued for the call.

Hedged requests should be sent to distinct backends, if possible.
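
A minimal Python sketch of the hedging rules described above (the helper names and return values are assumptions made for illustration, not part of the proposal):

```
def hedged_send_times_ms(max_requests, hedging_delay_ms):
    """Nominal send times for each hedged copy when no responses arrive early:
    the first copy is sent immediately, later copies every hedging_delay_ms."""
    return [i * hedging_delay_ms for i in range(max_requests)]

def on_hedged_response(status, non_fatal_status_codes, pushback_forbids_retry):
    """Decide what to do when one hedged copy completes.

    Returns one of:
      'commit'       - deliver this response to the application, cancel the rest
      'hedge_now'    - non-fatal failure: send the next copy immediately
      'stop_hedging' - server pushback forbids retries: issue no more copies
    """
    if pushback_forbids_retry:
        return 'stop_hedging'
    if status == 'OK' or status not in non_fatal_status_codes:
        return 'commit'
    return 'hedge_now'

# With the example configuration above:
print(hedged_send_times_ms(3, 500))  # [0, 500, 1000]
print(on_hedged_response('UNAVAILABLE',
                         {'UNAVAILABLE', 'INTERNAL', 'ABORTED'}, False))  # hedge_now
```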

@@ -146,8 +146,8 @@ gRPC prevents server overload due to retries and hedged RPCs by disabling these

```
'retry_throttling': {
'max_tokens': 10,
'token_ratio': 0.1
}
```
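
The exact throttling algorithm lives in a part of the file that is collapsed in this diff. Purely as an assumed illustration of how `max_tokens` and `token_ratio` could interact, a token-bucket sketch in Python might look like this:

```
class RetryThrottler:
    """Illustrative token bucket (an assumption, not the proposal's exact text):
    failures drain tokens, successes slowly refill them, and retries or hedged
    RPCs are allowed only while enough tokens remain."""

    def __init__(self, max_tokens, token_ratio):
        self.max_tokens = max_tokens
        self.token_ratio = token_ratio
        self.tokens = float(max_tokens)

    def record_failure(self):
        self.tokens = max(0.0, self.tokens - 1.0)

    def record_success(self):
        self.tokens = min(float(self.max_tokens), self.tokens + self.token_ratio)

    def retries_allowed(self):
        # Assumed threshold: stop retrying once the bucket drops to half capacity.
        return self.tokens > self.max_tokens / 2.0

throttle = RetryThrottler(max_tokens=10, token_ratio=0.1)
```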

@@ -214,7 +214,7 @@ To clarify the second scenario, we define an *outgoing message* as everything th

#### Buffered RPCs

-The gRPC library will have a configurable amount of available memory, `retry_buffer_size`, to buffer outgoing retryable or hedged RPCs. There is also a per-RPC size limit ,`per_rpc_buffer_limit`.
+The gRPC library will have a configurable amount of available memory, `retry_buffer_size`, to buffer outgoing retryable or hedged RPCs. There is also a per-RPC size limit, `per_rpc_buffer_limit`. These limits are configured by the client, rather than coming from the service config.

RPCs may only be retried when they are contained in the buffer.
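
As a rough Python sketch of the buffer bookkeeping implied by these two limits (class and method names are hypothetical; what exactly happens when a limit is exceeded is specified in the collapsed portion of the file):

```
class RetryBuffer:
    """Tracks buffered outgoing messages against a total budget
    (retry_buffer_size) and a per-RPC budget (per_rpc_buffer_limit)."""

    def __init__(self, retry_buffer_size, per_rpc_buffer_limit):
        self.retry_buffer_size = retry_buffer_size
        self.per_rpc_buffer_limit = per_rpc_buffer_limit
        self.used = 0
        self.per_rpc_bytes = {}  # rpc_id -> bytes currently buffered for that RPC

    def try_buffer(self, rpc_id, message_size):
        """Buffer an outgoing message if both limits allow it; return False
        when the message cannot be buffered (so the RPC is no longer retryable)."""
        rpc_total = self.per_rpc_bytes.get(rpc_id, 0) + message_size
        if rpc_total > self.per_rpc_buffer_limit:
            return False
        if self.used + message_size > self.retry_buffer_size:
            return False
        self.per_rpc_bytes[rpc_id] = rpc_total
        self.used += message_size
        return True

# Hypothetical example limits, chosen only for illustration.
buffer = RetryBuffer(retry_buffer_size=256 * 1024, per_rpc_buffer_limit=64 * 1024)
```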

@@ -280,41 +280,43 @@ Retry and hedging configuration is set as part of the service config, which is t

Service owners must choose between a retry policy and a hedging policy. Unless the service owner specifies a policy in the configuration, retries and hedging will not be enabled. The retry policy and hedging policy each have their own set of configuration options, detailed below.

-The parameters for throttling retry attempts and hedging when failures exceed a certain threshold are also set in the service config. Throttling applies across methods and services on a particular server, and thus may only be configured per-server name.
+The parameters for throttling retry attempts and hedged RPCs when failures exceed a certain threshold are also set in the service config. Throttling applies across methods and services on a particular server, and thus may only be configured per-server name.

#### Retry Policy

This is an example of a retry policy and its associated configuration. It implements exponential backoff with a maximum of three retry attempts, only retrying RPCs when an `UNAVAILABLE` status code is received.

```
-'max_retry_attempts': 3
-'exponential_backoff': {
-'initial_backoff': 1000ms,
-'max_backoff': 5000ms,
+'retry_policy': {
+'max_retry_attempts': 3
+'exponential_backoff': {
+'initial_backoff_ms': 1000,
+'max_backoff_ms': 5000,
'multiplier': 3
}
-'retryable_status_codes': {UNAVAILABLE}
+'retryable_status_codes': {UNAVAILABLE}
+}
```
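
Written out as plain Python, the post-change `retry_policy` block above could be represented and sanity-checked like this (field names follow this draft of the document, not necessarily gRPC's final service config schema):

```
retry_policy = {
    'max_retry_attempts': 3,
    'exponential_backoff': {
        'initial_backoff_ms': 1000,
        'max_backoff_ms': 5000,
        'multiplier': 3,
    },
    'retryable_status_codes': {'UNAVAILABLE'},
}

def validate_retry_policy(policy):
    """Basic structural checks over the fields used in this document."""
    backoff = policy['exponential_backoff']
    assert policy['max_retry_attempts'] > 0
    assert 0 < backoff['initial_backoff_ms'] <= backoff['max_backoff_ms']
    assert backoff['multiplier'] >= 1
    assert policy['retryable_status_codes'], 'at least one retryable status code'

validate_retry_policy(retry_policy)
```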

#### Hedging Policy

The following example of a hedging policy configuration will issue up to three hedged requests for each RPC, spaced out at 500ms intervals, until one of the requests receives a valid response, all of them fail, or the overall call deadline is reached. Analogously to `retryable_status_codes` for the retry policy, `non_fatal_status_codes` determines how hedging behaves when a non-OK response is received.

```
-'hedging': {
-'max_requests' : 3,
-'hedging_delay' : 500ms,
-'non_fatal_status_codes': {UNAVAILABLE, INTERNAL, ABORTED}
+'hedging_policy': {
+'max_requests': 3,
+'hedging_delay_ms': 500,
+'non_fatal_status_codes': {UNAVAILABLE, INTERNAL, ABORTED}
}
```

The following example issues three hedged requests simultaneously:

```
-'hedging': {
-'max_requests' : 3,
-'hedging_delay' : 0ms,
-'non_fatal_status_codes': {UNAVAILABLE, INTERNAL, ABORTED}
+'hedging_policy': {
+'max_requests': 3,
+'hedging_delay_ms': 0,
+'non_fatal_status_codes': {UNAVAILABLE, INTERNAL, ABORTED}
}
```
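
For comparison, a two-line Python check of the nominal send offsets implied by the two hedging examples above (illustrative arithmetic only):

```
print([i * 500 for i in range(3)])  # [0, 500, 1000] ms - spaced-out hedging
print([i * 0 for i in range(3)])    # [0, 0, 0] ms      - all three sent at once
```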
