-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Add Retry Guide #10598
docs: Add Retry Guide #10598
Conversation
README.md
Outdated
Client Libraries use retries to handle unexpected, transient failures (i.e. Server temporarily unavailable). | ||
Multiple attempts, hopefully, will result in a successful response from the server. | ||
|
||
By default, retries are configured by the Cloud Service. The configured retry parameters are defined per RPC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add an example (the audience is client library users)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the audience's perspective, it's just "retries are controlled by the responses of the service." isn't it?
The configured retry parameters are defined per RPC.
If the audience doesn't have a way to read the definition, let's remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add an example (the audience is client library users)?
I'll move the Java-Asset example up to show the one service's default retry param values
From the audience's perspective, it's just "retries are controlled by the responses of the service." isn't it?
Controlled by response of the service and the time/ attempt bounds.
README.md
Outdated
- RPC Timeout Multiplier: The change in RPC timeout. This value is multiplied by the previous call’s timeout to calculate the timeout for the next call. | ||
- Max RPC Timeout: The limit of the RPC timeout, so that the RpcTimeoutMultiplier can't increase the RPC timeout higher than this amount | ||
|
||
Note: It is recommended to only set _either_ the Max Attempt or the Total Timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not aware of this recommendation, what's the reason behind it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My recollection was that there was an internal doc that recommended setting only either param, but I can't find it (and perhaps my memory is wrong). Looking at old tickets, it seems that maxAttempts was never implemented for any language (Java seems to be the exception).
I see this ticket I created a while back regarding maxAttempts: googleapis/sdk-platform-java#2306, so perhaps this recommendation shouldn't exist.
README.md
Outdated
|
||
See the official Google Cloud Docs: https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings | ||
|
||
#### Jitter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should mention Jitter
here because it is already deprecated and we should always set it to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think disabling the jitter flag is used quite a bit internally for tests to manage flakiness. I don't know how easily it would be to fully remove it in the future (perhaps we can just mark it with @VisibleForTests
).
I mentioned this section since I think it has a bit of behavior impact on the number of retries. If a customer is counting (or using otel), they may be expecting a certain number of retry attempts. I left out the setJittered(false)
configuration.
README.md
Outdated
#### Retry | ||
##### Example 1 | ||
```java | ||
RetrySettings.newBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the same RetrySettings
for all examples? I think it would be easier to process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are slight differences in each RetrySetting. I can try and highlight the differences to make it clearer.
Co-authored-by: Tomo Suzuki <suztomo@google.com>
docs/client_retries.md
Outdated
@@ -0,0 +1,267 @@ | |||
# Client Side Retries | |||
Client Libraries use retries to handle unexpected, transient failures (i.e. server temporarily unavailable). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider putting this in backticks server temporarily unavailable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. I think it should be clearer now that it's not a status code or message back.
docs/client_retries.md
Outdated
Client Libraries use retries to handle unexpected, transient failures (i.e. server temporarily unavailable). | ||
Multiple attempts, hopefully, will result in a successful response from the server. | ||
|
||
By default, retries are configured by the Cloud Service. The configured retry parameters are defined per RPC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider rewording Cloud Service
to something like Default retry settings are configured per RPC by each backend cloud service
. Probably me being paranoid, but Service
is such an overloaded term, I don't want there to be confusion about a "new" proper noun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated. I tried rewording it to make it clearer. I didn't want to introduce the term backend, but I think it should make sense.
docs/client_retries.md
Outdated
## RPC Retry Configurations | ||
Using Java-Asset v3.41.0 as an example, the default configurations are configured in the following files: | ||
|
||
Retry Status Codes are configured [here](https://github.com/googleapis/google-cloud-java/blob/d9da511b4b56302e509abe8b2d919a15ea7dcae7/java-asset/google-cloud-asset/src/main/java/com/google/cloud/asset/v1/stub/AssetServiceStubSettings.java#L1058-L1082) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider linking Retry Status Codes
to where you define them later (line 51) - I think you can do that using anchor tags (https://gist.github.com/rachelhyman/b1f109155c9dafffe618)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Let me try and move Configurable Retry Params
section up.
docs/client_retries.md
Outdated
builder | ||
.exportAssetsSettings() | ||
.setRetryableCodes(RETRYABLE_CODE_DEFINITIONS.get("no_retry_0_codes")) | ||
.setRetrySettings(RETRY_PARAM_DEFINITIONS.get("no_retry_0_params")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding how this information is to be used by a developer. Is this section just FYI on how the internals are implemented, or is this information important because developers will need to modify this when X.
(The examples below don't seem to require knowledge of the RETRYABLE_CODE_DEFINITIONS
and RETRY_PARAM_DEFINITIONS
collections.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended this section to be to show where users can find the default retry configurations. I'll update this section to be clearer.
(The examples below don't seem to require knowledge of the RETRYABLE_CODE_DEFINITIONS and RETRY_PARAM_DEFINITIONS collections.)
Those definitions aren't important and are internal details to gax.
@burkedavison @suztomo @blakeli0 @alicejli Could I get a re-review on this whenever you all get time? Plan is to add details for retries, LRO, endpoint, client initialization, etc. so that we can reference this on future customer issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for putting together! this will be very helpful to reference for customers. at some point it'd be nice to get a version of this guide onto Cloud RAD.
``` | ||
Initial Retry Delay: 100ms | ||
Retry Delay Multiplier: 2.0 | ||
Max Retry Delay: 500ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dose this setting cause retry indefinite times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this current configuration implies infinitely. I'll make that clearer in the example. Thanks!
docs/client_retries.md
Outdated
|
||
### Jitter | ||
Jitter is added variance via randomness to spread out when the RPCs are invoked. By default, Google Cloud | ||
Client Libraries enable jitter for retries. When jitter is enabled with exponential backoff, the client libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"When jitter is enabled" implies that it can be disabled, but setJittered
is now deprecated. Can we change the wording to indicate that Jitter is always enabled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than one minor comment.
Fixes: #10586
Add a section to the README.md to include basic information about how Google Cloud Java Client Libraries deal with retries.