Skip to content
This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

FR: provide the ability to set per-request RetrySettings #1197

Closed
skuruppu opened this issue Oct 1, 2020 · 10 comments · Fixed by #1238
Closed

FR: provide the ability to set per-request RetrySettings #1197

skuruppu opened this issue Oct 1, 2020 · 10 comments · Fixed by #1238
Assignees
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@skuruppu
Copy link

skuruppu commented Oct 1, 2020

In googleapis/java-spanner, we want to allow users set per-request timeout and retry settings.

Currently, the timeouts and retry settings are set at the time of client creation when the UnaryCallable is created (example). This means the retry settings are fixed per client instance. Our customers have requested for the ability to set RetrySettings per request, so that they may even be able to set them programmatically depending on the type of query that is being executed.

There is a discussion on this in #561 but I would like to create an explicit feature request for this.

Note that we have already brought this up with grpc-java and they asked us to follow-up here.

CC @ramiyer

@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Oct 1, 2020
@AlanGasperini AlanGasperini added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Oct 6, 2020
@noahdietz noahdietz removed the 🚨 This issue needs some love. label Oct 6, 2020
@noahdietz
Copy link
Contributor

Hi @skuruppu thanks for the FR, this is something I've been thinking about already, so it's great to have your input.

My current idea was limited to just exposing some additional timeout setting on the ApiCallContext (the abstraction for HTTP & gRPC call contexts we use in GAX) that would filter through the stack to the retry machinery. It sounds like y'all want to have access to more than that though, which makes sense, especially relative to other GAPIC languages that support augmenting "retry settings" on a per-invocation basis.

Can you help me understand the use cases you have in mind?

  • Is it just attempt delay settings we want to change?
  • Is it just timeout(s) we want to be able to change?
  • Is it the entire RetrySettings object we want to support mutating?
  • Is it changes to the list of retryable codes too? (this list is not part of the RetrySettings today, it is separate)

Fortunately, these changes only need to be scoped to gax-java because the ApiCallContext is accessible from the existing GAPIC surface i.e. there shouldn't be a generator change necessary.

@skuruppu
Copy link
Author

skuruppu commented Oct 9, 2020

Thanks for following up @noahdietz. I will reply inline but @ramiyer, if you could also reply with your specific requirements, that would help a lot since you have a better understanding of how you plan to use this feature.

Hi @skuruppu thanks for the FR, this is something I've been thinking about already, so it's great to have your input.

My current idea was limited to just exposing some additional timeout setting on the ApiCallContext (the abstraction for HTTP & gRPC call contexts we use in GAX) that would filter through the stack to the retry machinery. It sounds like y'all want to have access to more than that though, which makes sense, especially relative to other GAPIC languages that support augmenting "retry settings" on a per-invocation basis.

Can you help me understand the use cases you have in mind?

  • Is it just attempt delay settings we want to change?

The initial request was to change the delay settings from my understanding.

  • Is it just timeout(s) we want to be able to change?

They also mentioned that they would like to change the timeouts as well.

  • Is it the entire RetrySettings object we want to support mutating?

I'm not sure so @ramiyer to confirm.

  • Is it changes to the list of retryable codes too? (this list is not part of the RetrySettings today, it is separate)

Also not sure so @ramiyer to confirm.

Fortunately, these changes only need to be scoped to gax-java because the ApiCallContext is accessible from the existing GAPIC surface i.e. there shouldn't be a generator change necessary.

That's great :)

@noahdietz
Copy link
Contributor

If any of the backoff settings were in question, I'd say we want the entire RetrySettings object to be overridable from the ApiCallContext. We'd want to avoid flattening out the RetrySettings surface on the ApiCallContext.

Just to set expectation, the way the RetrySettings are supplied to a callable does not lend itself to per-invocation configurability. It's all just code and its certainly doable, but it will be a deep change to the codebase.

@noahdietz
Copy link
Contributor

Small update: I took a crack at threading the RetrySettings through the ApiCallContext and it was actually much easier/less invasive than I anticipated - at least for Unary RPCs. I haven't tried Server Streaming yet, which has some different retry behavior based on resumption strategy.

@noahdietz
Copy link
Contributor

Update: Things got a little more complicated. RPCs that are configured to not be retried at client creation results in a simpler/smaller callable chain being created - one that does not have the same codepath as RPCs that are configured to be retried. In order to make the ApiCallContext-provided, per-invocation RetrySettings to work for non-retryable RPCs, we will need to either:

A. unify the callable chain making retryable and non-retryable RPCs have the same codepath
B. wrap the callable typically returned for non-retryable RPCs in a new Callable type that consults the ApiCallContext for those values of the RetrySettings that apply to non-retryable RPCs

On the other hand, one cannot make a non-retryable RPC retryable on a per-invocation basis unless we also expose overriding the retryable status codes, which I'm not sure we want to do. As such, the concerns might not be a problem at all, and just having per-invocation RetrySettings work for retryable RPCs would be straight forward. FWIW there is already a way to change the timeout of a non-retryable RPC via the ApiCallContext today so this FR isn't necessarily relevant to non-retryable RPCs.

@skuruppu
Copy link
Author

Yeah, I agree with prioritizing retryable RPCs and not adding this support yet for non-retryable RPCs.

We may not want to encourage retrying non-retryable RPCs in any case since if a user needs this, then there are errors that they should handle explicitly. I don't believe the automatic retry logic of gax could handle these cases in a meaningful way.

@noahdietz
Copy link
Contributor

noahdietz commented Oct 28, 2020

Hi folks, so I haven't had time to work on this more, and I am going on leave. I parked the initial changes in #1211, but it does not include the changes necessary for passing these settings through the ServerStreamingRetryAlgorithm.

We haven't scheduled this work, so the team would need to get it on the calendar. I was just exploring this. Explaining the specific use-case more would be useful too.

@skuruppu
Copy link
Author

@noahdietz thanks for making this much progress and for keeping us in the loop. Let us know if there's anything we can help with to get this work scheduled. I would be more than happy to have @olavloite help out with the implementation as long as there's someone from the gax-java team that can guide the design and implementation.

@noahdietz
Copy link
Contributor

Note: @skuruppu and I synced offline and we will have an update on this work in a few days.

@olavloite
Copy link
Contributor

olavloite commented Nov 5, 2020

  • Is it changes to the list of retryable codes too? (this list is not part of the RetrySettings today, it is separate)

I would say that there are a couple of additional reasons for wanting to be able to set this:

  • If an RPC is configured to be retryable, but DEADLINE_EXCEEDED is not part of the retryable codes for that RPC, then the call will fail with a timeout error already after exceeding the RetrySettings#getInitialRpcTimeout(). An RPC that does have DEADLINE_EXCEEDED as a retryable code will however not fail after RetrySettings#getInitialRpcTimeout(), but after RetrySettings#getTotalTimeout().
  • It is already possible to set a timeout per call through the ApiContext, so adding this feature only for 'simple' timeouts is not necessary.

This might not seem very logical from a user perspective. Consider a call that uses the following RetrySettings for an RPC that does not have DEADLINE_EXCEEDED as one of its retryable codes:

                  RetrySettings.newBuilder()
                    .setInitialRetryDelay(Duration.ZERO)
                    .setInitialRpcTimeout(Duration.ofMillis(100L))
                    .setTotalTimeout(Duration.ofSeconds(10L))
                    .setMaxRpcTimeout(Duration.ofSeconds(10L))
                    .build()

This call will fail with a DEADLINE_EXCEEDED after 100 milliseconds, and not the 10 seconds that the user would probably expect.

olavloite added a commit to olavloite/gax-java that referenced this issue Nov 9, 2020
Allows applications to supply retry settings and retryabe codes for individual RPCs
through ApiCallContext.

Fixes googleapis#1197
olavloite added a commit that referenced this issue Mar 18, 2021
* feat: support retry settings and retryable codes in call context

Allows applications to supply retry settings and retryabe codes for individual RPCs
through ApiCallContext.

Fixes #1197

* chore: fix formatting and license headers

* feat: use context retry settings for streaming calls

* fix: process review comments

* fix: missed one Thread.sleep

* fix: fix style

* fix: address review comments

* refactor: remove ContextAwareResultRetryAlgorithm

* refactor: remove ContextAwareTimedRetryAlgorithm

* fix: make package-private + add ignored

* revert: revert to always using global settings for jittered

* test: add tests for NoopRetryingContext

* chore: cleanup 'this' usage

* refactor: merge context aware classes with base classes

* chore: cleanup and remove unnecessary methods

* test: add safety margin as the retry settings are now jittered

* docs: add javadoc

* chore: address review comments

* fix: address review comments

* fix: add tests + fix potential NPE

* fix: remove unused method + add tests

* test: add additional test calls

* fix: deprecation
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants