Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResponseTimeoutMode.FROM_START works correctly with RetryingClient #6025

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

jrhee17
Copy link
Contributor

@jrhee17 jrhee17 commented Dec 9, 2024

Motivation:

A bug was reported that ResponseTimeoutMode.FROM_START does not work correctly when used with a RetryingClient. The cause was because how the responseTimeout is calculated for RetryingClient.

RetryingClient bounds responseTimeout by computing the responseTimeout on each iteration from its internal State.

ctx.setResponseTimeoutMillis(TimeoutMode.SET_FROM_NOW, responseTimeoutMillis);

If the CancellationScheduler has not been started yet, the set timeout is returned as-is via CancellationScheduler#timeoutNanos and is set at for the derived ctx.

responseCancellationScheduler =
CancellationScheduler.ofClient(TimeUnit.MILLISECONDS.toNanos(ctx.responseTimeoutMillis()));

However, CancellationScheduler#timeoutNanos defines its contract as returning the timeoutNanos if not started, and returning timeoutNanos since the startTime if already started.

/**
* Before the scheduler has started, the configured timeout will be returned regardless of the
* {@link TimeoutMode}. If the scheduler has already started, the timeout since
* {@link #startTimeNanos()} will be returned.
*/

Hence, CancellationScheduler#setTimeoutNanos tries to set the time remaining, but CancellationScheduler#timeoutNanos will return the timeout since CancellationScheduler#start is called.

Since the semantics of CancellationScheduler#timeoutNanos has value in retaining the originally set value, I propose that a new CancellationScheduler#remainingTimeoutNanos is introduced which returns the remaining timeout if a scheduler has been started.

Modifications:

  • Introduced CancellationScheduler#remainingTimeoutNanos which returns the remaining responseTimeout in nanos.
  • Replaced ClientRequestContext#responseTimeoutMillis with ClientRequestContextExtension#remainingTimeoutNanos in ArmeriaClientCall and DefaultClientRequestContext
  • Removed unneeded usages of ClientRequestContext#responseTimeoutMillis in HttpResponseWrapper

Result:

  • ResponseTimeoutMode.FROM_START correctly bounds requests that go through RetryingClient

@jrhee17 jrhee17 marked this pull request as ready for review December 9, 2024 11:51
@jrhee17 jrhee17 added the defect label Dec 9, 2024
@jrhee17 jrhee17 added this to the 1.31.3 milestone Dec 9, 2024
Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new approach looks nice. 👍

Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new approach looks nice. 👍

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if (timeoutNanos == Long.MAX_VALUE) {
return 0;
}
if (!isStarted()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) lock is used for thread-safety in DefaultCancellationScheduler, but is remainingTimeoutNanos unnecessary to use lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our purpose, I didn't think a lock is necessary since the scheduler is started either 1) before the decorator chain is triggered 2) or after the decorator chain is fully run. Meanwhile, remainingTimeoutNanos method is not a public method and our usage guarantees it is run during the decorator chain.

Having said this, I think in the context of DefaultCancellationScheduler itself, this logic may cause confusion to maintainers and locking is a small price to pay. Updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there was a problem with the logic; rather, it was confusion with consistency in implementation. The new change looks good. 👍

@ikhoon ikhoon merged commit 244e5cb into line:main Dec 11, 2024
14 checks passed
ikhoon pushed a commit to ikhoon/armeria that referenced this pull request Dec 11, 2024
line#6025)

Motivation:

A bug was reported that `ResponseTimeoutMode.FROM_START` does not work
correctly when used with a `RetryingClient`. The cause was because how
the `responseTimeout` is calculated for `RetryingClient`.

`RetryingClient` bounds `responseTimeout` by computing the
`responseTimeout` on each iteration from its internal `State`.

https://github.com/line/armeria/blob/fa76e99fa6132545df3a8d05eeb81c5681ec8953/core/src/main/java/com/linecorp/armeria/client/retry/AbstractRetryingClient.java#L188

If the `CancellationScheduler` has not been started yet, the set timeout
is returned as-is via `CancellationScheduler#timeoutNanos` and is set at
for the derived ctx.

https://github.com/line/armeria/blob/fa76e99fa6132545df3a8d05eeb81c5681ec8953/core/src/main/java/com/linecorp/armeria/internal/client/DefaultClientRequestContext.java#L543-L544

However, `CancellationScheduler#timeoutNanos` defines its contract as
returning the `timeoutNanos` if not started, and returning
`timeoutNanos` since the `startTime` if already started.

https://github.com/line/armeria/blob/fa76e99fa6132545df3a8d05eeb81c5681ec8953/core/src/main/java/com/linecorp/armeria/internal/common/CancellationScheduler.java#L104-L108

Hence, `CancellationScheduler#setTimeoutNanos` tries to set the time
remaining, but `CancellationScheduler#timeoutNanos` will return the
timeout since `CancellationScheduler#start` is called.

Since the semantics of `CancellationScheduler#timeoutNanos` has value in
retaining the originally set value, I propose that a new
`CancellationScheduler#remainingTimeoutNanos` is introduced which
returns the remaining timeout if a scheduler has been started.

Modifications:

- Introduced `CancellationScheduler#remainingTimeoutNanos` which returns
the remaining `responseTimeout` in nanos.
- Replaced `ClientRequestContext#responseTimeoutMillis` with
`ClientRequestContextExtension#remainingTimeoutNanos` in
`ArmeriaClientCall` and `DefaultClientRequestContext`
- Removed unneeded usages of
`ClientRequestContext#responseTimeoutMillis` in `HttpResponseWrapper`

Result:

- `ResponseTimeoutMode.FROM_START` correctly bounds requests that go
through `RetryingClient`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants