`ResponseTimeoutMode.FROM_START` works correctly with `RetryingClient` #6025

jrhee17 · 2024-12-09T10:31:10Z

Motivation:

A bug was reported that ResponseTimeoutMode.FROM_START does not work correctly when used with a RetryingClient. The cause lies in how the responseTimeout is calculated for RetryingClient.

RetryingClient bounds responseTimeout by computing the responseTimeout on each iteration from its internal State.

armeria/core/src/main/java/com/linecorp/armeria/client/retry/AbstractRetryingClient.java

Line 188 in fa76e99

ctx.setResponseTimeoutMillis(TimeoutMode.SET_FROM_NOW, responseTimeoutMillis);

If the CancellationScheduler has not been started yet, the set timeout is returned as-is via CancellationScheduler#timeoutNanos and is set on the derived ctx.

armeria/core/src/main/java/com/linecorp/armeria/internal/client/DefaultClientRequestContext.java

Lines 543 to 544 in fa76e99

responseCancellationScheduler =
        CancellationScheduler.ofClient(TimeUnit.MILLISECONDS.toNanos(ctx.responseTimeoutMillis()));

However, CancellationScheduler#timeoutNanos defines its contract as returning the configured timeoutNanos if the scheduler has not started, and the timeout measured from the startTime if it has already started.

armeria/core/src/main/java/com/linecorp/armeria/internal/common/CancellationScheduler.java

Lines 104 to 108 in fa76e99

/**
 * Before the scheduler has started, the configured timeout will be returned regardless of the
 * {@link TimeoutMode}. If the scheduler has already started, the timeout since
 * {@link #startTimeNanos()} will be returned.
 */

Hence, CancellationScheduler#setTimeoutNanos tries to set the time remaining, but CancellationScheduler#timeoutNanos returns the timeout measured from when CancellationScheduler#start was called.

Since the current semantics of CancellationScheduler#timeoutNanos are valuable for retaining the originally set value, I propose introducing a new CancellationScheduler#remainingTimeoutNanos, which returns the remaining timeout if the scheduler has been started.
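To make the difference between the two accessors concrete, here is a minimal sketch. `SimpleCancellationScheduler` and its injectable `nanoClock` are hypothetical names invented for illustration; this is not Armeria's `CancellationScheduler`, only a model of the contract described above.

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified model of the scheduler; not Armeria's actual class. */
final class SimpleCancellationScheduler {
    private final LongSupplier nanoClock; // injectable clock so the behavior is easy to follow
    private final long timeoutNanos;      // timeout measured from the start time
    private long startTimeNanos;
    private boolean started;

    SimpleCancellationScheduler(long timeoutNanos, LongSupplier nanoClock) {
        this.timeoutNanos = timeoutNanos;
        this.nanoClock = nanoClock;
    }

    void start() {
        startTimeNanos = nanoClock.getAsLong();
        started = true;
    }

    /** Existing contract: the timeout as configured, measured from the start time. */
    long timeoutNanos() {
        return timeoutNanos;
    }

    /** Proposed accessor: the time left until the deadline, or the full timeout if not started. */
    long remainingTimeoutNanos() {
        if (!started) {
            return timeoutNanos;
        }
        final long elapsedNanos = nanoClock.getAsLong() - startTimeNanos;
        return Math.max(0, timeoutNanos - elapsedNanos);
    }
}
```

In this model, a scheduler configured with 1000 ns and read 400 ns after `start()` still reports 1000 from `timeoutNanos()`, while `remainingTimeoutNanos()` reports 600. A derived context seeded from the former would outlive the original deadline.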

Modifications:

- Introduced CancellationScheduler#remainingTimeoutNanos which returns the remaining responseTimeout in nanos.
- Replaced ClientRequestContext#responseTimeoutMillis with ClientRequestContextExtension#remainingTimeoutNanos in ArmeriaClientCall and DefaultClientRequestContext.
- Removed unneeded usages of ClientRequestContext#responseTimeoutMillis in HttpResponseWrapper.

Result:

- ResponseTimeoutMode.FROM_START correctly bounds requests that go through RetryingClient.
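The effect on retries can be illustrated with a toy calculation. `RetryBudgetDemo` and `attemptTimeouts` are hypothetical names, and the model assumes every attempt takes the same fixed time; it only approximates the symptom described in the motivation and does not use any Armeria API.

```java
/** Hypothetical toy model of per-attempt timeouts under one overall budget. */
final class RetryBudgetDemo {
    private RetryBudgetDemo() {}

    /**
     * Returns the timeout handed to each attempt, assuming every attempt takes
     * attemptNanos. With useRemaining == false, each attempt is seeded with the
     * timeout measured from the start (the buggy behavior); with true, it is
     * seeded with the time actually left.
     */
    static long[] attemptTimeouts(long totalNanos, long attemptNanos,
                                  int attempts, boolean useRemaining) {
        final long[] timeouts = new long[attempts];
        long elapsedNanos = 0;
        for (int i = 0; i < attempts; i++) {
            final long remainingNanos = Math.max(0, totalNanos - elapsedNanos);
            timeouts[i] = useRemaining ? remainingNanos : totalNanos;
            elapsedNanos += attemptNanos;
        }
        return timeouts;
    }
}
```

With a 1000 ns budget and 400 ns attempts, the remaining-based variant yields 1000, 600, 200 for three attempts, whereas the start-based variant hands out 1000 each time, so the last attempt could run well past the overall deadline.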

github-actions · 2024-12-09T11:44:47Z

🔍 Build Scan® (commit: `f009a2a`)

| Job name | Status | Build Scan® |
| --- | --- | --- |
| build-ubicloud-standard-8-jdk-8 | ✅ | https://ge.armeria.dev/s/3y7qemv4glg6q |
| build-ubicloud-standard-8-jdk-21-snapshot-blockhound | ✅ | https://ge.armeria.dev/s/w7ietemf7r4jk |
| build-ubicloud-standard-8-jdk-17-min-java-17-coverage | ✅ | https://ge.armeria.dev/s/3smlcjtldu2me |
| build-ubicloud-standard-8-jdk-17-min-java-11 | ✅ | https://ge.armeria.dev/s/d6iz3ygrmjhfy |
| build-ubicloud-standard-8-jdk-17-leak | ✅ | https://ge.armeria.dev/s/ujayhmxp2xswm |
| build-ubicloud-standard-8-jdk-11 | ✅ | https://ge.armeria.dev/s/cm4ovcyvuxrqs |
| build-macos-latest-jdk-21 | ✅ | https://ge.armeria.dev/s/nopx355moyvhe |

minwoox

The new approach looks nice. 👍

ikhoon

Thanks!

ikhoon · 2024-12-10T14:41:25Z

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

+        if (timeoutNanos == Long.MAX_VALUE) {
+            return 0;
+        }
+        if (!isStarted()) {


Question) A lock is used for thread safety in DefaultCancellationScheduler. Is it unnecessary for remainingTimeoutNanos to use the lock?

For our purposes, I didn't think a lock was necessary, since the scheduler is started either 1) before the decorator chain is triggered or 2) after the decorator chain has fully run. Also, remainingTimeoutNanos is not a public method, and our usage guarantees that it runs during the decorator chain.

Having said this, I think in the context of DefaultCancellationScheduler itself, this logic may cause confusion to maintainers and locking is a small price to pay. Updated.

I don't think there was a problem with the logic; rather, my concern was consistency within the implementation. The new change looks good. 👍
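For readers following the locking discussion, a lock-guarded read might look roughly like the sketch below. `LockedScheduler` is a hypothetical class, and `java.util.concurrent.locks.ReentrantLock` is an assumed stand-in for whatever lock `DefaultCancellationScheduler` actually uses; the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a lock-guarded remaining-timeout read. */
final class LockedScheduler {
    private final ReentrantLock lock = new ReentrantLock(); // assumed lock type
    private final long timeoutNanos;
    private long startTimeNanos;
    private boolean started;

    LockedScheduler(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    void start(long nowNanos) {
        lock.lock();
        try {
            startTimeNanos = nowNanos;
            started = true;
        } finally {
            lock.unlock();
        }
    }

    long remainingTimeoutNanos(long nowNanos) {
        // Take the same lock as start() so that 'started' and 'startTimeNanos'
        // are always observed together, even if the call order changes later.
        lock.lock();
        try {
            if (!started) {
                return timeoutNanos;
            }
            return Math.max(0, timeoutNanos - (nowNanos - startTimeNanos));
        } finally {
            lock.unlock();
        }
    }
}
```

The lock costs little when uncontended, which matches the reasoning above that consistency with the rest of the class is worth the small overhead.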


jrhee17 added 2 commits December 9, 2024 18:50

minimal impl

5e8111c

cleanup

34392d4

jrhee17 marked this pull request as ready for review December 9, 2024 11:51

jrhee17 requested review from ikhoon, minwoox and trustin as code owners December 9, 2024 11:51

jrhee17 added the defect label Dec 9, 2024

jrhee17 added this to the 1.31.3 milestone Dec 9, 2024

minwoox approved these changes Dec 10, 2024

ikhoon approved these changes Dec 10, 2024

address comment by @ikhoon

f009a2a

ikhoon merged commit 244e5cb into line:main Dec 11, 2024
14 checks passed
