From 5769869519161581b3b26f57c3fe4828268e2d10 Mon Sep 17 00:00:00 2001
From: adamw
Date: Tue, 31 Dec 2024 12:30:34 +0100
Subject: [PATCH] Release 0.5.8

---
 generated-doc/out/utils/repeat.md    |  2 +-
 generated-doc/out/utils/retries.md   | 81 +++++++++++++++++++++++++++-
 generated-doc/out/utils/scheduled.md |  8 +--
 3 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/generated-doc/out/utils/repeat.md b/generated-doc/out/utils/repeat.md
index 2ed4843e..5317b266 100644
--- a/generated-doc/out/utils/repeat.md
+++ b/generated-doc/out/utils/repeat.md
@@ -27,7 +27,7 @@ Similarly to the `retry` API, the `operation` can be defined:
 
 The `repeat` config requires a `Schedule`, which indicates how many times and with what interval should the `operation` be repeated.
 
-In addition, it is possible to define a custom `shouldContinueOnSuccess` strategy for deciding if the operation
+In addition, it is possible to define a custom `shouldContinueOnResult` strategy for deciding if the operation
 should continue to be repeated after a successful result returned by the previous operation (defaults to `_: T => true`).
 
 If an operation returns an error, the repeat loop will always be stopped. If an error handling within the operation
diff --git a/generated-doc/out/utils/retries.md b/generated-doc/out/utils/retries.md
index e22dcd0c..98b023ca 100644
--- a/generated-doc/out/utils/retries.md
+++ b/generated-doc/out/utils/retries.md
@@ -121,7 +121,86 @@ retry(RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_.getMessage !=
 retryEither(RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(eitherOperation)
 
 // custom error mode
-retryWithErrorMode(UnionMode[String])(RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(unionOperation)
+retryWithErrorMode(UnionMode[String])(
+  RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(unionOperation)
 ```
 
 See the tests in `ox.resilience.*` for more.
+
+## Adaptive retries
+
+A retry strategy, backed by a token bucket. Every retry costs a certain amount of tokens from the bucket, and every success causes some tokens to be added back to the bucket. If there are not enough tokens, the retry is not attempted.
+
+This way retries don't overload a system that is down due to a systemic failure (such as a bug in the code, excessive load etc.): retries will be attempted only as long as there are enough tokens in the bucket, then the load on the downstream system will be reduced so that it can recover. In contrast, using a "normal" retry strategy, where every operation is retried up to 3 times, a failure causes the load on the system to increase 4 times.
+
+For transient failures (component failure, infrastructure issues etc.), retries still work "normally", as the bucket has enough tokens to cover the cost of multiple retries.
+
+### Inspiration
+
+* [`AdaptiveRetryStrategy`](https://github.com/aws/aws-sdk-java-v2/blob/master/core/retries/src/main/java/software/amazon/awssdk/retries/AdaptiveRetryStrategy.java) from `aws-sdk-java-v2`
+* ["Try again: The tools and techniques behind resilient systems" from re:Invent 2024](https://www.youtube.com/watch?v=rvHd4Y76-fs)
+
+### Configuration
+
+To use adaptive retries, create an instance of `AdaptiveRetry`. These instances are thread-safe and are designed to be shared. Typically, a single instance should be used to proxy access to a single constrained resource.
+
+`AdaptiveRetry` is parametrized with:
+
+* `tokenBucket: TokenBucket`: instances of `TokenBucket` can be shared across multiple instances of `AdaptiveRetry`
+* `failureCost: Int`: number of tokens that are needed for a retry in case of failure
+* `successReward: Int`: number of tokens that are added back to the token bucket after a success
+
+`RetryConfig` and `ResultPolicy` are defined the same as with the "normal" retry mechanism; all the configuration from above also applies here.
+
+An instance with the default configuration can be obtained with `AdaptiveRetry.default` (bucket size = 500, cost for failure = 5 and reward for success = 1).
+
+### API
+
+`AdaptiveRetry` exposes three variants of retrying, which correspond to the three variants discussed above: `retry`, `retryEither` and `retryWithErrorMode`.
+
+`retry` will attempt to retry an operation if it throws an exception; `retryEither` will additionally retry if the result is a `Left`. Finally, `retryWithErrorMode` is the most flexible, and allows retrying operations using custom failure modes (such as union types).
+
+The methods have an additional parameter, `shouldPayFailureCost`, which determines whether the result `T` should be considered a failure in terms of paying the cost of a retry. The penalty is paid only when the operation is actually going to be retried; it is not paid for a successful operation.
+
+### Examples
+
+To use this mechanism, run the operation through an instance of `AdaptiveRetry`:
+
+```scala
+import ox.UnionMode
+import ox.resilience.AdaptiveRetry
+import ox.resilience.{ResultPolicy, RetryConfig}
+import ox.scheduling.{Jitter, Schedule}
+import scala.concurrent.duration.*
+
+def directOperation: Int = ???
+def eitherOperation: Either[String, Int] = ???
+def unionOperation: String | Int = ???
+
+val adaptive = AdaptiveRetry.default
+
+// various configs with custom schedules and default ResultPolicy
+adaptive.retry(RetryConfig.immediate(3))(directOperation)
+adaptive.retry(RetryConfig.delay(3, 100.millis))(directOperation)
+adaptive.retry(RetryConfig.backoff(3, 100.millis))(directOperation) // defaults: maxDelay = 1.minute, jitter = Jitter.None
+adaptive.retry(RetryConfig.backoff(3, 100.millis, 5.minutes, Jitter.Equal))(directOperation)
+
+// result policies
+// custom success
+adaptive.retry[Int](
+  RetryConfig(Schedule.Immediate(3), ResultPolicy.successfulWhen(_ > 0)))(directOperation)
+// fail fast on certain errors
+adaptive.retry(
+  RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_.getMessage != "fatal error")))(directOperation)
+adaptive.retryEither(
+  RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(eitherOperation)
+
+// custom error mode
+adaptive.retryWithErrorMode(UnionMode[String])(
+  RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(unionOperation)
+
+// consider "throttling error" not as a failure that should incur the retry penalty
+adaptive.retryWithErrorMode(UnionMode[String])(
+  RetryConfig(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")),
+  shouldPayFailureCost = _.fold(_ != "throttling error", _ => true))(unionOperation)
+```
diff --git a/generated-doc/out/utils/scheduled.md b/generated-doc/out/utils/scheduled.md
index e014fb0e..801cf39b 100644
--- a/generated-doc/out/utils/scheduled.md
+++ b/generated-doc/out/utils/scheduled.md
@@ -20,13 +20,7 @@ The `scheduled` config consists of:
 - `Interval` - default for `repeat` operations, where the sleep is calculated as the duration provided by schedule minus the duration of the last operation (can be negative, in which case the next operation occurs immediately).
 - `Delay` - default for `retry` operations, where the sleep is just the duration provided by schedule.
-- `onOperationResult` - a callback function that is invoked after each operation. Used primarily for `onRetry` in `retry` API.
-
-In addition, it is possible to define strategies for handling the results and errors returned by the `operation`:
-- `shouldContinueOnError` - defaults to `_: E => false`, which allows to decide if the scheduler loop should continue
-  after an error returned by the previous operation.
-- `shouldContinueOnSuccess` - defaults to `_: T => true`, which allows to decide if the scheduler loop should continue
-  after a successful result returned by the previous operation.
+- `afterAttempt` - a callback function that is invoked after each operation and determines if the scheduler loop should continue. Used for `onRetry`, `shouldContinueOnError`, `shouldContinueOnResult` and adaptive retries in `retry` API. Defaults to always continuing.
 
 ## Schedule