Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anti-patterns for Retry in V7 #1527

Closed
peter-csala opened this issue Aug 30, 2023 · 6 comments
Closed

Anti-patterns for Retry in V7 #1527

peter-csala opened this issue Aug 30, 2023 · 6 comments

Comments

@peter-csala
Copy link
Contributor

peter-csala commented Aug 30, 2023

As it was discussed in this PR I've tried to collect those anti-patterns what I have seen in past 3+ years on StackOverflow.

In this issue I try to focus only on Retry. Later I will file new issues for Fallback, Circuit Breaker, etc..

1 - Using retry as a periodical executor

The retry policy can be defined in a way that it runs forever in a given frequency for example once in a day

❌ DON'T

var retryForeverDaily = Policy
    .HandleResult<bool>(_ => true)
    .WaitAndRetryForeverAsync(_ => TimeSpan.FromHours(24));

Reasoning:

  • The sleep period can be blocking or non blocking depending on how your policy is defined: synchronous or asynchronous respectively
  • Even if it is used in non blocking manner, it consumes (unnecessarily) memory which can't be garbage collected

✅ DO
Use appropriate tool to schedule recurring jobs like Quartz.Net or Hangfire

Reasoning:

  • Polly was never design to support this use case rather than its main aim is to help you overcome short transient failures
  • Dedicated job scheduler tools are more efficient (in terms of memory) and can be configured to withstand machine failure by utilizing persistence storage

2 - Combining multiple sleep duration strategies

The retry has several overloads where you can dynamically define the sleep durations

❌ DON'T
Mixing the ever an increasing values with constant ones

.WaitAndRetry(10, retryAttempt => retryAttempt switch
{
    <= 5 => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)), 
    _ => TimeSpan.FromMinutes(3)
});

Reasoning:

  • By changing the behaviour based on state we basically created here a state machine
  • Even though it is a really compact/concise way to express sleep durations there are three main drawbacks
    • This approach does not embrace re-usability (you can't re-use only the quick retries)
    • The sleep duration logic is tightly coupled to the retryCount parameter of the policy
    • It is harder to unit test

✅ DO
Define two separate policies and then chain them

var quickRetries = ...
  .WaitAndRetry(Backoff.DecorrelatedJitterBackoffV2(TimeSpan.FromSeconds(1), 5));
var slowRetries = ...
  .WaitAndRetry(5, TimeSpan.FromMinutes(3));
  
var retries = Policy.Wrap(slowRetries, quickRetries); 

Reasoning:

  • Even though this approach is a bit more verbose (compared to the previous one) it is more flexible
  • You can compose the retry policies in any order (slower is the outer and quicker is the inner or vice versa)
  • You can define different triggers for the retry policies
    • Which allows you to switch back and forth between the policies based on the thrown exception or on the result
    • There is no strict order (quick retries comes first no matter what...) so, quick and slow retries can interleave

3 - Pre-generating massive amount of sleep durations

WaitAndRetry can can accept an IEnumerable<TimeSpan> for sleep durations

❌ DON'T
Creating a List with huge amount of timespans

private IEnumerable<TimeSpan> GetSleeps()
{
    var sleeps = new List<TimeSpan>();

    for (int i = 0; i < 1_000; i++)
    {
        sleeps.Add(TimeSpan.FromSeconds(300 + (i % 200)));
    }

    return sleeps;
}

var retry = ...
   .WaitAndRetry(sleepDurations: GetSleeps());

Reasoning:

  • It uses too much memory unnecessarily for fairly long period
  • For each and every sleep duration only a single element is consumed the rest 999 remains intact

✅ DO
Use yield return and sleepDurationProvider

private IEnumerable<TimeSpan> GetSleeps(int _)
{
    for (int i = 0; i < 1_000; i++)
    {
        yield return TimeSpan.FromSeconds(300 + (i % 200));
    }
}

var retry = ...
   .WaitAndRetry(sleepDurationProvider: GetSleeps);

Reasoning:

  • This approach will calculate the very next sleep duration whenever it needed
  • In other words this is the lazy approach the other one is the eager

4 - Branching retry logic based on request url

Lets suppose you have an HttpClient and you want to decorate it with a retry only if a request is against a certain endpoint

❌ DON'T
Using NoOp and ?: operator

.AddPolicyHandler((sp, req) =>
    shouldRetry(req.RequestUri)
        ? Policy<HttpResponseMessage>.Handle<HttpRequestException>()...
        : Policy.NoOpAsync<HttpResponseMessage>()

Reasoning:

  • In this case the triggering conditions/logic are scattered in multiple places
  • From future extensibility standpoint is also not desirable since it can easily become less legible as you add more and more conditions

✅ DO
Use the Handle / Or clauses to prevent triggering retry

.AddPolicyHandler((sp, req) =>
     Policy<HttpResponseMessage>
       .Handle<HttpRequestException>(_ => shouldRetry(req.RequestUri))
       ...

Reasoning:

  • The triggering conditions are located in a single, well-known place
  • There is no need to cover "what to do when policy shouldn't trigger"

5 - Calling a given method before/after each retry attempt

The onRetry/onRetryAsync user-defined delegates are called before the sleep

❌ DON'T
Calling explicitly the given method before Execute/ExecuteAsync

var retryPolicy = ...
    .WaitAndRetry(.., (exception, timeSpan) => BeforeEachRetry);
...
BeforeEachRetry();
var result = retryPolicy.Excecute(DoSomething);

Reasoning:

  • Since the onRetry is called before each retry attempt it won't be called before the very first initial attempt (which is not a retry)
  • If the policy is used in multiple places it is quite likely that you will forgot to call BeforeEachRetry before every Execute calls
  • Here the naming is very explicit but in real world scenario it might not prefixed with Before so, one might call it after the Execute call which is not the intended behaviour

✅ DO
Decorate the method call pair together

var retryPolicy = ...
    .WaitAndRetry(..., (exception, timeSpan) => {});
...
var result = retryPolicy.Excecute(() => 
{ 
   BeforeEachRetry();
   DoSomething();
});

Reasoning:

  • If the DoSomething and BeforeEachRetry coupled together then decorate them together
    • Or create a simple wrapper to call them in the desired order

6 - Having a single policy to cover multiple failures

Lets suppose we have an HttpClient which issues a request and then we try to parse a large Json

❌ DON'T
Having a single policy to cover everything

var retry = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(...);

var timeout = Policy
    .TimeoutAsync(timeout: TimeSpan.FromMinutes(1));

var strategy = Policy.WrapAsync(retry, timeout);

await strategy.ExecuteAsync(async (ct) =>
{
     var stream = await httpClient.GetStreamAsync(endpoint, ct);
     var foo = await JsonSerializer.DeserializeAsync<Foo>(stream, cancellationToken: ct);
     ...
}, CancellationToken.None);

Reasoning:

  • In the previous point it was suggested that if the X and Y coupled together then decorate them together
    • if they are all part of the same failure domain
    • in other words the decorator policy covers one group of transient failures
      • this is true for the combined policy as well

✅ DO
Define a policy per failure "mode"

var retry = Policy<Stream>
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(...);

var stream = await retry.ExecuteAsync((ct) => httpClient.GetStreamAsync(endpoint, ct), CancellationToken.None);

var timeout = Policy<Foo>
    .TimeoutAsync(timeout: TimeSpan.FromMinutes(1));

var foo = await timeout.ExecuteAsync((ct) => JsonSerializer.DeserializeAsync<Foo>(stream, cancellationToken: ct), CancellationToken.None);

Reasoning:

  • Network call's failure domain is different than deserialization's failures
    • Having dedicated policies makes the overall resilient strategy more robust against different transient failures

7 - Cancelling retry in case of given exception

After you receive a TimeoutException you don't want to perform any more retries

❌ DON'T
Adding cancellation logic inside onRetryAsync

...
.WaitAndRetryAsync(
        ...
        onRetryAsync:(ex, _, __, ___) =>
    {
        if (ex is TimeoutException)
        {
            var cts = context["shouldNotRetryCTS"] as CancellationTokenSource;
            cts.Cancel();
        }
        ...
    });

Reasoning:

  • The triggering logic/conditions should be placed inside Handle/Or/HandleResult/OrResult builder methods
  • "Jumping out from the policy" from a user-defined delegate either via an Exception or by a CancellationToken just complicates the policy (and its control flow) unnecessarily

✅ DO
Define triggering logic inside Handle

...
.Handle<Exception>(ex => ex is not TimeoutException))
.WaitAndRetryAsync(...);

Reasoning:

  • As it was stated above please use the dedicated builder methods to define triggering condition
    • Try to rephrase your exit condition in a way to express when should a given policy trigger
@martincostello
Copy link
Member

This looks really great!

@martintmk
Copy link
Contributor

@peter-csala

Do you have any ideas on how to approach this from the perspective of V8?

@peter-csala
Copy link
Contributor Author

@martintmk

IMHO the root cause of these anti-patterns (in most of the cases) is that the APIs are flexible enough to achieve same behavior with different chain of method calls.

  • The don't cases can be considered as naive or "creative" implementations.
  • Whereas the do cases are the more "polly-native" approaches.

So, I think we could iterate through the patterns one-by-one and assess them by answering to the following questions:

  • Can I use the V8 API in the similar fashion as it was described in the don't section?
    • Is the approach considered as an abuse of the API(s) or as the new suggested way?
  • What would be recommended solution by using V8?

What do you think?

@martintmk
Copy link
Contributor

Based on the description only the number 3 is not possible with V8. The remaining ones are all doable with V8.

@peter-csala
Copy link
Contributor Author

Cool, one less anti-pattern :D

Would you convert the V7 code snippets to V8?

@martintmk
Copy link
Contributor

var quickRetries = ...
  .WaitAndRetry(Backoff.DecorrelatedJitterBackoffV2(TimeSpan.FromSeconds(1), 5));
var slowRetries = ...
  .WaitAndRetry(5, TimeSpan.FromMinutes(3));
  
var retries = Policy.Wrap(slowRetries, quickRetries); 

I can give it a shot. It should be pretty straightforward. The main difference is that in V8 the strategies (policies) are executed in the same order as they were added.

For example, this:

var quickRetries = ...
  .WaitAndRetry(Backoff.DecorrelatedJitterBackoffV2(TimeSpan.FromSeconds(1), 5));
var slowRetries = ...
  .WaitAndRetry(5, TimeSpan.FromMinutes(3));
  
var retries = Policy.Wrap(slowRetries, quickRetries); 

Now becomes:

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new() // slow retries
    {
        MaxRetryAttempts = 5,
        Delay = System.TimeSpan.FromMinutes(3),
        BackoffType = DelayBackoffType.Constant
    })
    .AddRetry(new() // fast retries
    {
        MaxRetryAttempts = 5,
        Delay = System.TimeSpan.FromSeconds(1),
        UseJitter = true,
        BackoffType = DelayBackoffType.Exponential
    })
    .Build();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants