Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polly v8 API feedback #1365

Closed
martincostello opened this issue Jun 28, 2023 · 23 comments
Closed

Polly v8 API feedback #1365

martincostello opened this issue Jun 28, 2023 · 23 comments
Labels
v8 Issues related to the new version 8 of the Polly library.
Milestone

Comments

@martincostello
Copy link
Member

I've spent a few hours today upgrading an internal application which uses Polly from the v7 policies to v8 resilience strategies and encountered a number of rough edges and/or difficulties. This issue catalogs those at a high-level for discussion as to whether or not these are issues or missing functionality that we would wish to fix/add as part of the v8 release. I can add more concrete details/repro if needed for the specific items.

  1. ResilienceProperties has a dictionary indexer for reading and writing properties, but it is an explicit interface definition so requires casting to a dictionary to use. It would be much more convenient to make it implicit so you can do things like arguments.Properties["foo"].
  2. It isn't possible to share a CircuitBreakerManualControl across multiple strategies - we have a use case where we manually isolate circuit breakers and it isn't possible to have a single instance available to any circuit breaker strategy to observe (thing similarly to sharing a CancellationToken on multiple operations). I've tried pooling them and iterating through the set when I want to open/close them, but this doesn't seem to work with what I've tried so far.
  3. In the above case, if I try and manually isolate the manual control but there doesn't happen to have been a strategy created that uses the builder yet, it throws an exception saying it hasn't been initialized.
  4. Creating a strategy of policies composed of other potentially existing policies seems to be a bit cumbersome. A rough example of what I'm trying to do is below, and needing to create a new builder to get a policy (because I can't make them directly because of internal constructors) is a bit weird.
// This is just illustrative, it doesn't compile as-is

private readonly ConcurrentDictionary<string, ResilienceStrategy> _registry = new();
private readonly ConcurrentDictionary<string, CircuitBreakerManualControl> _isolators = new();

public ResilienceStrategy GetStrategy(string token, ApiEndpointOption endpoint, string resource)
{
    var builder = new ResilienceStrategyBuilder();

    ApplyRetries(builder, endpoint);
    ApplyCircuitBreaker(builder, endpoint, resource);
    ApplyTimeout(builder, endpoint);

    return builder.Build();
}

private void ApplyRetries(ResilienceStrategyBuilder builder, ApiEndpointOption endpoint)
{
    if (endpoint.Retries > 0)
    {
        var strategy = _registry.GetOrAdd($"{endpoint.Name}-Retry", (key) => new ResilienceStrategyBuilder().AddRetry(new()
        {
            BackoffType = RetryBackoffType.ExponentialWithJitter,
            BaseDelay = endpoint.RetryDelaySeed,
            OnRetry = OnRetry,
            RetryCount = endpoint.Retries,
            ShouldHandle = new PredicateBuilder().Handle<ApiException>(CanRetry).Handle<TaskCanceledException>(),
            StrategyName = key,
        }).Build());

        builder.AddStrategy(strategy);
    }
}

private void ApplyCircuitBreaker(
    ResilienceStrategyBuilder builder,
    ApiEndpointOption endpoint,
    string resource)
{
    var manualControl = _isolators.GetOrAdd(endpoint.Name, (_) => new());

    var strategy = _registry.GetOrAdd($"{endpoint.Name}-{resource}-CircuitBreaker", (key) => new ResilienceStrategyBuilder().AddAdvancedCircuitBreaker(new()
    {
        BreakDuration = endpoint.FailureBreakDuration,
        FailureThreshold = endpoint.FailureThreshold,
        MinimumThroughput = endpoint.FailureMinimumThroughput,
        SamplingDuration = endpoint.FailureSamplingDuration,
        OnOpened = OnCircuitOpened,
        OnClosed = OnCircuitClosed,
        StrategyName = key,
        ManualControl = manualControl,
        ShouldHandle = new PredicateBuilder()
            .Handle<ApiException>(CanCircuitBreak)
            .HandleHttpRequestFault()
            .Handle<OperationCanceledException>()
            .Handle<TimeoutRejectedException>(),
    }).Build());

    if (endpoint.Isolate)
    {
        // TODO sync over async
        manualControl.IsolateAsync().GetAwaiter().GetResult();
    }

    builder.AddStrategy(strategy);
}

private void ApplyTimeout(ResilienceStrategyBuilder builder, ApiEndpointOption endpoint)
{
    var strategy = _registry.GetOrAdd($"{endpoint.Name}-Timeout", (key) => new ResilienceStrategyBuilder().AddTimeout(new TimeoutStrategyOptions()
    {
        OnTimeout = OnTimeout,
        StrategyName = key,
        Timeout = endpoint.Timeout.Add(TimeSpan.FromSeconds(1)),
    }).Build());

    builder.AddStrategy(strategy);
}

It might be that in some cases I'm just using the new API wrong, rather than their being issues. There might also be cases where I need to re-approach how Polly is used in a more fundamental way - for example Policies are currently created on-demand, which makes it non-trivial to put them in DI. Similarly, the pay configuration changes are handled to recreate the cached policies in a policy registry would need to be reworked to use the new reloading mechanism if using the DI integration as well.

I'll do some more work on this tomorrow application with v8 and try and get the application's tests passing again - at the moment I've got issues with circuit breakers as well as retries (not shown above) not working properly.

/cc @martintmk

@martincostello martincostello added the v8 Issues related to the new version 8 of the Polly library. label Jun 28, 2023
@martincostello martincostello added this to the v8.0.0 milestone Jun 28, 2023
@martintmk
Copy link
Contributor

@martincostello, great feedback, let's try to address some of these :)

ResilienceProperties has a dictionary indexer for reading and writing properties, but it is an explicit interface definition so requires casting to a dictionary to use. It would be much more convenient to make it implicit so you can do things like arguments.Properties["foo"].

This was intentional, so we can support strongly-typed access to values. However, we can also expose it with implicit conversion. Is there any reason you are not using the ResiliencePropertyKey<T>

It isn't possible to share a CircuitBreakerManualControl across multiple strategies - we have a use case where we manually isolate circuit breakers and it isn't possible to have a single instance available to any circuit breaker strategy to observe (thing similarly to sharing a CancellationToken on multiple operations). I've tried pooling them and iterating through the set when I want to open/close them, but this doesn't seem to work with what I've tried so far.

I think we can expand the functionality to allow sharing CircuitBreakerManualControl between multiple resilience strategies. Shouldn't be much of a problem.

Creating a strategy of policies composed of other potentially existing policies seems to be a bit cumbersome. A rough example of what I'm trying to do is below, and needing to create a new builder to get a policy (because I can't make them directly because of internal constructors) is a bit weird.

I'll try to rewrite this in a Polly V8 way tomorrow. I believe we have a support for this scenario by using the builders and dynamic strategy creation. But let me confirm that tomorrow.

@martincostello
Copy link
Member Author

martincostello commented Jun 28, 2023

Is there any reason you are not using the ResiliencePropertyKey

Basically laziness - I was expecting to be able to just quickly use the indexer on a string to get a string back (I was replacing some code that was using the operating key). Not shown in the code above I just made myself an extension method to do it via the Get and Set methods. It just seemed a lot more verbose for that use case.

internal static class PollyExtensions
{
    private static readonly ResiliencePropertyKey<string> OperationKeyKey = new("OperationKey");

    internal static string GetOperationKey(this ResilienceContext context)
    {
        if (!context.Properties.TryGetValue(OperationKeyKey, out string operationKey))
        {
            operationKey = "Unknown";
        }

        return operationKey;
    }

    internal static void SetOperationKey(this ResilienceContext context, string operationKey)
        => context.Properties.Set(OperationKeyKey, operationKey);
}

@martintmk
Copy link
Contributor

It just seemed a lot more verbose for that use case.

How about we allow:

resilienceContext.Properties.Get<string>("my-key");

It will be just convenience method that creates ResiliencePropertyKey<string>("my-key") inline.

Would that simplify your use-case?

@martincostello
Copy link
Member Author

Yeah I think that would be a good compromise - I think in my case it just seemed extra complicated compared to v7 because I was replacing a string-string pair in the first place, rather than some other type(s).

Essentially the On* callbacks in the code were originally using the OperationContext in a lot of places, so I had to hydrate the ResilienceContext with it and then read it back out in a tonne of places for logging for the different callbacks (retried, timed out etc.).

@martintmk
Copy link
Contributor

@martincostello I tried to rewrite your example using the Polly V8 way in #1366.

Please let me know if that works for you.

@martintmk
Copy link
Contributor

@martincostello

Regarding this code:

internal static class PollyExtensions
{
    private static readonly ResiliencePropertyKey<string> OperationKeyKey = new("OperationKey");

    internal static string GetOperationKey(this ResilienceContext context)
    {
        if (!context.Properties.TryGetValue(OperationKeyKey, out string operationKey))
        {
            operationKey = "Unknown";
        }

        return operationKey;
    }

    internal static void SetOperationKey(this ResilienceContext context, string operationKey)
        => context.Properties.Set(OperationKeyKey, operationKey);
}

Regarding OperationKey, how are you currently using this information? I see that it's part of the V7 Context.OperationKey.

We might introduce ResilienceContext.OperationKey property if it will add value for Polly V7 users.

@martincostello
Copy link
Member Author

We use it in logging. For example with the v7 code:

[LoggerMessage(8, LogLevel.Information, "{PolicyWrapKey}:{PolicyKey} at {OperationKey} execution timed out after {Timeout} with exception.")]
public static partial void TimeoutWithException(ILogger logger, Exception exception, string policyWrapKey, string policyKey, string operationKey, TimeSpan timeout);

I'm fine with losing the wrap keys and the type of policy can be inferred from the context, but we'd still want to know which endpoint's policy is firing without having to read the stack trace to work it out.

@martintmk
Copy link
Contributor

What are the examples of the values you are assigning to it?

Does it have high cardinality? (for example you are generating guids for it)

Basically, I am asking if we can include it in the metrics by default.

@martincostello
Copy link
Member Author

They're values similar to this: "applicationname.GetNoun"

This particular application has dependencies on 4 other applications it calls over HTTP, and the maximum number of distinct HTTP operations exposed by any one them is 5.

For example:

"ApplicationName Composite Fallback":"ApplicationName Retry" at "ApplicationName.GetByOrderIdAsync" has caused a retry after attempt 1. Backing-off for 00:00:00.2164893.

@martintmk
Copy link
Contributor

martintmk commented Jul 3, 2023

Ok, I think it makes sense to add it to Polly V8 as it can add valuable information to the telemetry.

I'll make a note that the property should have low cardinality.

edit:

"ApplicationName Composite Fallback":"ApplicationName Retry" at "ApplicationName.GetByOrderIdAsync" has caused a retry after attempt 1. Backing-off for 00:00:00.2164893.

I don't think logs as the one above would be required after this change.

@martincostello
Copy link
Member Author

martincostello commented Jul 3, 2023

I've created a sandbox app and the relevant tests based on our internal application here: https://github.com/martincostello/polly-sandbox

It illustrates how Polly v7 is integrated today into our HTTP client request processing.

I'll create a branch based on the v8 API from my internal branch later this afternoon that makes the same changes I made that lead to this issue including the latest changes from alpha.5 and see how far I get. Once that's done there'll be something more concrete to base a migration path around.

@martintmk
Copy link
Contributor

It illustrates how Polly v7 is integrated today into our HTTP client request processing.

This is great! We can also use it as a case-study on how the migration from V7 to V8 might look like.

@martintmk
Copy link
Contributor

martintmk commented Jul 3, 2023

Looking at your sample I think we need to add support for PartitionedRateLimiter. In your sample you are extracting the rate-limiter token:
https://github.com/martincostello/polly-sandbox/blob/952dff8edfe607255ff6486b4036b2cb3517c9a6/src/PollySandbox/Clients/ApiClient.cs#L84

And using it to cache the rate limiter strategy. Rather than hacking around caching the rate limiter strategies, we can use the built-in PartitionedRateLimiter that does exactly the same thing.

To add this support to Polly we can just introduce the ResilienceRateLimiter that unifies both RateLimiter and PartitionedRateLimiter:

public sealed class ResilienceRateLimiter
{
    /// <summary>
    /// Creates an instance of <see cref="ResilienceRateLimiter"/> from <paramref name="rateLimiter"/>.
    /// </summary>
    /// <param name="rateLimiter">The rate limiter instance.</param>
    /// <returns>An instance of <see cref="ResilienceRateLimiter"/>.</returns>
    public static ResilienceRateLimiter Create(RateLimiter rateLimiter) => new(Guard.NotNull(rateLimiter), null);

    /// <summary>
    /// Creates an instance of <see cref="ResilienceRateLimiter"/> from partitioned <paramref name="rateLimiter"/>.
    /// </summary>
    /// <param name="rateLimiter">The rate limiter instance.</param>
    /// <returns>An instance of <see cref="ResilienceRateLimiter"/>.</returns>
    public static ResilienceRateLimiter Create(PartitionedRateLimiter<ResilienceContext> rateLimiter) => new(null, Guard.NotNull(rateLimiter));
}

And use it in RateLimiterStrategyOptions:

public class RateLimiterStrategyOptions : ResilienceStrategyOptions
{
    // other properties

    public ResilienceRateLimiter? RateLimiter { get; set; }
}

This way we could preserve having only a single options and still have the possibility to expand the ResilienceRateLimiter API.

@martincostello Shall I give this a go?

edit: updated the API to reuse the RateLimiterStrategyOptions and not introduce new ones.

@martincostello
Copy link
Member Author

Sure that makes sense to me 👍

@martincostello
Copy link
Member Author

The branch using 8.0.0-alpha.5 is here: https://github.com/martincostello/polly-sandbox/tree/polly-v8

The tests are now all passing, but it's mostly a "make it work" conversion using resilience strategies - I haven't attempted to properly v8-ify the code yet, like using the built-in reload support.

A few further observations now it's working.

  1. In v7 we used Policy.WrapAsync() like this: return Policy.WrapAsync(retry, breaker, timeout, bulkhead, rateLimit). When I first converted the code I didn't spot that I was adding the strategies in the same order as the old code, which mean they were actually in the reverse order in the pipeline than they should have been. This feels like something subtle we need to make clear in the migration documentation as wrap goes inner -> middle -> outer but the builder goes outer -> middle -> inner.
  2. Our existing system uses a constant backoff with jitter, rather than an exponential one. I've re-implemented this using RetryDelayGenerator but is it worth making it built-in?

@martincostello
Copy link
Member Author

I've applied the changes and fixes back to the original internal application and that's all building with passing tests now too (again, not v8-ified yet).

@martintmk
Copy link
Contributor

In v7 we used Policy.WrapAsync() like this: return Policy.WrapAsync(retry, breaker, timeout, bulkhead, rateLimit). When I first converted the code I didn't spot that I was adding the strategies in the same order as the old code, which mean they were actually in the reverse order in the pipeline than they should have been. This feels like something subtle we need to make clear in the migration documentation as wrap goes inner -> middle -> outer but the builder goes outer -> middle -> inner.

True, we should note this. The flow is now similar to IHttpClientFactory as strategies are executed in the same order as they were added.

Our existing system uses a constant backoff with jitter, rather than an exponential one. I've re-implemented this using RetryDelayGenerator but is it worth making it built-in?

I would say yes. If the jitter is supported for Exponential backoff type I would say it should be allowed for Constant and Linear backoff too. Can you add a comment to API review?

We can either expand the enum with new members, or simplify it and introduce RetryStrategyOptions.JitterEnabled bool property.

@martintmk
Copy link
Contributor

I've applied the changes and fixes back to the original internal application and that's all building with passing tests now too (again, not v8-ified yet).

Can you create a PR so I can add comments to it for reference?

@martincostello
Copy link
Member Author

Can you create a PR so I can add comments to it for reference?

martincostello/polly-sandbox#1

@martincostello
Copy link
Member Author

Can you add a comment to API review?

#1233 (review)

@martincostello
Copy link
Member Author

martincostello commented Jul 4, 2023

I've pushed the latest changes to my sandbox app. I'll likely do some more work on it on Friday.

Based on the current state, there's a few more pieces of feedback.

  1. There's a scenario (the skipped Circuit_Breaker_Can_Be_Isolated_For_An_Endpoint test) in the sandbox app that isn't working properly. I don't know if that's a bug in Polly or the sample yet.
  2. When I do manually isolate a circuit ahead of time (because config says to) I need to do this call because configuring the service collection isn't async: manualControl.IsolateAsync().GetAwaiter().GetResult(). This made me wonder if we could simplify this scenario by adding a constructor that allows you to set whether it's open when you create it? Then I could just do something like var manualControl = new CircuitBreakerManualControl(open: true) instead which is much neater.

@martintmk
Copy link
Contributor

When I do manually isolate a circuit ahead of time (because config says to) I need to do this call because configuring the service collection isn't async: manualControl.IsolateAsync().GetAwaiter().GetResult(). This made me wonder if we could simplify this scenario by adding a constructor that allows you to set whether it's open when you create it? Then I could just do something like var manualControl = new CircuitBreakerManualControl(open: true) instead which is much neater.

Can you add a comment to the API review? We can discuss this addition there.

@martincostello
Copy link
Member Author

I think we've tackled everything I raised here as of alpha.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v8 Issues related to the new version 8 of the Polly library.
Projects
No open projects
Status: Done
Development

No branches or pull requests

2 participants