Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Polly v8 #1

Merged
merged 5 commits into from
Sep 28, 2023
Merged

Upgrade to Polly v8 #1

merged 5 commits into from
Sep 28, 2023

Conversation

martincostello
Copy link
Owner

Migrate from Polly v7 policies to Polly v8 resilience strategies.

public ResilienceStrategy GetStrategy(string token, ApiEndpointOption endpoint, string resource)
=> _store.GetStrategy(token, endpoint, resource);

public ResilienceStrategy<TResult> GetStrategy<TResult>(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is executed on hot path and always creates new resilience strategy which is not very efficient.

Is it possible to cache the strategy?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be only doing this for dynamic fallbacks layered on top of the rest of the strategies coming from the store.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could attach ClientExecuteOptions into ResilienceContext and just read it from cached strategies.

ResilienceStrategyRegistry allows adding new manually created strategies at runtime.

@martincostello
Copy link
Owner Author

I'm still working my way through the changes locally to use DI for the strategies. I've got the integration tests working and also swapped over to the partioned rate limiter, but I need to do a bunch of refactoring to the unit tests that verify the behaviour.

I may also have found an issue with reloading not working correctly, but I'll investigate that further once the tests are all green again.

@martincostello
Copy link
Owner Author

The build is broken as the latest push was compiled against the Polly sources locally. I'm going to do further refactoring to tidy things up and then I'll look into the broken test I have.

Once everything seems to be working when built from source locally with the latest code in main, we can look to publish a 8.0.0-alpha.6.

(options, publisher)
=> options.OnTelemetryEvent = (args) => OnTelemetry(publisher, args));

foreach ((var name, var resources) in endpoints)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to setup all possible combination of resilience strategies right away?

Looking at the code you have this setup:

  • Rate limitier (applied per endpoint, partitioned)
  • Timeout (stateless, per endpoint)
  • Circuit Breaker - applied per each endpoint + resource
  • Retry - stateless

You could introduce public readonly record struct ResourceEndpoint(string BuilderName, string Endpoint, string Resource) complex key and with just a single AddResilienceStrategy call you could setup the whole pipeline dynamically.

Then on the consuming side just provide a key that identifies the resource/endpoint you want to call and the registry can create and cache the resilience strategy for that endpoint + resource automatically.

The only difficulty here is the rate limiter which is applied to a whole endpoint. But you can create it as this:

// add resilience strategy, keyed by EndpointKey that only defines the builder name
services.AddResilienceStrategy(new EndpointKey("endpoint-pipeline", string.Empty, string.Empty, new()), (builder, context) =>
{
    var endpointOptions = context.StrategyKey.EndpointOptions;
    var registry = context.ServiceProvider.GetRequiredService<ResilienceStrategyRegistry<string>>();

    // we want to limit the number of concurrent requests per endpoint and not include the resource.
    // using a registry we can create and cache the shared resilience strategy
    var rateLimiterStrategy = registry.GetOrAddStrategy($"rate-limiter/{context.StrategyKey.EndpointName}", b =>
    {
        b.AddConcurrencyLimiter(new ConcurrencyLimiterOptions { PermitLimit = endpointOptions.MaxParallelization });
    });

    builder.AddStrategy(rateLimiterStrategy);

   // configure more strategies
}

I have also updated the sample here:
App-vNext/Polly#1366

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale was that re-using the common policies in an endpoint for each resource was better than having an identically configured one for each of the N resources. Do you think it's better to have that duplication and have just N endpoints' worth of strategies in the registry instead?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's better to have that duplication and have just N endpoints' worth of strategies in the registry instead?

It would be problem for stateful strategies (RateLimiter, CircuitBreaker) and if you would be recreating them on a hot path or having 10000s of them. All other strategies are very cheap so you can duplicate them without problems.

Also each endpoint might have different configuration so even when having many of them, some of these have different configuration for each endpoint.

@martincostello
Copy link
Owner Author

@martintmk Would you be able to take a look at the failing test with the reloads today? If we can get that sorted then (either by fixing my code or fixing Polly), then I can cut a new alpha and push on further with updating this sample and trying it out again with my internal app.

@martintmk
Copy link

@martintmk Would you be able to take a look at the failing test with the reloads today? If we can get that sorted then (either by fixing my code or fixing Polly), then I can cut a new alpha and push on further with updating this sample and trying it out again with my internal app.

Is there an easy way to build this branch? Locally, there are some issues since it's compiled against API that's not yet published.

@martincostello
Copy link
Owner Author

You'd need to swap out the package references to point at project references for your local clone of the Polly solution.

@martintmk
Copy link

You'd need to swap out the package references to point at project references for your local clone of the Polly solution.

I believe the issue is in your reload logic. This assertion should pass after reload:

image

@martintmk
Copy link

@martincostello

The cause is incorrect config key value:

If you change Api:Endpoints:Movies:CircuitBreaker:Isolate to Api:Endpoints:Movies:Isolate the test succeeds.

@martincostello
Copy link
Owner Author

🤦 - thanks ❤️

@martincostello
Copy link
Owner Author

I've fixed the test and added the assert, but it's still failing for me locally.

I'm just digging around some more now, but it seems like the client gets the updated options but Polly doesn't.

@martincostello
Copy link
Owner Author

Yeah, looks like Polly still has the reference to the original instance of the configuration object, rather than the second one.

If I point a breakpoint at these two locations and do some stuff in the Immediate Windows in Visual Studio you can see the hashcode of the object is still the first one.

services.AddScoped((serviceProvider) => serviceProvider.GetRequiredService<IOptionsMonitor<ApiOptions>>().CurrentValue);

// First iteration
serviceProvider.GetRequiredService<IOptionsMonitor<ApiOptions>>().CurrentValue.Endpoints["Movies"].GetHashCode()
56152722
serviceProvider.GetRequiredService<IOptionsMonitor<ApiOptions>>().CurrentValue.Endpoints["Movies"].Isolate
false
// Second iteration
serviceProvider.GetRequiredService<IOptionsMonitor<ApiOptions>>().CurrentValue.Endpoints["Movies"].GetHashCode()
36181605
serviceProvider.GetRequiredService<IOptionsMonitor<ApiOptions>>().CurrentValue.Endpoints["Movies"].Isolate
true

// First iteration
endpoint.GetHashCode()
56152722
endpoint.Isolate
false
// Second iteration
endpoint.GetHashCode()
56152722
endpoint.Isolate
false

It could still be an issue with the way I've configured things though.

@martincostello
Copy link
Owner Author

I've worked it out now (I think) - the helper registration I have to remove the need to be verbose with IOptionsMonitor<T> is scoped and I think when the reload happens, the scope is different in some way which means the change doesn't propagate. If I change this line of code it works:

- var endpoint = context.ServiceProvider.GetRequiredService<ApiOptions>().GetEndpoint(name);
+ var endpoint = context.ServiceProvider.GetRequiredService<IOptionsMonitor<ApiOptions>>().CurrentValue.GetEndpoint(name);

@martintmk
Copy link

I've worked it out now (I think) - the helper registration I have to remove the need to be verbose with IOptionsMonitor is scoped and I think when the reload happens, the scope is different in some way which means the change doesn't propagate. If I change this line of code it works:

Correct, the scoped lifetime causes the issues. Fyi, you also use the GetOptions convenience method that does the same thing as accessing the options using IOptionsMonitor.

var endpoint = context.GetOptions<ApiOptions>().GetEndpoint(name);

Also sorry, I had this change locally when I was running the test. That's why it was green for me.

@martincostello martincostello force-pushed the polly-v8 branch 2 times, most recently from 9f0dfaf to ecb69e0 Compare July 23, 2023 16:22
@martincostello
Copy link
Owner Author

Needs more investigation, but compared to main with v7 there seems to be a consistent performance regression for the code paths that use the fallback policy in terms of both times and allocations.

main


BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Median Gen0 Allocated
GetMovies 140.5 μs 2.75 μs 3.17 μs 140.1 μs 4.8828 49.05 KB
GetMovie 113.1 μs 2.25 μs 5.78 μs 111.2 μs 3.9063 39.99 KB
GetUsers 133.9 μs 2.43 μs 4.81 μs 133.0 μs 5.3711 53.01 KB
GetUser 107.1 μs 1.99 μs 1.76 μs 107.1 μs 4.3945 41.49 KB

BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Gen1 Allocated
GetMovies 18.67 μs 0.230 μs 0.192 μs 1.9226 0.0305 17.71 KB
GetMovie 11.41 μs 0.224 μs 0.483 μs 1.4954 - 14.02 KB

polly-v8


BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Allocated
GetMovies 138.8 μs 1.21 μs 1.07 μs 4.8828 44.79 KB
GetMovie 128.8 μs 1.63 μs 1.36 μs 4.3945 44.36 KB
GetUsers 133.2 μs 1.27 μs 1.13 μs 4.8828 48.8 KB
GetUser 121.6 μs 1.96 μs 3.12 μs 4.8828 45.55 KB

BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Allocated
GetMovies 15.86 μs 0.129 μs 0.114 μs 1.4343 13.29 KB
GetMovie 15.96 μs 0.315 μs 0.431 μs 1.9531 18.22 KB

@martintmk
Copy link

I think the main problem is the recreation of resilience strategy on the hot-path. You can use the ResilienceStrategyRegistry.GetOrAddStrategy method to add and cache the strategies at runtime.

@martincostello
Copy link
Owner Author

That's true, but the v7 code didn't do that either.

@martincostello
Copy link
Owner Author

Latest benchmark results with the fallback caching.

All the allocations are improved and all the execution times are improved except one, which I think can be attributed to noise as it's ~1.3μs difference.

Polly v7


BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Gen1 Allocated
GetMovies 157.2 μs 3.40 μs 9.58 μs 4.8828 - 48.91 KB
GetMovie 127.1 μs 2.50 μs 4.52 μs 3.9063 - 40.04 KB
GetUsers 137.9 μs 4.10 μs 12.03 μs 5.3711 0.4883 53.07 KB
GetUser 102.7 μs 1.72 μs 1.61 μs 4.3945 - 41.36 KB

BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Gen1 Allocated
GetMovies 15.175 μs 0.2829 μs 0.3027 μs 1.9226 0.0305 17.75 KB
GetMovie 9.802 μs 0.1080 μs 0.1010 μs 1.5411 0.0153 14.24 KB

Polly v8


BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Median Gen0 Allocated
GetMovies 142.7 μs 2.83 μs 7.90 μs 139.9 μs 4.8828 44.79 KB
GetMovie 120.5 μs 2.34 μs 2.50 μs 120.0 μs 3.4180 35.29 KB
GetUsers 131.6 μs 2.54 μs 2.50 μs 131.2 μs 4.8828 48.46 KB
GetUser 107.2 μs 2.14 μs 1.90 μs 106.8 μs 3.9063 36.29 KB

BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Gen1 Allocated
GetMovies 15.17 μs 0.145 μs 0.136 μs 1.4343 0.0153 13.29 KB
GetMovie 11.14 μs 0.222 μs 0.238 μs 0.9918 - 9.14 KB

@martincostello
Copy link
Owner Author

Latest benchmarks with 8.0.0-alpha.7:


BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Allocated
GetMovies 136.0 μs 2.60 μs 2.67 μs 4.8828 44.83 KB
GetMovie 116.8 μs 2.31 μs 5.63 μs 3.4180 35.16 KB
GetUsers 139.3 μs 2.72 μs 7.36 μs 4.8828 49.01 KB
GetUser 121.5 μs 2.43 μs 4.95 μs 3.9063 35.97 KB

BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


Method Mean Error StdDev Gen0 Allocated
GetMovies 15.30 μs 0.146 μs 0.137 μs 1.4343 13.33 KB
GetMovie 12.36 μs 0.450 μs 1.306 μs 0.9766 9.01 KB

@martincostello martincostello force-pushed the polly-v8 branch 3 times, most recently from 418876e to 12aa8f1 Compare August 8, 2023 17:46
Update to alpha.9 of Polly v8.
Bump Polly to 8.0.0-beta.1.
Set the instance name.
- Bump Polly to 8.0.0-beta.2.
- Remove `InstanceName` as that was just for testing.
- Update NuGet packages to their latest versions.
Update to the Polly v8 stable release.
@martincostello martincostello marked this pull request as ready for review September 28, 2023 11:19
@martincostello martincostello merged commit 605ecc6 into main Sep 28, 2023
1 check passed
@martincostello martincostello deleted the polly-v8 branch September 28, 2023 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants