Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hedged Requests are not fired after connection timeouts #4654

Closed
sandrohanea opened this issue Nov 2, 2023 · 4 comments · Fixed by #5134
Closed

Hedged Requests are not fired after connection timeouts #4654

sandrohanea opened this issue Nov 2, 2023 · 4 comments · Fixed by #5134
Assignees
Labels
area-resilience bug This issue describes a behavior which is not expected - a bug. work in progress 🚧
Milestone

Comments

@sandrohanea
Copy link

Description

If a connection establishment is failing with a timeout, configured on the SocketsHttpHandler, hedged requests are not generated and the whole pipeline fails.

Reproduction Steps

PreReq:

  • Have some standard hedging handler with 2 endpoint groups: [ "https://localhost:3000", "https://microsoft.com" ]
  • Have total request timeout and endpoint timeout long enough to not be reached (e.g. 100 seconds)
  • Have a short hedging delay.
  • Configure a SocketsHttpHandler with ConnectTimeout short enough to be reached for the first endpoint group (e.g. 1 second)
  • Fire a request with the newly created HttpClient

Expected behavior

The expected behavior is that "https://microsoft.com" endpoint is called and the pipeline succeeds.

Actual behavior

It is not actually called and a TaskCanceledException is thrown (coming from the HttpConnectionPool)

Task Canceled Exception
System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
 ---> System.TimeoutException: The operation was canceled.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnectionPool.CreateConnectTimeoutException(OperationCanceledException oce)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

@martintmk
Copy link
Contributor

This code reproduces the issue:

using System;
using System.Net.Http;
using System.Net.Security;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();
var clientBuilder = services
    .AddHttpClient("ClientR9")
    .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
    {
        UseProxy = false,
        Proxy = null,
        ConnectTimeout = TimeSpan.FromSeconds(1),
        SslOptions = new SslClientAuthenticationOptions
        {
            ClientCertificates = [],
            RemoteCertificateValidationCallback = (_, _, _, _) => true
        },
    })
    .AddStandardHedgingHandler(routing =>
    {
        routing.ConfigureOrderedGroups(g =>
        {
            g.Groups.Add(new UriEndpointGroup
            {
                Endpoints = new[] { new WeightedUriEndpoint { Uri = new Uri("https://localhost:3000") } }
            });

            g.Groups.Add(new UriEndpointGroup
            {
                Endpoints = new[] { new WeightedUriEndpoint { Uri = new Uri("https://microsoft.com") } }
            });
        });

    })
    .Configure(opt =>
    {
        opt.Hedging.MaxHedgedAttempts = 10;
        opt.Hedging.Delay = TimeSpan.FromSeconds(11);
        opt.Endpoint.CircuitBreaker.FailureRatio = 0.99;
        opt.Endpoint.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(900);
        opt.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(200);
        opt.Endpoint.Timeout.Timeout = TimeSpan.FromSeconds(200);
    });

var provider = services.BuildServiceProvider();
var clientFactory = provider.GetRequiredService<IHttpClientFactory>();
var client = clientFactory.CreateClient("ClientR9");

try
{
    var response = await client.GetAsync("https://localhost:3000");
    Console.WriteLine(response.StatusCode);
}
catch (OperationCanceledException ex)
{
    Console.WriteLine("Should never reach here but it does after 1 second");
}

The issue is that OperationCanceledException thrown from SocketsHttpHandler is not threated as transient. The challange is to detect this particular case and not treat other OperationCanceledException as transient.

@sandrohanea
Copy link
Author

This is where the exception is thrown and unfortunately there is no consistent way in identifying if it was a task cancelled from the connection timeout or task cancelled from a request timeout, in which case we probably don't want to retry / execute hedged requests:

https://github.com/dotnet/runtime/blob/deb43368fde438b3e6928d3930f1362a4960fdf4/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs#L475

@martintmk
Copy link
Contributor

How about checking against ResilienceContext.CancellationToken? If ResilienceContext.CancellationToken is not cancelled and we receive TaskCanceledException with that layout we know that this cancellation came from SocketsHttpHandler when establishing connection.

@sandrohanea
Copy link
Author

I think that might work, what about the CancellationToken that is provided to the SendAsync method?
Is that used initially to create a linked CancellationToken for ResilienceContext.CancellationToken ?

@geeknoid geeknoid added area-resilience bug This issue describes a behavior which is not expected - a bug. and removed untriaged labels Nov 22, 2023
@geeknoid geeknoid added this to the 8.1 milestone Nov 27, 2023
@joperezr joperezr modified the milestones: 8.1, 8.2 Jan 22, 2024
@iliar-turdushev iliar-turdushev pinned this issue Apr 25, 2024
@iliar-turdushev iliar-turdushev unpinned this issue Apr 25, 2024
@xakep139 xakep139 self-assigned this Apr 30, 2024
@xakep139 xakep139 modified the milestones: 8.2, 8.4 Apr 30, 2024
@xakep139 xakep139 modified the milestones: 8.4, 8.5 May 2, 2024
@dotnet-policy-service dotnet-policy-service bot removed this from the 8.5 milestone May 9, 2024
@xakep139 xakep139 modified the milestones: 8.5, 8.6 May 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-resilience bug This issue describes a behavior which is not expected - a bug. work in progress 🚧
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants