Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logs filled with critical exceptions from Kestrel/Quic #45105

Open
1 task done
mrgleba opened this issue Nov 15, 2022 · 31 comments
Open
1 task done

Logs filled with critical exceptions from Kestrel/Quic #45105

mrgleba opened this issue Nov 15, 2022 · 31 comments
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions HTTP3 investigate

Comments

@mrgleba
Copy link

mrgleba commented Nov 15, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I run an app in a linux docker container (tag 7.0) with http/3 enabled.
The libmsquic version in 2.1.4 (from ms repo).
The server logs are being filled with the following message:

Microsoft.AspNetCore.Server.Kestrel Critical Unexpected exception in HttpConnection.ProcessRequestsAsync. 
An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
  at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicStreamContext.Abort(ConnectionAbortedException abortReason)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3FrameWriter.Abort(ConnectionAbortedException error)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3ControlStream.Abort(ConnectionAbortedException abortReason, Http3ErrorCode errorCode)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)

This doesn't influence the operations as far as we see. The server responds normally despite the messages.

Expected Behavior

No messages in logs

Steps To Reproduce

No response

Exceptions (if any)

No response

.NET Version

7.0 runtime (no sdk)

Anything else?

No response

@adityamandaleeka
Copy link
Member

@mrgleba Do you see this on any app or is there anything interesting about your code?

@adityamandaleeka
Copy link
Member

@karelz Is there any guidance on which libmsquic versions are compatible with .NET 7 on Linux?

We have a note in the docs for .NET 6 here: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel/http3?view=aspnetcore-7.0#linux

@mrgleba
Copy link
Author

mrgleba commented Nov 17, 2022

@adityamandaleeka Nothing I can point out. Just upgraded to 7.0 and enabled http/3.

@adityamandaleeka adityamandaleeka added this to the 8.0-preview1 milestone Nov 18, 2022
@adityamandaleeka
Copy link
Member

@mrgleba Just out of curiosity, can you try the 2.1.1 version of msquic and see if you have the same issue?

cc: @nibanks

@mrgleba
Copy link
Author

mrgleba commented Nov 21, 2022

I can confirm it's the same with 2.1.1

@adityamandaleeka
Copy link
Member

Thanks @mrgleba

@ezar
Copy link

ezar commented Dec 7, 2022

Same error using Windows 2022

@ladeak
Copy link
Contributor

ladeak commented Feb 16, 2023

I can find the very same message on Windows. It appears sometimes after the client has completed the request and the client process has completed.

crit: Microsoft.AspNetCore.Server.Kestrel[0]
      Unexpected exception in HttpConnection.ProcessRequestsAsync.
      System.Net.Quic.QuicException: An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
         at System.Net.Quic.ThrowHelper.ThrowIfMsQuicError(Int32 status, String message)
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicStreamContext.Abort(ConnectionAbortedException abortReason)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3FrameWriter.Abort(ConnectionAbortedException error)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3ControlStream.Abort(ConnectionAbortedException abortReason, Http3ErrorCode errorCode)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)
dotnet --info
.NET SDK:
 Version:   7.0.200-preview.22628.1
 Commit:    76bbaffad6

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22621
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.200-preview.22628.1\

Host:
  Version:      7.0.0
  Architecture: x64
  Commit:       d099f075e4

@mkArtakMSFT mkArtakMSFT removed this from the 8.0-preview2 milestone Mar 14, 2023
@amcasey
Copy link
Member

amcasey commented Mar 17, 2023

I might be missing something, but this looks very similar to dotnet/runtime#73688.

@amcasey
Copy link
Member

amcasey commented Mar 17, 2023

@rzikm @CarnaViire Am I mistaken in thinking this should have been fixed by dotnet/runtime#75179?

Edit: I see that we have a lot of our own locking code in this area, so there may be an additional race.

@amcasey
Copy link
Member

amcasey commented Mar 17, 2023

@mrgleba @ladeak One thing I notice looking through the frames in the call stack is that there's a whole bunch of logging around what was aborted and why. Are you seeing that in your logs? You might need LogLevel.Debug.

@wfurt
Copy link
Member

wfurt commented Mar 20, 2023

yes, this looks similar to dotnet/runtime#73688. We should probably collect System.Net.Quic logs and possibly rotaries from MsQuic as well https://github.com/microsoft/msquic/blob/main/docs/Diagnostics.md

Note that the fix was specifically around ODE. As far as I know MsQuic could return the QUIC_STATUS_INVALID_PARAMETER in other cases when it feels connection/stream is closed and done.
The logs may yield some useful information.

Or somebody should come up with simple repro we can investigate.

@ladeak
Copy link
Contributor

ladeak commented Mar 20, 2023

I have this test webapplication I employ in a repo: https://github.com/ladeak/Http3Repl/tree/master/tests/TestWebApplication

There is really nothing special about it with the 3 endpoints. Invoke any with H3 request, wait a minute or two, and the error will show. I typically encounter the issue simply just using the MapGet for "/".

@amcasey I will increase the log level to further investigate.

@wfurt
Copy link
Member

wfurt commented Mar 20, 2023

How do you drive it? From HttpClient e.g C# & MsQuic? I'm wondering if the peer aborts the stream and then it is not open at the time Kestrel tries to do abort....

@ladeak
Copy link
Contributor

ladeak commented Mar 20, 2023

Yes, I use HttpClient (which I understand uses MsQuic).

@amcasey
Copy link
Member

amcasey commented Mar 29, 2023

I wrote a trivial HttpClient driver and have yet to see a repro (though it seems like a race, so that's hardly definitive).

Did you find anything interesting in the logs @ladeak?

@ladeak
Copy link
Contributor

ladeak commented Mar 29, 2023

Nothing so far, but also something apparent on .net8 preview 2.

@mrgleba
Copy link
Author

mrgleba commented May 17, 2023

I've tried upgrading libmsquic to 2.2.1 but the problem persists

@ManickaP
Copy link
Member

ManickaP commented May 17, 2023

To investigate this, we need Private.InternalDiagnostics.System.Net.Quic internal logging as well as msquic logs: https://github.com/microsoft/msquic/blob/main/docs/Diagnostics.md. I tried to reproduce this in the past without any success, so getting the logs from you is probably the best bet now.

BTW, do you kill the client process immediately after sending the request? Do you dispose the client? If not, this could be related to dotnet/runtime#71927 (TL;DR: MsQuic is a user space library so when you close the client app and the process disappears from the memory, anything that was buffered but not yet send by MsQuic gets lost. In the linked issues, it was the last ACK that got lost and server waiting on it timed out).

To confirm if this is it, you should dispose the HttpClient after the testing request and keep the process alive for a bit after that. If it doesn't reproduce then, it probably is dotnet/runtime#71927.

@mrgleba
Copy link
Author

mrgleba commented May 17, 2023

Just to clarify. The problem occurs on a prodction server (the logs are from customers' connections).
The messages are quite abundant (a few per minute).

I'll see if I can set up the logs, but it won't be easy in production :(

@ManickaP
Copy link
Member

Private.InternalDiagnostics.System.Net.Quic logs are not suitable for production, they are very verbose and might negatively impact perf. I thought that you were able to reproduce it locally with the repro shared above. But that was probably just @ladeak, maybe they can collect and share the logs.

@ladeak
Copy link
Contributor

ladeak commented May 20, 2023

This is my client console app (I am using .NET 8 Preview 4 at this time):

        using var client = new HttpClient();
        var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:5001/");
        request.Version = HttpVersion.Version30;
        request.VersionPolicy = HttpVersionPolicy.RequestVersionExact;
        var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        Console.WriteLine(await response.Content.ReadAsStringAsync());
        await Task.Delay(10000);

(Not clear if you prefer the timeout after - or before the dispose).

I see this in the server logs, I was able to reproduce, but the errors typically come ~1 min after the requests:

crit: Microsoft.AspNetCore.Server.Kestrel[0]
      Unexpected exception in HttpConnection.ProcessRequestsAsync.
      System.Net.Quic.QuicException: An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
         at System.Net.Quic.ThrowHelper.ThrowIfMsQuicError(Int32 status, String message)
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicStreamContext.Abort(ConnectionAbortedException abortReason)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3FrameWriter.Abort(ConnectionAbortedException error)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3ControlStream.Abort(ConnectionAbortedException abortReason, Http3ErrorCode errorCode)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)

The server:

var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseKestrel(kestrel =>
{
    kestrel.ListenAnyIP(5001, options =>
    {
        options.UseHttps();
        options.Protocols = HttpProtocols.Http1AndHttp2AndHttp3;
    });
});
var app = builder. Build();

app.MapGet("/", async context =>
{
    await context.Response.WriteAsync("Hello World " + context.Request.Protocol.ToString());
});

I have collected the logs as linked using the netsh command. The file is 6.5MB and seems binary, how can I share it @ManickaP ?

netsh.exe trace start overwrite=yes report=dis correlation=dis traceFile=quic.etl provider={ff15e657-4f26-570e-88ab-0796b258d11c} level=0x5 keywords=0xffffffff

@ManickaP
Copy link
Member

I don't know the limits but you can attach it here, or somewhere else and share a link. Or you can send it to my email mapichov at microsoft dot com.

The etl file can be opened with PerfView: https://github.com/microsoft/perfview

@ladeak
Copy link
Contributor

ladeak commented May 22, 2023

I have sent it in an email. If it does not work, I'll share it through OneDrive.

@ManickaP
Copy link
Member

So I found 2 occurrences of [ api] Exit 2147942487 (e.g. INVALID_PARAMETER) in the logs. Both were for an incoming unidirectional stream (control | QPack) and both were due to the layer above (Kestrel) calling Abort on an already shutdown stream. What happens is that connection idles out and part of that process is also shutdown of all active streams (with the same cause "Connection Idle"). As control streams are not meant to be closed ever, they are alive at this point and they receive this event. What I suspect causes the error to happen here is that processing of this stream event that eventually removes the stream from _streams collection:

_context.StreamLifetimeHandler.OnStreamCompleted(this);

races with connection closure and in some cases the connection is faster and calls abort:
// Abort active request streams.
lock (_streams)
{
foreach (var stream in _streams.Values)
{
stream.Abort(CreateConnectionAbortError(error, clientAbort), (Http3ErrorCode)_errorCodeFeature.Error);
}
}

@JamesNK Could you look into what I found out? If I'm right, this is not a problem of S.N.Quic.

Details from msquic logs:

Error case:
2023-05-20T06:41:35.813851,6155,5,Microsoft-Quic,[strm][0x17783af6e00] Indicating QUIC_STREAM_EVENT_SHUTDOWN_COMPLETE [ConnectionShutdown=1, ConnectionShutdownByApp=0, ConnectionClosedRemotely=0, Co
2023-05-20T06:41:35.815034,11,5,Microsoft-Quic,[ api] Enter STREAM_SHUTDOWN (0x17783af6e00).
2023-05-20T06:41:35.815036,13,5,Microsoft-Quic,[ api] Exit 2147942487

Correct case:
2023-05-20T06:41:49.119196,6155,5,Microsoft-Quic,[strm][0x17783b92870] Indicating QUIC_STREAM_EVENT_SHUTDOWN_COMPLETE [ConnectionShutdown=1, ConnectionShutdownByApp=0, ConnectionClosedRemotely=0, Co

@amcasey
Copy link
Member

amcasey commented May 25, 2023

@ManickaP Thanks for that analysis! I can take next steps (which will probably be following up with @JamesNK 😆).

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
@mrgleba
Copy link
Author

mrgleba commented Nov 17, 2023

This still occurs with net8 & libmsquic.2.2.4 😢

@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2024
@wtgodbe wtgodbe removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2024
@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2024
@wtgodbe wtgodbe removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 13, 2024
@dotnet dotnet deleted a comment from dotnet-policy-service bot Feb 13, 2024
@dotnet dotnet deleted a comment from dotnet-policy-service bot Feb 13, 2024
@amcasey amcasey added this to the .NET 9 Planning milestone Mar 5, 2024
@Beblia
Copy link

Beblia commented Sep 13, 2024

Hello everyone. Any updates on this issue? I have something similar as well.
100+ of these errors every single day. This happens on both .NET 8 and 9

2024-09-12 20:52:56.705 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync.
System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
Actual value was -1.
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication)
2024-09-12 20:52:59.058 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync.
System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
Actual value was -1.
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication)
2024-09-12 20:53:06.262 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync.
System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
Actual value was -1.
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication)
2024-09-12 20:53:07.864 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync.
System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
Actual value was -1.
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication)
2024-09-12 20:53:09.085 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync.
System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
Actual value was -1.
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication)

@amcasey
Copy link
Member

amcasey commented Sep 17, 2024

Here's another place the stream could (race to?) be aborted

stream.Abort(new ConnectionAbortedException(CoreStrings.Http3ControlStreamHeaderTimeout), Http3ErrorCode.StreamCreationError);

It looks like it might have the wrong error though, so probably a red herring.

@amcasey
Copy link
Member

amcasey commented Sep 17, 2024

Toy repro: launch server, launch client, wait ~20 seconds

Server

using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(options =>
{
    options.ListenLocalhost(5001, listenOptions =>
    {
        listenOptions.Protocols = HttpProtocols.Http3;
        listenOptions.UseHttps();
    });

    options.Limits.KeepAliveTimeout = TimeSpan.FromSeconds(5);
});

var app = builder.Build();

app.MapGet("/", () => "Hello, world!");

app.Run();

Client

using var client = new HttpClient()
{
    BaseAddress = new Uri("https://localhost:5001"),
    DefaultRequestVersion = new Version(3, 0),
    DefaultVersionPolicy = HttpVersionPolicy.RequestVersionExact,
};

try
{
    var response = await client.GetAsync("/");
    response.EnsureSuccessStatusCode();

    var data = await response.Content.ReadAsStringAsync();
    Console.WriteLine(data);
}
catch (HttpRequestException e)
{
    Console.WriteLine($"Request error: {e.Message}");
}

@amcasey
Copy link
Member

amcasey commented Sep 17, 2024

Oops, my repro is for a slightly different (but probably related?) error:

crit: Microsoft.AspNetCore.Server.Kestrel[0]
      Unexpected exception in HttpConnection.ProcessRequestsAsync.
      System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
      Actual value was -1.
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)

This validation was added in #55282.

Edit: filed #57933 for the new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions HTTP3 investigate
Projects
None yet
Development

No branches or pull requests