Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QUIC: QuicStream.WaitForWriteCompletionAsync sometimes doesn't complete #71927

Open
JamesNK opened this issue Jul 11, 2022 · 29 comments
Open

QUIC: QuicStream.WaitForWriteCompletionAsync sometimes doesn't complete #71927

JamesNK opened this issue Jul 11, 2022 · 29 comments
Labels
area-System.Net.Quic bug tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Milestone

Comments

@JamesNK
Copy link
Member

JamesNK commented Jul 11, 2022

Description

I've noticed Kestrel sometimes throws an error when handling a gRPC request. See dotnet/aspnetcore#42289

tldr; the client sent a gRPC request, the server processed it and sent a response, but aborted the connection because the request is hung on WaitForWriteCompletionAsync, and thought the request was still in-progress and exceeded the timeout to drain the request.

I investigated what is going on and it appears that most of the time in my scenario QuicStream.WaitForWriteCompletionAsync never returns. It appears to work reliably if there is a breakpoint before awaiting on a task, but not if there is a breakpoint after awaiting the task.

Reproduction Steps

  1. git clone https://github.com/jamesnk/grpc-dotnet
  2. git checkout jamesnk/net7
  3. dotnet run -c Release --project .\perf\benchmarkapps\GrpcAspNetCoreServer\ -- --LogLevel Trace --protocol h3
  4. dotnet run -c Release --project .\perf\benchmarkapps\GrpcClient\ -- -u https://localhost:5000 -c 1 --streams 1 -s unary -p h3 --grpcClientType grpcnetclient --callCount 1

Expected behavior

Client and server gracefully complete request.

Actual behavior

Half the time the QuicStream.WaitForWriteCompletionAsync task doesn't complete. The server aborts the request and the connection because they're unhealthy.

Logs of success and failure are below.

Success logs:

23.0973 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[2]
      Stream id "0HMJ31P1V8BHG:00000002" type Unidirectional accepted.
23.0974 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHG" received SETTINGS frame for stream ID 2 with length 5.
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#66993893,NativeCallback,[conn][0x29C45A0F6F0] Connection received event PEER_STREAM_STARTED ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#14836038,.ctor,MsQuicSafeHandle created ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,.ctor,[strm][0x29C459C94D0] Inbound bidirectional stream created in connection [conn][0x29C45A0F6F0] with StreamId 0. ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event RECEIVE ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventReceive,[strm][0x29C459C94D0] Stream received 256 bytes ->
23.1110 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,ReadAsync,[strm][0x29C459C94D0] Stream reading into Memory of '4096' bytes. ->
23.1110 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,ReadAsync,[strm][0x29C459C94D0] Stream reading into Memory of '4096' bytes. ->
23.1114 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[2]
      Stream id "0HMJ31P1V8BHG:00000000" type Bidirectional accepted.
23.1117 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHG" received HEADERS frame for stream ID 0 with length 251.
23.1117 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event RECEIVE ->
23.1117 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventReceive,[strm][0x29C459C94D0] Stream received 7 bytes ->
23.1119 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,ReadAsync,[strm][0x29C459C94D0] Stream reading into Memory of '4096' bytes. ->
23.1119 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHG" received DATA frame for stream ID 0 with length 7.
23.1121 info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc -
23.1121 trce: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[2]
      All hosts are allowed.
23.1121 dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      1 candidate(s) found for the request path '/grpc.testing.BenchmarkService/UnaryCall'
23.1121 dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
23.1121 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
23.1122 dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
23.1122 dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMJ31P1V8BHG", Request id "0HMJ31P1V8BHG:00000000": started reading request body.
23.1122 trce: Grpc.AspNetCore.Server.ServerCallHandler[12]
      Deserializing 2 byte message to 'Grpc.Testing.SimpleRequest'.
23.1122 trce: Grpc.AspNetCore.Server.ServerCallHandler[13]
      Received message.
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event RECEIVE ->
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventReceive,[strm][0x29C459C94D0] Stream received 0 bytes with FIN flag ->
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event PEER_SEND_SHUTDOWN ->
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventPeerSendShutdown,[strm][0x29C459C94D0] Stream completing resettable event source. ->
23.1129 dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMJ31P1V8BHG", Request id "0HMJ31P1V8BHG:00000000": done reading request body.
23.1130 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHG" sending HEADERS frame for stream ID 0 with length 76.
23.1130 dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
23.1130 trce: Grpc.AspNetCore.Server.ServerCallHandler[18]
      Serialized 'Grpc.Testing.SimpleResponse' to 2 byte message.
23.1130 trce: Grpc.AspNetCore.Server.ServerCallHandler[16]
      Message sent.
23.1130 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
23.1130 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHG" sending DATA frame for stream ID 0 with length 7.
23.1131 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SEND_COMPLETE ->
23.1132 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHG" sending HEADERS frame for stream ID 0 with length 17.
23.1132 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SEND_COMPLETE ->
23.1133 info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc - - 200 - application/grpc 1.2488ms
23.1134 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[10]
      Stream id "0HMJ31P1V8BHG:00000000" shutting down writes because: "The QUIC transport's send loop completed gracefully.".
23.1401 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SEND_SHUTDOWN_COMPLETE ->
23.1401 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SHUTDOWN_COMPLETE ->
23.1401 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventShutdownComplete,[strm][0x29C459C94D0] Stream completing resettable event source. ->
23.1402 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,Dispose,[strm][0x29C459C94D0] Stream disposing True ->
23.1403 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,Cleanup,[strm][0x29C459C94D0] releasing handles. ->
23.1403 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#14836038,ReleaseHandle,MsQuicSafeHandle released ->
23.1403 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,Dispose,[strm][0x29C459C94D0] Stream disposed ->

Failure logs:

43.0608 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[2]
      Stream id "0HMJ31P1V8BHF:00000000" type Bidirectional accepted.
43.0610 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHF" received HEADERS frame for stream ID 0 with length 251.
43.0611 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event RECEIVE ->
43.0612 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,HandleEventReceive,[strm][0x29C459C9A70] Stream received 7 bytes ->
43.0613 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,ReadAsync,[strm][0x29C459C9A70] Stream reading into Memory of '4096' bytes. ->
43.0613 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHF" received DATA frame for stream ID 0 with length 7.
43.0614 info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc -
43.0614 trce: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[2]
      All hosts are allowed.
43.0615 dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      1 candidate(s) found for the request path '/grpc.testing.BenchmarkService/UnaryCall'
43.0615 dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
43.0615 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
43.0615 dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
43.0615 dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMJ31P1V8BHF", Request id "0HMJ31P1V8BHF:00000000": started reading request body.
43.0615 trce: Grpc.AspNetCore.Server.ServerCallHandler[12]
      Deserializing 2 byte message to 'Grpc.Testing.SimpleRequest'.
43.0615 trce: Grpc.AspNetCore.Server.ServerCallHandler[13]
      Received message.
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event RECEIVE ->
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,HandleEventReceive,[strm][0x29C459C9A70] Stream received 0 bytes with FIN flag ->
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event PEER_SEND_SHUTDOWN ->
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,HandleEventPeerSendShutdown,[strm][0x29C459C9A70] Stream completing resettable event source. ->
43.0625 dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMJ31P1V8BHF", Request id "0HMJ31P1V8BHF:00000000": done reading request body.
43.0625 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHF" sending HEADERS frame for stream ID 0 with length 76.
43.0625 dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
43.0626 trce: Grpc.AspNetCore.Server.ServerCallHandler[18]
      Serialized 'Grpc.Testing.SimpleResponse' to 2 byte message.
43.0626 trce: Grpc.AspNetCore.Server.ServerCallHandler[16]
      Message sent.
43.0626 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
43.0626 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHF" sending DATA frame for stream ID 0 with length 7.
43.0627 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event SEND_COMPLETE ->
43.0627 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHF" sending HEADERS frame for stream ID 0 with length 17.
43.0628 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event SEND_COMPLETE ->
43.0628 info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc - - 200 - application/grpc 1.3970ms
43.0629 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[10]
      Stream id "0HMJ31P1V8BHF:00000000" shutting down writes because: "The QUIC transport's send loop completed gracefully.".

The request then hangs, waiting on the task that doesn't complete. After 5 seconds Kestrel aborts the connection which calls QuicConnection.CloseAsync which unsticks WaitForWriteCompletionAsync and the task completes.

Logs after abort:

02.0392 dbug: Microsoft.AspNetCore.Server.Kestrel.BadRequests[28]
      Connection id "0HMJ330OH884L", Request id "0HMJ330OH884L:00000000": the connection was closed because the response was not read by the client at the specified minimum data rate.
06.7638 dbug: Microsoft.AspNetCore.Server.Kestrel.Http3[53]
      Connection id "0HMJ330OH884L": GOAWAY stream ID 4.
06.7638 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ330OH884L" sending GOAWAY frame for stream ID 3 with length 1.
06.7639 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,NativeCallback,[strm][0x294D85D0760] Stream received event SEND_COMPLETE ->
06.7639 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[6]
      Connection id "0HMJ330OH884L" aborted by application with error code 258 because: "The connection was timed out by the server because the response was not read by the client at the specified minimum data rate.".
06.7640 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,NativeCallback,[strm][0x294D85D1840] Stream received event SEND_SHUTDOWN_COMPLETE ->
06.7640 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,NativeCallback,[strm][0x294D85D1840] Stream received event SHUTDOWN_COMPLETE ->
06.7640 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,HandleEventConnectionClose,[strm][0x294D85D1840] Stream handling connection [conn][0x294D8698DD0] close ->
06.7647 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#31139921,NativeCallback,[strm][0x294D85D0A30] Stream received event SHUTDOWN_COMPLETE ->
06.7647 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#31139921,HandleEventConnectionClose,[strm][0x294D85D0A30] Stream handling connection [conn][0x294D8698DD0] close ->
06.7657 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,NativeCallback,[strm][0x294D85D0760] Stream received event SEND_SHUTDOWN_COMPLETE ->
06.7657 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,NativeCallback,[strm][0x294D85D0760] Stream received event SHUTDOWN_COMPLETE ->
06.7657 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,HandleEventConnectionClose,[strm][0x294D85D0760] Stream handling connection [conn][0x294D8698DD0] close ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,Dispose,[strm][0x294D85D1840] Stream disposing True ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,Cleanup,[strm][0x294D85D1840] releasing handles. ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#6849668,ReleaseHandle,MsQuicSafeHandle released ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,Dispose,[strm][0x294D85D1840] Stream disposed ->
06.7995 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[10]
      Stream id "0HMJ330OH884L:00000003" shutting down writes because: "Operation aborted.".
07.0183 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#45133484,NativeCallback,[conn][0x294D8698DD0] Connection received event SHUTDOWN_COMPLETE ->
07.0741 dbug: Microsoft.AspNetCore.Server.Kestrel.BadRequests[20]
      Connection id "0HMJ330OH884L" request processing ended abnormally.
      Microsoft.AspNetCore.Connections.ConnectionAbortedException: The connection was timed out by the server because the response was not read by the client at the specified minimum data rate.
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.AcceptAsync(CancellationToken cancellationToken)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
07.0743 dbug: Microsoft.AspNetCore.Server.Kestrel.Http3[44]
      Connection id "0HMJ330OH884L" is closed. The last processed stream ID was 0.
07.0744 dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[2]
      Connection id "0HMJ330OH884L" stopped.
07.0744 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#45133484,Dispose,[conn][0x294D8698DD0] Connection disposing True ->

Regression?

Unknown

Known Workarounds

No response

Configuration

Latest .NET 7 daily build

Other information

No response

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 11, 2022
@ghost
Copy link

ghost commented Jul 11, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I've noticed Kestrel sometimes throws an error when handling a gRPC request. See dotnet/aspnetcore#42289

tldr; the client sent a gRPC request, the server processed it and sent a response, but aborted the connection because the request is hung on WaitForWriteCompletionAsync, and thought the request was still in-progress and exceeded the timeout to drain the request.

I investigated what is going on and it appears that most of the time in my scenario QuicStream.WaitForWriteCompletionAsync never returns. It appears to work reliably if there is a breakpoint before awaiting on a task, but not if there is a breakpoint after awaiting the task.

Reproduction Steps

  1. git clone https://github.com/jamesnk/grpc-dotnet
  2. git checkout jamesnk/net7
  3. dotnet run -c Release --project .\perf\benchmarkapps\GrpcClient\ -- -u https://localhost:5000 -c 1 --streams 1 -s unary -p h3 --grpcClientType grpcnetclient --callCount 1
  4. dotnet run -c Release --project .\perf\benchmarkapps\GrpcAspNetCoreServer\ -- --LogLevel Trace --protocol h3

Expected behavior

Client and server gracefully complete request.

Actual behavior

Half the time the QuicStream.WaitForWriteCompletionAsync task doesn't complete. The server aborts the request and the connection because they're unhealthy.

Logs of success and failure are below.

Success logs:

23.0973 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[2]
      Stream id "0HMJ31P1V8BHG:00000002" type Unidirectional accepted.
23.0974 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHG" received SETTINGS frame for stream ID 2 with length 5.
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#66993893,NativeCallback,[conn][0x29C45A0F6F0] Connection received event PEER_STREAM_STARTED ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#14836038,.ctor,MsQuicSafeHandle created ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,.ctor,[strm][0x29C459C94D0] Inbound bidirectional stream created in connection [conn][0x29C45A0F6F0] with StreamId 0. ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event RECEIVE ->
23.1109 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventReceive,[strm][0x29C459C94D0] Stream received 256 bytes ->
23.1110 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,ReadAsync,[strm][0x29C459C94D0] Stream reading into Memory of '4096' bytes. ->
23.1110 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,ReadAsync,[strm][0x29C459C94D0] Stream reading into Memory of '4096' bytes. ->
23.1114 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[2]
      Stream id "0HMJ31P1V8BHG:00000000" type Bidirectional accepted.
23.1117 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHG" received HEADERS frame for stream ID 0 with length 251.
23.1117 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event RECEIVE ->
23.1117 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventReceive,[strm][0x29C459C94D0] Stream received 7 bytes ->
23.1119 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,ReadAsync,[strm][0x29C459C94D0] Stream reading into Memory of '4096' bytes. ->
23.1119 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHG" received DATA frame for stream ID 0 with length 7.
23.1121 info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc -
23.1121 trce: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[2]
      All hosts are allowed.
23.1121 dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      1 candidate(s) found for the request path '/grpc.testing.BenchmarkService/UnaryCall'
23.1121 dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
23.1121 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
23.1122 dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
23.1122 dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMJ31P1V8BHG", Request id "0HMJ31P1V8BHG:00000000": started reading request body.
23.1122 trce: Grpc.AspNetCore.Server.ServerCallHandler[12]
      Deserializing 2 byte message to 'Grpc.Testing.SimpleRequest'.
23.1122 trce: Grpc.AspNetCore.Server.ServerCallHandler[13]
      Received message.
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event RECEIVE ->
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventReceive,[strm][0x29C459C94D0] Stream received 0 bytes with FIN flag ->
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event PEER_SEND_SHUTDOWN ->
23.1129 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventPeerSendShutdown,[strm][0x29C459C94D0] Stream completing resettable event source. ->
23.1129 dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMJ31P1V8BHG", Request id "0HMJ31P1V8BHG:00000000": done reading request body.
23.1130 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHG" sending HEADERS frame for stream ID 0 with length 76.
23.1130 dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
23.1130 trce: Grpc.AspNetCore.Server.ServerCallHandler[18]
      Serialized 'Grpc.Testing.SimpleResponse' to 2 byte message.
23.1130 trce: Grpc.AspNetCore.Server.ServerCallHandler[16]
      Message sent.
23.1130 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
23.1130 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHG" sending DATA frame for stream ID 0 with length 7.
23.1131 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SEND_COMPLETE ->
23.1132 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHG" sending HEADERS frame for stream ID 0 with length 17.
23.1132 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SEND_COMPLETE ->
23.1133 info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc - - 200 - application/grpc 1.2488ms
23.1134 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[10]
      Stream id "0HMJ31P1V8BHG:00000000" shutting down writes because: "The QUIC transport's send loop completed gracefully.".
23.1401 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SEND_SHUTDOWN_COMPLETE ->
23.1401 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,NativeCallback,[strm][0x29C459C94D0] Stream received event SHUTDOWN_COMPLETE ->
23.1401 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,HandleEventShutdownComplete,[strm][0x29C459C94D0] Stream completing resettable event source. ->
23.1402 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,Dispose,[strm][0x29C459C94D0] Stream disposing True ->
23.1403 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,Cleanup,[strm][0x29C459C94D0] releasing handles. ->
23.1403 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#14836038,ReleaseHandle,MsQuicSafeHandle released ->
23.1403 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#29809623,Dispose,[strm][0x29C459C94D0] Stream disposed ->

Failure logs:

43.0608 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[2]
      Stream id "0HMJ31P1V8BHF:00000000" type Bidirectional accepted.
43.0610 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHF" received HEADERS frame for stream ID 0 with length 251.
43.0611 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event RECEIVE ->
43.0612 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,HandleEventReceive,[strm][0x29C459C9A70] Stream received 7 bytes ->
43.0613 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,ReadAsync,[strm][0x29C459C9A70] Stream reading into Memory of '4096' bytes. ->
43.0613 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[46]
      Connection id "0HMJ31P1V8BHF" received DATA frame for stream ID 0 with length 7.
43.0614 info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc -
43.0614 trce: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[2]
      All hosts are allowed.
43.0615 dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      1 candidate(s) found for the request path '/grpc.testing.BenchmarkService/UnaryCall'
43.0615 dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
43.0615 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
43.0615 dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
43.0615 dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMJ31P1V8BHF", Request id "0HMJ31P1V8BHF:00000000": started reading request body.
43.0615 trce: Grpc.AspNetCore.Server.ServerCallHandler[12]
      Deserializing 2 byte message to 'Grpc.Testing.SimpleRequest'.
43.0615 trce: Grpc.AspNetCore.Server.ServerCallHandler[13]
      Received message.
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event RECEIVE ->
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,HandleEventReceive,[strm][0x29C459C9A70] Stream received 0 bytes with FIN flag ->
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event PEER_SEND_SHUTDOWN ->
43.0624 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,HandleEventPeerSendShutdown,[strm][0x29C459C9A70] Stream completing resettable event source. ->
43.0625 dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMJ31P1V8BHF", Request id "0HMJ31P1V8BHF:00000000": done reading request body.
43.0625 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHF" sending HEADERS frame for stream ID 0 with length 76.
43.0625 dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
43.0626 trce: Grpc.AspNetCore.Server.ServerCallHandler[18]
      Serialized 'Grpc.Testing.SimpleResponse' to 2 byte message.
43.0626 trce: Grpc.AspNetCore.Server.ServerCallHandler[16]
      Message sent.
43.0626 info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
43.0626 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHF" sending DATA frame for stream ID 0 with length 7.
43.0627 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event SEND_COMPLETE ->
43.0627 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ31P1V8BHF" sending HEADERS frame for stream ID 0 with length 17.
43.0628 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#3891103,NativeCallback,[strm][0x29C459C9A70] Stream received event SEND_COMPLETE ->
43.0628 info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/3 POST https://localhost:5000/grpc.testing.BenchmarkService/UnaryCall application/grpc - - 200 - application/grpc 1.3970ms
43.0629 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[10]
      Stream id "0HMJ31P1V8BHF:00000000" shutting down writes because: "The QUIC transport's send loop completed gracefully.".

The request then hangs, waiting on the task that doesn't complete. After 5 seconds Kestrel aborts the connection which calls QuicConnection.CloseAsync which unsticks WaitForWriteCompletionAsync and the task completes.

Logs after abort:

02.0392 dbug: Microsoft.AspNetCore.Server.Kestrel.BadRequests[28]
      Connection id "0HMJ330OH884L", Request id "0HMJ330OH884L:00000000": the connection was closed because the response was not read by the client at the specified minimum data rate.
06.7638 dbug: Microsoft.AspNetCore.Server.Kestrel.Http3[53]
      Connection id "0HMJ330OH884L": GOAWAY stream ID 4.
06.7638 trce: Microsoft.AspNetCore.Server.Kestrel.Http3[47]
      Connection id "0HMJ330OH884L" sending GOAWAY frame for stream ID 3 with length 1.
06.7639 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,NativeCallback,[strm][0x294D85D0760] Stream received event SEND_COMPLETE ->
06.7639 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[6]
      Connection id "0HMJ330OH884L" aborted by application with error code 258 because: "The connection was timed out by the server because the response was not read by the client at the specified minimum data rate.".
06.7640 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,NativeCallback,[strm][0x294D85D1840] Stream received event SEND_SHUTDOWN_COMPLETE ->
06.7640 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,NativeCallback,[strm][0x294D85D1840] Stream received event SHUTDOWN_COMPLETE ->
06.7640 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,HandleEventConnectionClose,[strm][0x294D85D1840] Stream handling connection [conn][0x294D8698DD0] close ->
06.7647 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#31139921,NativeCallback,[strm][0x294D85D0A30] Stream received event SHUTDOWN_COMPLETE ->
06.7647 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#31139921,HandleEventConnectionClose,[strm][0x294D85D0A30] Stream handling connection [conn][0x294D8698DD0] close ->
06.7657 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,NativeCallback,[strm][0x294D85D0760] Stream received event SEND_SHUTDOWN_COMPLETE ->
06.7657 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,NativeCallback,[strm][0x294D85D0760] Stream received event SHUTDOWN_COMPLETE ->
06.7657 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#16104161,HandleEventConnectionClose,[strm][0x294D85D0760] Stream handling connection [conn][0x294D8698DD0] close ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,Dispose,[strm][0x294D85D1840] Stream disposing True ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,Cleanup,[strm][0x294D85D1840] releasing handles. ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#6849668,ReleaseHandle,MsQuicSafeHandle released ->
06.7822 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#33397131,Dispose,[strm][0x294D85D1840] Stream disposed ->
06.7995 dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Quic[10]
      Stream id "0HMJ330OH884L:00000003" shutting down writes because: "Operation aborted.".
07.0183 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#45133484,NativeCallback,[conn][0x294D8698DD0] Connection received event SHUTDOWN_COMPLETE ->
07.0741 dbug: Microsoft.AspNetCore.Server.Kestrel.BadRequests[20]
      Connection id "0HMJ330OH884L" request processing ended abnormally.
      Microsoft.AspNetCore.Connections.ConnectionAbortedException: The connection was timed out by the server because the response was not read by the client at the specified minimum data rate.
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.AcceptAsync(CancellationToken cancellationToken)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
07.0743 dbug: Microsoft.AspNetCore.Server.Kestrel.Http3[44]
      Connection id "0HMJ330OH884L" is closed. The last processed stream ID was 0.
07.0744 dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[2]
      Connection id "0HMJ330OH884L" stopped.
07.0744 dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#45133484,Dispose,[conn][0x294D8698DD0] Connection disposing True ->

Regression?

Unknown

Known Workarounds

No response

Configuration

Latest .NET 7 daily build

Other information

No response

Author: JamesNK
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@JamesNK
Copy link
Member Author

JamesNK commented Jul 11, 2022

BTW fixing this is a high priority. The bug doesn't impact the HTTP/3 request because it has told the client it is complete, but 5 seconds later the HTTP/3 connection is aborted.

@ManickaP
Copy link
Member

Could you point me to the test code for this? Looks like SEND_SHUTDOWN_COMPLETE hasn't arrived in your failed case.

@ManickaP
Copy link
Member

Is this a recent regression or has it been there for a while?

@JamesNK
Copy link
Member Author

JamesNK commented Jul 11, 2022

I first noticed it 23 days ago. I don't know if it happened before then.

@ManickaP ManickaP removed the untriaged New issue has not been triaged by the area owner label Jul 14, 2022
@ManickaP ManickaP added this to the 7.0.0 milestone Jul 14, 2022
@ManickaP
Copy link
Member

Triage: looks like reliability issue, let's investigate in 7.0. It should be fairly simple to reproduce locally.

@CarnaViire CarnaViire self-assigned this Jul 14, 2022
@CarnaViire
Copy link
Member

CarnaViire commented Jul 18, 2022

I can reproduce that, and indeed this is because SEND_SHUTDOWN_COMPLETE sometimes does not arrive to the server's stream.

While digging through MsQuic logs, I noticed that client side didn't send ACK for server's FIN flag before destroying the stream:

SERVER: sends FIN flag

2387 [4]3A44.5574::2022/07/18-14:38:26.406710900 [Microsoft-Quic][strm][0x240A42A4BF0] Built stream frame, offset=107 len=0 fin=1

2389 [4]3A44.5574::2022/07/18-14:38:26.406715700 [Microsoft-Quic][S][TX][16] SH DestCid: KP:0 SB:1 (Payload 15 bytes)
2390 [4]3A44.5574::2022/07/18-14:38:26.406717600 [Microsoft-Quic][S][TX][16]   ACK Largest:14 Delay:0
2391 [4]3A44.5574::2022/07/18-14:38:26.406719000 [Microsoft-Quic][S][TX][16]     14
2392 [4]3A44.5574::2022/07/18-14:38:26.406721200 [Microsoft-Quic][S][TX][16]   STREAM ID:0 Offset:107 Len:0 Fin

2401 [4]3A44.5574::2022/07/18-14:38:26.406733200 [Microsoft-Quic][conn][0x240A4218640] Sending batch. 1 datagrams

CLIENT: receives the FIN flag datagram

2422 [8]2AC4.6420::2022/07/18-14:38:26.406881200 [Microsoft-Quic][conn][0x21E299B6F00] Batch Recv 1 UDP datagrams

2424 [8]2AC4.6420::2022/07/18-14:38:26.406886200 [Microsoft-Quic][C][RX][16] SH DestCid: KP:0 SB:1 (Payload 15 bytes)
2425 [8]2AC4.6420::2022/07/18-14:38:26.406888200 [Microsoft-Quic][C][RX][16]   ACK Largest:14 Delay:0
2426 [8]2AC4.6420::2022/07/18-14:38:26.406889600 [Microsoft-Quic][C][RX][16]     14
2427 [8]2AC4.6420::2022/07/18-14:38:26.406891600 [Microsoft-Quic][C][RX][16]   STREAM ID:0 Offset:107 Len:0 Fin

2428 [8]2AC4.6420::2022/07/18-14:38:26.406892500 [Microsoft-Quic][conn][0x21E299B6F00][RX][16] 5(ONE_RTT) (16 bytes)

2434 [8]2AC4.6420::2022/07/18-14:38:26.406901600 [Microsoft-Quic][strm][0x21E29D79700] Received 0 bytes, offset=107 Ready=0

...marks for ACK but decides not to do that immediately

2436 [8]2AC4.6420::2022/07/18-14:38:26.406904000 [Microsoft-Quic][C][RX][16] Marked for ACK (ECN=0)
2437 [8]2AC4.6420::2022/07/18-14:38:26.406905500 [Microsoft-Quic][conn][0x21E299B6F00] Starting ACK_DELAY timer for 25 ms

...as now both of stream's directions are shut down, triggers last events

2468 [8]2AC4.6420::2022/07/18-14:38:26.424288300 [Microsoft-Quic][strm][0x21E29D79700] Indicating QUIC_STREAM_EVENT_PEER_SEND_SHUTDOWN
2469 [8]2AC4.6420::2022/07/18-14:38:26.424304800 [Microsoft-Quic][strm][0x21E29D79700] Indicating QUIC_STREAM_EVENT_SHUTDOWN_COMPLETE [ConnectionShutdown=0, ConnectionShutdownByPeer=0, ConnectionErrorCode=0x0]

2473 [0]2AC4.6260::2022/07/18-14:38:26.426986800 [Microsoft-Quic][ api] Enter 21(STREAM_SHUTDOWN) (0x21E29D79700).
2476 [0]2AC4.6260::2022/07/18-14:38:26.427000700 [Microsoft-Quic][ api] Enter 19(STREAM_CLOSE) (0x21E29D79700).

2485 [8]2AC4.6420::2022/07/18-14:38:26.427042200 [Microsoft-Quic][strm][0x21E29D79700] Destroyed

...after stream being destroyed, it seems that it forgot to send that delayed ACK to the peer.... it was scheduled for ~14:38:26.431 but stream was destroyed at 14:38:26.427

SERVER: did not hear from client, tries to retransmit FIN..... no response, it isn't even shown on client connection's logs (I guess it disregards datagrams addressed to already closed streams)

Full MsQuic logs

https://microsofteur-my.sharepoint.com/:f:/g/personal/knatalia_microsoft_com/EsnkYxGB26BKmRrIOh7va-4B8Ezq6elWQTPy-WAuV5-ZBw?e=Qu3hKO

@nibanks does that look like MsQuic bug?

@nibanks
Copy link

nibanks commented Jul 18, 2022

@CarnaViire was the stream aborted? I see that Ready=0 when the client receives the FIN, indicating you have a pending receive or are missing some payload and can't deliver yet. But regardless, I believe this means we're blocked from informing the app that we have the FIN. If the client aborts its receive first, then you won't get the FIN.

@nibanks
Copy link

nibanks commented Jul 18, 2022

Actually, after looking at the logs, it seems the client just stopped all together. It's not receiving any other packets. This is with a breakpoint attached to it?

@CarnaViire
Copy link
Member

CarnaViire commented Jul 18, 2022

The client's stream is not aborted, at least from what I see/understand. This is without any breakpoints, just one run of a client application, it executes one request, receives and handles the response and then finishes.

The following are .NET and gRPC level client logs from another run when the same problem occurred. Everything looks like receiving all the data with the following graceful shutdown of the stream. After stream is done, though, the connection stayed alive inside HttpClient and, I guess, just got killed after app finished. I wonder if that adds any problems? But that would not be the reason we got Ready=0 (whatever that means)

...
...
dbug: Grpc.Net.Client.Internal.GrpcCall[13]
      Reading message.
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Http - HandlerMessage : 43205102,28068188,0,ReadFrameEnvelopeAsync,Received frame 0 of length 7. ->
trce: Grpc.Net.Client.Internal.GrpcCall[15]
      Deserializing 2 byte message to 'Grpc.Testing.SimpleResponse'.
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,ReadAsync,[strm][0x200D5717C70] Stream reading into Memory of '64' bytes. ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Http - HandlerMessage : 43205102,28068188,0,ReadFrameEnvelopeAsync,Received frame 1 of length 17. ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,NativeCallback,[strm][0x200D5717C70] Stream received event PEER_SEND_SHUTDOWN ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,HandleEventPeerSendShutdown,[strm][0x200D5717C70] Stream completing resettable event source. ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,NativeCallback,[strm][0x200D5717C70] Stream received event SHUTDOWN_COMPLETE ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,HandleEventShutdownComplete,[strm][0x200D5717C70] Stream completing resettable event source. ->
trce: Grpc.Net.Client.Internal.GrpcCall[16]
      Received message.
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,Dispose,[strm][0x200D5717C70] Stream disposing True ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,Cleanup,[strm][0x200D5717C70] releasing handles. ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : SafeMsQuicStreamHandle#51484875,ReleaseHandle,MsQuicSafeHandle released ->
dbug: HttpEventSourceListener[0]
      <- Event Private.InternalDiagnostics.System.Net.Quic - Info : State#12053474,Dispose,[strm][0x200D5717C70] Stream disposed ->
dbug: Grpc.Net.Client.Internal.GrpcCall[4]
      Finished gRPC call.
[07:26:06.803] Reached 1
[07:26:06.803] 0 0: Finished unary
[07:26:06.804] Stopping client.
[07:26:06.804] Least Requests per Connection: 1
[07:26:06.804] Most Requests per Connection: 1
[07:26:06.808] RPS: ∞
[07:26:06.808] Total requests: 1
[07:26:06.808] Total errors: 0
[07:26:06.810] Mean latency: 365.535ms
[07:26:06.811] Max latency: 365.535ms
dbug: HttpEventSourceListener[0]
      Stopping HttpEventSourceListener.

@ManickaP
Copy link
Member

ManickaP commented Jul 18, 2022

After all, this might be the same issue as #72196.
What I've observed is that we receive data and partially consume them. But before the return value propagates to MsQuic, we read the data on a different thread and call StreamReceiveSetEnable, then MsQuic processes the return value and pauses receiving. But we think we've re-enabled them. And now we have a sleeping barber problem...

EDIT: Or is that not possible to happen due to MsQuic thread model?

@CarnaViire
Copy link
Member

we receive data and partially consume them.

I don't think it's the same here. Here we consume all the data, because we receive PEER_SEND_SHUTDOWN event, which would not arrive before the app level consumes everything the transport level received. At least that was my understanding.

@nibanks
Copy link

nibanks commented Jul 19, 2022

From what I can tell, the server was waiting on an ACK, but the client process was completely paused or killed. No logs were being generated from it any more. So it never could send the ACK. Notice we just stop see logs from process 2AC4.

image

@CarnaViire
Copy link
Member

We receive all the stream events including stream shutdown, after which we safely close the stream and believe everything was done with it. Is there a way for the application to know that there's still unfinished work on the transport level related to the stream? Or maybe is it somehow possible to flush the fin related to the stream on stream's closing? @nibanks

I wonder, does gRPC dispose HttpClient in the end, or does it stay there until it's destroyed on app exit? @JamesNK

@JamesNK
Copy link
Member Author

JamesNK commented Jul 19, 2022

It's up to the app to dispose of the channel/HttpClient.

In this case, the gRPC benchmark app isn't disposing HttpClient before the process exits.

@nibanks
Copy link

nibanks commented Jul 19, 2022

Is there a way for the application to know that there's still unfinished work on the transport level related to the stream?

If you immediately terminate things in response to receiving a stream event, you're always going to risk not acknowledging your peer, resulting in a behavior like this. You can try to do a graceful connection shutdown, and if you use MsQuic we make sure to always ACK everything when we close, but that's not a required behavior (so other implementations won't do it).

But, this is exactly why HTTP generally only closes on an idle timeout, to make sure everything was exchanged.

@CarnaViire
Copy link
Member

You can try to do a graceful connection shutdown

We can't do that after a request, because we don't know whether the user is done or not. We can only do so when the user signaled us that they're done - meaning, on dispose - but as James mentions above, the app is free to decide not to do that and just exit, as soon as it gets the full response from the server ☹️

HTTP generally only closes on an idle timeout

I think that's true for the server, but not for the client?

@nibanks
Copy link

nibanks commented Jul 19, 2022

I think that's true for the server, but not for the client?

HTTP is spec'ed to close on both sides based on the idle timeout.

app is free to decide not to do that and just exit, as soon as it gets the full response from the server

The app is free to do anything, but that doesn't mean it's correct. This is not a bug in MsQuic. If you terminate the QUIC connection or process before we have a chance to ACK, then that is a bug on the upper layer.

@JamesNK
Copy link
Member Author

JamesNK commented Jul 19, 2022

In the gRPC benchmark app the HTTP/3 request is completed gracefully. The client writes the message to the request body, and the server writes the response message including a complete notification. At that point shouldn't the client send a notification to the server that the write side of the stream is finished?

Or is the problem that there is a race between the process terminating and sending the notification to the server?

@nibanks
Copy link

nibanks commented Jul 19, 2022

AFAICT, once the client gets indication of the stream/request being complete on its side, the client is tearing everything down immediately (and never sending anything to the server again). This results in the ACK not getting sent to the server and the server never learns the stream was completed.

@CarnaViire
Copy link
Member

We've discussed offline that while there's no ideal solution without ruining the perf, flushing acks on stream closing (connection-wide) should greatly reduce the race condition window.
The PR on MsQuic side to test it is microsoft/msquic#2904
I will try out MsQuic privates to see whether the issue reproduces after the change and what's the perf impact.

@CarnaViire
Copy link
Member

Change as was is in microsoft/msquic#2904 (flush on stream's Shutdown) didn't fully help, because it might happen that the ACK isn't even scheduled yet by the time Shutdown executes. However, moving flush to stream's Close fixed the problem. I don't see the hang on server anymore (if i dispose the stream, but don't dispose the connection) -- neither in gRPC nor in my small Quic-only repro.

@nibanks mentioned that there still might be a problem if Close is called from within Shutdown callback (we don't do that in S.N.Quic, we call Close from managed thread after Shutdown callback unblocks TCS)

Next steps: check perf impact (unless @nibanks believes there has to be a more complicated change; if so, I will hold back until it's available)

@karelz karelz modified the milestones: 7.0.0, 8.0.0 Aug 2, 2022
@karelz karelz added bug tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly labels Aug 2, 2022
@karelz
Copy link
Member

karelz commented Aug 2, 2022

Triage: We had a meeting where we decided the scenario is fairly corner-case (Kestrel users won't see anything, unless they enable Diagnostics).
It would still be nice to have it implemented in msquic, but it is nice to have. Playing nicely with the server is desirable.

Moving to 8.0. If we have customer complains from the wild, we can always service .NET.

@CarnaViire
Copy link
Member

Triage: we need to create an issue in MsQuic repo and link the PR microsoft/msquic#2904 there + mention that in order for it to work, the code should be in Close, not Shutdown (see comment above).

@ManickaP
Copy link
Member

Triage: the last comment stands.

@wfurt
Copy link
Member

wfurt commented May 1, 2023

Did we created new issue @CarnaViire or is this something that was fixed in 2.2? I cannot quite tell and I'm not sure what is next step here.

@ValeryYegorov
Copy link

Hello
I thought we have same issues on Ios(client) -> linux(sever) architecture.
Issues opened on grpc/grpc-dotnet#2164 and microsoft/msquic#3703

@CarnaViire
Copy link
Member

Created microsoft/msquic#3842

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Net.Quic bug tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Projects
None yet
Development

No branches or pull requests

7 participants