Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HTTP/3] Re-add QuicStream.ShutdownWriteCompleted #56939

Closed
JamesNK opened this issue Aug 5, 2021 · 4 comments
Closed

[HTTP/3] Re-add QuicStream.ShutdownWriteCompleted #56939

JamesNK opened this issue Aug 5, 2021 · 4 comments

Comments

@JamesNK
Copy link
Member

JamesNK commented Aug 5, 2021

I think Kestrel needs QuicStream.ShutdownWriteCompleted to successfully handle a graceful server shutdown.

In a graceful server shutdown:

  1. The server will send a GOAWAY to the client
  2. The server will then wait some time for in-progress requests to complete
  3. Once requests are all complete then the server closes the connection.

The problem is there is a race between completing the final requests and closing the connection. We need to know that the data sent with the final requests was successfully received BEFORE we call QuicConnection.CloseAsync.

If response data is still inflight to the client, and CloseAsync is called, the close notification could arrive at the client before the data. The close notification will abort all streams and the client gets the error "Connection aborted error (256)" instead of a successfully completed request.

QuicStream.ShutdownWriteCompleted would allow the server to wait until it knows the client has received data before calling QuicConnection.CloseAsync. ShutdownCompleted is not appropriate because we don't want to consider streams alive until the client also calls complete.

Basically, please revert #55981

I have a test that reproduces this scenario. Most of the time the draining request completes successfully, but 10% of the time there is a "Connection aborted error (256)" error.

@ghost
Copy link

ghost commented Aug 5, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

I think Kestrel needs QuicStream.ShutdownWriteCompleted to successfully handle a graceful server shutdown.

In a graceful server shutdown:

  1. The server will send a GOAWAY to the client
  2. The server will then wait some time for in-progress requests to complete
  3. Once requests are all complete then the server closes the connection.

The problem is there is a race between completing the final requests and closing the connection. We need to know that the data sent with the final requests was successfully received BEFORE we call QuicConnection.CloseAsync.

If response data is still inflight to the client, and CloseAsync is called, the close notification could arrive at the client before the data. The close notification will abort all streams and the client gets the error "Connection aborted error (256)" instead of a successfully completed request.

I have a test that reproduces this scenario. Most of the time the draining request completes successfully, but 10% of the time there is a "Connection aborted error (256)" error.

Author: JamesNK
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 5, 2021
@geoffkizer
Copy link
Contributor

ShutdownCompleted is not appropriate because we don't want to consider streams alive until the client also calls complete.

I'm not sure what you mean here by "the client also calls complete", can you clarify?

Assuming that you are completing or aborting the read on the server side, then ShutdownCompleted is not waiting for the client to complete anything, because the read is already completed and/or aborted.

I have a test that reproduces this scenario. Most of the time the draining request completes successfully, but 10% of the time there is a "Connection aborted error (256)" error.

Can you post or link the code?

@JamesNK
Copy link
Member Author

JamesNK commented Aug 6, 2021

PR: dotnet/aspnetcore#34968
Test: ConnectionLifetimeNotificationFeature_RequestClose_ConnectionEnds

Interop.FunctionalTests.Http3.Http3RequestTests.ConnectionLifetimeNotificationFeature_RequestClose_ConnectionEnds
   Source: Http3RequestTests.cs line 768
   Duration: 44 ms

  Message: 
System.Net.Http.HttpRequestException : An error occurred while sending the request.
---- System.Net.Quic.QuicConnectionAbortedException : Connection aborted by peer (256).

  Stack Trace: 
Http3RequestStream.SendAsync(CancellationToken cancellationToken)
Http3RequestStream.SendAsync(CancellationToken cancellationToken)
Http3Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
HttpConnectionPool.TrySendUsingHttp3Async(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
HttpConnectionPool.DetermineVersionAndSendAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
HttpConnectionPool.SendAndProcessAltSvcAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
TaskExtensions.TimeoutAfter[T](Task`1 task, TimeSpan timeout, String filePath, Int32 lineNumber) line 90
Http3RequestTests.ConnectionLifetimeNotificationFeature_RequestClose_ConnectionEnds() line 833
--- End of stack trace from previous location ---
----- Inner Stack Trace -----
MsQuicConnection.AcceptStreamAsync(CancellationToken cancellationToken)
QuicConnection.AcceptStreamAsync(CancellationToken cancellationToken)
Http3Connection.AcceptStreamsAsync()

  Standard Output: 
| [0.001s] TestLifetime Information: Starting test ConnectionLifetimeNotificationFeature_RequestClose_ConnectionEnds at 2021-08-06T00:24:41
| [0.004s] Microsoft.Extensions.Hosting.Internal.Host Debug: Hosting starting
| [0.014s] Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServer Debug: Using development certificate: CN=localhost (Thumbprint: 6DBDDAFB15AFE297616EA158AE0D9C4348B7AE70)
| [0.020s] Microsoft.Hosting.Lifetime Information: Now listening on: https://127.0.0.1:54682
| [0.021s] Microsoft.AspNetCore.Hosting.Diagnostics Debug: Loaded hosting startup assembly Interop.FunctionalTests
| [0.021s] Microsoft.Hosting.Lifetime Information: Application started. Press Ctrl+C to shut down.
| [0.021s] Microsoft.Hosting.Lifetime Information: Hosting environment: Production
| [0.021s] Microsoft.Hosting.Lifetime Information: Content root path: C:\Development\Source\AspNetCore\artifacts\bin\Interop.FunctionalTests\Debug\net6.0\
| [0.021s] Microsoft.Extensions.Hosting.Internal.Host Debug: Hosting started
| [0.023s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Connection id "0HMAOE29TS2BA" accepted.
| [0.023s] Microsoft.AspNetCore.Server.Kestrel.Connections Debug: Connection id "0HMAOE29TS2BA" accepted.
| [0.023s] Microsoft.AspNetCore.Server.Kestrel.Connections Debug: Connection id "0HMAOE29TS2BA" started.
| [0.027s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Stream id "0HMAOE29TS2BA:00000003" type Unidirectional connected.
| [0.033s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Stream id "0HMAOE29TS2BA:00000002" type Unidirectional accepted.
| [0.033s] Microsoft.AspNetCore.Server.Kestrel.Http3 Trace: Connection id "0HMAOE29TS2BA" received SETTINGS frame for stream ID 2 with length 5.
| [0.033s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Stream id "0HMAOE29TS2BA:00000000" type Bidirectional accepted.
| [0.033s] Microsoft.AspNetCore.Server.Kestrel.Http3 Trace: Connection id "0HMAOE29TS2BA" received HEADERS frame for stream ID 0 with length 51.
| [0.033s] Microsoft.AspNetCore.Hosting.Diagnostics Information: Request starting HTTP/3 GET https://127.0.0.1:54682/1 - -
| [0.035s] Microsoft.AspNetCore.Server.Kestrel.Http3 Debug: Connection id "0HMAOE29TS2BA" is closing.
| [0.035s] Microsoft.AspNetCore.Server.Kestrel.Http3 Debug: Connection id "0HMAOE29TS2BA": Highest opened stream ID 4611686018427387903 in GOAWAY.
| [0.035s] Microsoft.AspNetCore.Server.Kestrel.Http3 Trace: Connection id "0HMAOE29TS2BA" sending GOAWAY frame for stream ID 3 with length 8.
| [0.036s] Interop.FunctionalTests.Http3.Http3RequestTests Information: Started waiting for logs: Check for initial GOAWAY frame sent on server initiated shutdown.
| [0.036s] Interop.FunctionalTests.Http3.Http3RequestTests Information: Successfully received logs: Check for initial GOAWAY frame sent on server initiated shutdown.
| [0.036s] Microsoft.AspNetCore.Server.Kestrel.Http3 Trace: Connection id "0HMAOE29TS2BA" sending HEADERS frame for stream ID 0 with length 71.
| [0.037s] Microsoft.AspNetCore.Hosting.Diagnostics Information: Request finished HTTP/3 GET https://127.0.0.1:54682/1 - - - 200 0 - 3.1735ms
| [0.037s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Stream id "0HMAOE29TS2BA:00000000" shutting down writes because: "The QUIC transport's send loop completed gracefully.".
| [0.037s] Microsoft.AspNetCore.Server.Kestrel Debug: Connection id "0HMAOE29TS2BA", Request id "0HMAOE29TS2BA:00000000": started reading request body.
| [0.037s] Microsoft.AspNetCore.Server.Kestrel Debug: Connection id "0HMAOE29TS2BA", Request id "0HMAOE29TS2BA:00000000": done reading request body.
| [0.037s] Microsoft.AspNetCore.Server.Kestrel.Http3 Debug: Connection id "0HMAOE29TS2BA": Highest opened stream ID 0 in GOAWAY.
| [0.037s] Microsoft.AspNetCore.Server.Kestrel.Http3 Trace: Connection id "0HMAOE29TS2BA" sending GOAWAY frame for stream ID 3 with length 1.
| [0.037s] Microsoft.AspNetCore.Server.Kestrel.Http3 Debug: Trace id "0HMAOE29TS2BA": HTTP/3 stream error "H3_NO_ERROR". An abort is being sent to the stream.
| Microsoft.AspNetCore.Connections.ConnectionAbortedException: The HTTP/3 connection faulted.
| [0.037s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Stream id "0HMAOE29TS2BA:00000002" aborted by application with error code 256 because: "The HTTP/3 connection faulted.".
| [0.038s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Stream id "0HMAOE29TS2BA:00000003" shutting down writes because: "The QUIC transport's send loop completed gracefully.".
| [0.039s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Connection id "0HMAOE29TS2BA" aborted by application with error code 256 because: "The HTTP/3 connection faulted.".
| [0.039s] Microsoft.AspNetCore.Server.Kestrel.Http3 Debug: Connection id "0HMAOE29TS2BA" is closed. The last processed stream ID was 0.
| [0.039s] Microsoft.AspNetCore.Server.Kestrel.Connections Debug: Connection id "0HMAOE29TS2BA" stopped.
| [0.044s] Microsoft.AspNetCore.Server.Kestrel.Transport.Quic Debug: Listener has aborted with exception: Operation aborted.
| [0.047s] Interop.FunctionalTests.Http3.Http3RequestTests Error: Test threw an exception.
| System.Net.Http.HttpRequestException: An error occurred while sending the request.
|  ---> System.Net.Quic.QuicConnectionAbortedException: Connection aborted by peer (256).
|    at System.Net.Quic.Implementations.MsQuic.MsQuicConnection.AcceptStreamAsync(CancellationToken cancellationToken)
|    at System.Net.Quic.QuicConnection.AcceptStreamAsync(CancellationToken cancellationToken)
|    at System.Net.Http.Http3Connection.AcceptStreamsAsync()
|    --- End of inner exception stack trace ---
|    at System.Net.Http.Http3RequestStream.SendAsync(CancellationToken cancellationToken)
|    at System.Net.Http.Http3RequestStream.SendAsync(CancellationToken cancellationToken)
|    at System.Net.Http.Http3Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
|    at System.Net.Http.HttpConnectionPool.TrySendUsingHttp3Async(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
|    at System.Net.Http.HttpConnectionPool.DetermineVersionAndSendAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
|    at System.Net.Http.HttpConnectionPool.SendAndProcessAltSvcAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
|    at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
|    at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
|    at Microsoft.AspNetCore.Testing.TaskExtensions.TimeoutAfter[T](Task`1 task, TimeSpan timeout, String filePath, Int32 lineNumber) in C:\Development\Source\AspNetCore\src\Shared\TaskExtensions.cs:line 90
|    at Interop.FunctionalTests.Http3.Http3RequestTests.ConnectionLifetimeNotificationFeature_RequestClose_ConnectionEnds() in C:\Development\Source\AspNetCore\src\Servers\Kestrel\test\Interop.FunctionalTests\Http3\Http3RequestTests.cs:line 833
|    at Xunit.Sdk.TestInvoker`1.<>c__DisplayClass48_1.<<InvokeTestMethodAsync>b__1>d.MoveNext() in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\Runners\TestInvoker.cs:line 264
| --- End of stack trace from previous location ---
|    at Xunit.Sdk.ExecutionTimer.AggregateAsync(Func`1 asyncAction) in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\ExecutionTimer.cs:line 48
|    at Xunit.Sdk.ExceptionAggregator.RunAsync(Func`1 code) in C:\Dev\xunit\xunit\src\xunit.core\Sdk\ExceptionAggregator.cs:line 90
| [0.054s] TestLifetime Information: Finished test ConnectionLifetimeNotificationFeature_RequestClose_ConnectionEnds in 0.053344s

@JamesNK
Copy link
Member Author

JamesNK commented Aug 6, 2021

Had a discussion with @geoffkizer about this and I'm going to change Kestrel to rely on the client to complete a graceful close. There is a timer on host shutdown that will forcefully terminate the server after 5 seconds.

Closing for now.

@JamesNK JamesNK closed this as completed Aug 6, 2021
@karelz karelz added this to the 6.0.0 milestone Aug 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants