-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QUIC] Abort on cancellation throws QUIC_STATUS_INVALID_PARAMETER #73688
Comments
Tagging subscribers to this area: @dotnet/ncl Issue DetailsOccurrences: 1x in Run #20220803.6 on PR #72746
|
Triage: We should look into that. Putting it into Low Priority bucket as we cannot reproduce it locally and it has only rare hit (1x per week so far). |
Moving it to 8.0 as we were unable to reproduce it and make it actionable. Impact seems to be low as well. |
Reproduced in #73817 at https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-73817-merge-3f1c7c257c234ddebf/System.Net.Quic.Functional.Tests/1/console.8d88137e.log?helixlogtype=result
My suspicion is that we may have some race which could result in calling |
This comment was marked as duplicate.
This comment was marked as duplicate.
I suspect that the reason is not a race but a native heap corruption, note the test suite also crashed with SIGABRT as in #72696 |
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
Callstack:
It looks like throwing from the runtime/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ResettableValueTaskSource.cs Line 85 in ad366f2
However, there still is a data race underneath which causes the exception to be thrown in the first case. |
This comment was marked as duplicate.
This comment was marked as duplicate.
The issue is race when using e.g. diff --git a/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs b/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
index c11f3029a21..da5a9554e44 100644
--- a/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
+++ b/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
@@ -417,6 +417,9 @@ public void Abort(QuicAbortDirection abortDirection, long errorCode)
return;
}
+ System.Console.WriteLine("Waiting in Abort()");
+ System.Console.ReadLine();
+
unsafe
{
ThrowHelper.ThrowIfMsQuicError(MsQuicApi.Api.ApiTable->StreamShutdown( And following test [Fact]
public async Task Test()
{
QuicListenerOptions listenerOptions = new QuicListenerOptions()
{
ListenEndPoint = new IPEndPoint(IPAddress.Loopback, 0),
ApplicationProtocols = new List<SslApplicationProtocol>() { ApplicationProtocol },
ConnectionOptionsCallback = (_, _, _) =>
{
var serverOptions = CreateQuicServerOptions();
serverOptions.MaxInboundBidirectionalStreams = 1;
serverOptions.MaxInboundUnidirectionalStreams = 1;
serverOptions.IdleTimeout = TimeSpan.FromSeconds(1);
return ValueTask.FromResult(serverOptions);
}
};
(QuicConnection clientConnection, QuicConnection serverConnection) = await CreateConnectedQuicConnection(null, listenerOptions);
await using (clientConnection)
await using (serverConnection)
{
using QuicStream clientStream = await clientConnection.OpenOutboundStreamAsync(QuicStreamType.Bidirectional);
await clientStream.WriteAsync(new byte[1]);
using QuicStream serverStream = await serverConnection.AcceptInboundStreamAsync().AsTask().WaitAsync(TimeSpan.FromSeconds(10));
await serverStream.ReadAsync(new byte[1]);
CancellationTokenSource cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromSeconds(1));
await Assert.ThrowsAnyAsync<OperationCanceledException>(() => clientStream.ReadAsync(new byte[1], cts.Token).AsTask());
await clientStream.DisposeAsync();
System.Console.WriteLine("Disposed");
System.Console.ReadLine();
}
} This produces:
While #74611 would help a bit by not passing invalid handle to MsQuic, we would get |
What I don't like about the current cancellation pattern, is that we first finish the task with OCE and only later call abort. If this would not be the case (abort called before finishing the task), then you would not get this race in the code from your example: await Assert.ThrowsAnyAsync<OperationCanceledException>(() => clientStream.ReadAsync(new byte[1], cts.Token).AsTask());
await clientStream.DisposeAsync(); Because abort would be already done when you call DisposeAsync. The example was about reads, but race with cancelling writes might be even more tricky, because Dispose might be fast enough to gracefully close the writing side before abort fires. |
Doing it in the reverse order would expose us to a different race (set OCE vs. set real result). We would have to change the implementation of |
Yep, I know, just reversing the order is not enough, the change would be extremely tricky, there's also a thing that we don't want to lose OCE, and just calling Abort would result in a different exception being set. That's why I'm not suggesting changing that now, merely highlighting that I believe there's a problem and we might want to explore potential solutions. |
You cannot abort before setting OCE, you could end up with a different exception than expected OCE, breaking the stream conformance tests thus an expected stream behavior. And as far as I understand we didn't abort at all in 6.0. |
That could work, it would be ugly as hell though 🤣 |
That is something that is possible to fix (e.g. having internal overload of Abort which takes an exception to use) 😈 |
The thing with setting up oce and than aborting is that you set up 2 exceptions in a row, since you want to return OCE first and for all following calls to return StreamAborted (or whatever). And that's a thing that ResettableValueTaskSource is made to do. So moving this logic outside defeats the purpose and it can all be redone in a different way. |
This comment was marked as duplicate.
This comment was marked as duplicate.
Reopening for servicing 7.0 |
Happens also in CI - see last 30 days - status on 8/25:
Noticed in HTTP/3 stress tests. Occurrences in period 8/3-8/24:
21x Hits in tests (all crashes), recently 1-2 times per day in PRs
2x Hits in stress tests
1x in Run #20220803.6 on PR #72746
1x in Run #20220810.1 on main
Report
Summary
The text was updated successfully, but these errors were encountered: