[QUIC] Abort on cancellation throws QUIC_STATUS_INVALID_PARAMETER #73688

Closed
CarnaViire opened this issue Aug 10, 2022 · 22 comments · Fixed by #74634
Labels: area-System.Net.Quic, blocking-clean-ci, Known Build Error

@CarnaViire
Member

CarnaViire commented Aug 10, 2022

This also happens in CI. Hits over the last 30 days (status as of 8/25):

Day Run Details
8/24 PR 1965733 (main) net7.0-Linux-Debug-x64-CoreCLR_release-Ubuntu.1804.Amd64.Open
8/24 PR 1965170 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/24 Rolling run 1964727 (7.0) net7.0-Linux-Release-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/23 PR 1962069 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Debian.10.Amd64.Open)Ubuntu.1804.Amd64.Open
8/23 PR 1962049 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Debian.10.Amd64.Open)Ubuntu.1804.Amd64.Open
8/22 PR 1959853 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/22 PR 1958051 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/18 PR 1953122 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Debian.10.Amd64.Open)Ubuntu.1804.Amd64.Open
8/18 PR 1951538 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/18 PR 1951086 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/17 PR 1950071 (7.0-rc1) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/16 PR 1946224 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/15 PR 1943354 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/14 PR 1942884 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/12 PR 1941440 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/12 PR 1939958 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/9 PR 1931560 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/8 PR 1930321 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/6 PR 1927463 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
8/4 PR 1923922 (main) net7.0-Linux-Debug-x64-CoreCLR_release-(Alpine.314.Amd64.Open)Ubuntu.1804.Amd64.Open
7/25-8/3 Logs not available

Noticed in HTTP/3 stress tests. Occurrences in period 8/3-8/24:

21x Hits in tests (all crashes), recently 1-2 times per day in PRs

Date Branch Log
23.08 PR #73479 log
23.08 PR #74433 log
22.08 PR #74376 log
22.08 PR #74322 log
19.08 PR #74215 log
18.08 PR #74147 log
18.08 PR #74002 log
17.08 PR #74098 log
16.08 PR #73981 log
16.08 PR #73768 log
15.08 PR #67049 log
15.08 PR #73547 log
14.08 PR #73907 log
13.08 PR #73817 log
13.08 PR #73697 log
12.08 PR #73745 log
12.08 PR #73748 log
09.08 PR #72934 log
08.08 PR #73586 log
06.08 PR #73515 log
04.08 PR #73305 log

2x Hits in stress tests

1x in Run #20220803.6 on PR #72746
1x in Run #20220810.1 on main

System.AggregateException: One or more errors occurred. (One or more errors occurred. (An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER))
client_1  |  ---> System.AggregateException: One or more errors occurred. (An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER)
client_1  |  ---> System.Net.Quic.QuicException: An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
client_1  |    at System.Net.Quic.QuicStream.Abort(QuicAbortDirection abortDirection, Int64 errorCode) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs:line 422
client_1  |    at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 726
client_1  |    --- End of inner exception stack trace ---
client_1  |    at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 726
client_1  |    at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 726
client_1  |    --- End of inner exception stack trace ---
client_1  |    at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 726
client_1  |    at HttpStress.RequestContext.SendAsync(HttpRequestMessage request, HttpCompletionOption httpCompletion, Nullable`1 token) in /app/ClientOperations.cs:line 108
client_1  |    at HttpStress.ClientOperations.<>c.<<get_Operations>b__1_4>d.MoveNext() in /app/ClientOperations.cs:line 313
client_1  | --- End of stack trace from previous location ---
client_1  |    at HttpStress.StressClient.<>c__DisplayClass17_0.<<StartCore>g__RunWorker|0>d.MoveNext() in /app/StressClient.cs:line 224
{
    "ErrorMessage":  "An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER"
}

Report

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0
@ghost

ghost commented Aug 10, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 10, 2022
@karelz karelz added this to the 7.0.0 milestone Aug 11, 2022
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Aug 11, 2022
@karelz
Member

karelz commented Aug 11, 2022

Triage: We should look into that.
@CarnaViire is running stress locally to get a repro, but hasn't been able to reproduce it yet. We might punt it to 8.0 if we can't find anything useful.

Putting it into the Low Priority bucket, as we cannot reproduce it locally and it hits only rarely (1x per week so far).

@karelz karelz modified the milestones: 7.0.0, 8.0.0 Aug 12, 2022
@karelz
Member

karelz commented Aug 12, 2022

Moving it to 8.0 as we were unable to reproduce it and make it actionable. Impact seems to be low as well.

@rzikm
Member

rzikm commented Aug 15, 2022

Reproduced in #73817 at https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-73817-merge-3f1c7c257c234ddebf/System.Net.Quic.Functional.Tests/1/console.8d88137e.log?helixlogtype=result

Unhandled exception. System.AggregateException: One or more errors occurred. (An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER)
 ---> System.Net.Quic.QuicException: An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
   at System.Net.Quic.ThrowHelper.ThrowIfMsQuicError(Int32 status, String message) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ThrowHelper.cs:line 123
   at System.Net.Quic.QuicStream.Abort(QuicAbortDirection abortDirection, Int64 errorCode) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs:line 422
   at System.Net.Quic.QuicStream.<>c.<.ctor>b__23_0(Object target) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs:line 75
   at System.Net.Quic.ResettableValueTaskSource.<>c.<TryGetValueTask>b__12_0(Object obj, CancellationToken cancellationToken) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ResettableValueTaskSource.cs:line 85
   at System.Threading.CancellationTokenSource.Invoke(Delegate d, Object state, CancellationTokenSource source) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 874
   at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 1104
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 726
   --- End of inner exception stack trace ---
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 726
   at System.Threading.CancellationTokenSource.TimerCallback(Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 35
   at System.Threading.TimerQueueTimer.CallCallback(Boolean isThreadPool) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 703
   at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 666
   at System.Threading.TimerQueue.FireNextTimers() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 330
   at System.Threading.TimerQueue.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/TimerQueue.Portable.cs:line 139
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 984
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
   at System.Threading.Thread.StartCallback() in /_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 105

My suspicion is that we may have some race which could result in calling StreamShutdown twice.

@CarnaViire
Member Author

I suspect that the reason is not a race but native heap corruption. Note that the test suite also crashed with SIGABRT, as in #72696.

@hoyosjs hoyosjs added the Known Build Error Use this to report build issues in the .NET Helix tab label Aug 23, 2022
@build-analysis build-analysis bot removed this from the 8.0.0 milestone Aug 23, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 23, 2022
@rzikm
Member

rzikm commented Aug 23, 2022

Callstack:

(lldb) dumpstack
OS Thread Id: 0x36 (1)
TEB information is not available so a stack size of 0xFFFF is assumed
Current frame: ld-musl-x86_64.so.1!getitimer
Child-SP         RetAddr          Caller, Callee
00007FB4C481B2C0 00007ff5dc20254d ld-musl-x86_64.so.1!raise + 0x41, calling ld-musl-x86_64.so.1!__setjmp + 0x60
00007FB4C481B2F0 00007ff5dbd8ae7f libcoreclr.so!CorUnix::CSharedMemoryObject::ReleaseObjectDestructionLock(CorUnix::CPalThread*, bool) + 0x8f [/__w/1/s/src/coreclr/pal/src/objmgr/shmobject.cpp:609], calling libcoreclr.so!CorUnix::InternalLeaveCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [/__w/1/s/src/coreclr/pal/src/sync/cs.cpp:853]
00007FB4C481B310 00007ff5dc2026c2 ld-musl-x86_64.so.1 + 0xffffffff, calling ld-musl-x86_64.so.1!fetestexcept + 0x174a
00007FB4C481B360 00007ff5dc1d8f25 ld-musl-x86_64.so.1!abort + 0xe, calling ld-musl-x86_64.so.1!raise + 0x1
00007FB4C481B370 00007ff5dc202744 ld-musl-x86_64.so.1!sigaction + 0x7a, calling ld-musl-x86_64.so.1!__setjmp + 0x60
00007FB4C481B3A0 00007ff5dbda595b libcoreclr.so + 0xffffffff [/__w/1/s/src/coreclr/pal/src/thread/process.cpp:2441], calling libcoreclr.so!abort
00007FB4C481B3C0 00007ff5dbda5890 libcoreclr.so!TerminateProcess [/__w/1/s/src/coreclr/pal/src/thread/process.cpp:1233], calling libcoreclr.so!PROCAbort [/__w/1/s/src/coreclr/pal/src/thread/process.cpp:2431]
00007FB4C481B3F0 00007ff5dbb75b55 libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 0x425 [/__w/1/s/src/coreclr/vm/exceptionhandling.cpp:4680], calling libcoreclr.so!CrashDumpAndTerminateProcess(unsigned int) [/__w/1/s/src/coreclr/vm/excep.cpp:4208]
00007FB4C481B410 00007ff5dc01109f libgcc_s.so.1!___lldb_unnamed_symbol58$$libgcc_s.so.1 + 0x9f
00007FB4C481B4F0 00007ff55e1c9adb (MethodDesc 00007ff55d97fa80 + 0x1db System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean)), calling libcoreclr.so!IL_Throw(Object*) [/__w/1/s/src/coreclr/vm/jithelpers.cpp:3984]
00007FB4C481B5B0 00007ff5db993fac libcoreclr.so!RaiseTheExceptionInternalOnly(Object*, int, int) + 0x4bc [/__w/1/s/src/coreclr/vm/excep.cpp:2810], calling libcoreclr.so!_Unwind_Resume
00007FB4C481B820 00007ff5dbacb01d libcoreclr.so!IL_Throw(Object*) + 0x13d [/__w/1/s/src/coreclr/vm/jithelpers.cpp:0], calling libcoreclr.so!RaiseTheExceptionInternalOnly(Object*, int, int) [/__w/1/s/src/coreclr/vm/excep.cpp:2669]
00007FB4C481B8C0 00007ff5dc1dfc3e ld-musl-x86_64.so.1 + 0xffffffff, calling ld-musl-x86_64.so.1 + 0xffffffff
00007FB4C481B938 00007ff55d24e8d6 (MethodDesc 00007ff55ce13e70 + 0x36 System.Threading.Thread.StartCallback()), calling 00007ff55df70c60
00007FB4C481B978 00007ff55d24e8d6 (MethodDesc 00007ff55ce13e70 + 0x36 System.Threading.Thread.StartCallback()), calling 00007ff55df70c60
00007FB4C481B9D0 00007ff55d24e8d6 (MethodDesc 00007ff55ce13e70 + 0x36 System.Threading.Thread.StartCallback()), calling 00007ff55df70c60
00007FB4C481BA10 00007ff5dbb75bf3 libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool) + 0x43 [/__w/1/s/src/coreclr/vm/exceptionhandling.cpp:0], calling libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) [/__w/1/s/src/coreclr/vm/exceptionhandling.cpp:4541]
00007FB4C481BA30 00007ff5dc01109f libgcc_s.so.1!___lldb_unnamed_symbol58$$libgcc_s.so.1 + 0x9f
00007FB4C481BBD0 00007ff5db993fac libcoreclr.so!RaiseTheExceptionInternalOnly(Object*, int, int) + 0x4bc [/__w/1/s/src/coreclr/vm/excep.cpp:2810], calling libcoreclr.so!_Unwind_Resume
00007FB4C481BE40 00007ff5dbacb01d libcoreclr.so!IL_Throw(Object*) + 0x13d [/__w/1/s/src/coreclr/vm/jithelpers.cpp:0], calling libcoreclr.so!RaiseTheExceptionInternalOnly(Object*, int, int) [/__w/1/s/src/coreclr/vm/excep.cpp:2669]
00007FB4C481BEE0 00007ff5dc1dfc3e ld-musl-x86_64.so.1 + 0xffffffff, calling ld-musl-x86_64.so.1 + 0xffffffff
00007FB4C481BF10 00007ff5dbacb0fe libcoreclr.so!IL_Throw(Object*) + 0x21e [/__w/1/s/src/coreclr/vm/jithelpers.cpp:0], calling libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool) [/__w/1/s/src/coreclr/vm/exceptionhandling.cpp:4768]
00007FB4C481BF50 00007ff55e1c9adb (MethodDesc 00007ff55d97fa80 + 0x1db System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean)), calling libcoreclr.so!IL_Throw(Object*) [/__w/1/s/src/coreclr/vm/jithelpers.cpp:3984]
00007FB4C481BFF8 00007ff5dbacaf3a libcoreclr.so!IL_Throw(Object*) + 0x5a [/__w/1/s/src/coreclr/pal/inc/pal.h:4681], calling libcoreclr.so!LazyMachStateCaptureState [/__w/1/s/src/coreclr/vm/amd64/getstate.S:28]
00007FB4C481C090 00007ff55e1c9adb (MethodDesc 00007ff55d97fa80 + 0x1db System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean)), calling libcoreclr.so!IL_Throw(Object*) [/__w/1/s/src/coreclr/vm/jithelpers.cpp:3984]
00007FB4C481C0F0 00007ff55e2ab15a (MethodDesc 00007ff55d97f8b8 + 0x3a System.Threading.CancellationTokenSource.TimerCallback(System.Object)), calling (MethodDesc 00007ff55d97fa68 + 0 System.Threading.CancellationTokenSource.NotifyCancellation(Boolean))

It looks like throwing from ResettableValueTaskSource.CancellationAction (and thus inside the lambda passed to CancellationToken.UnsafeRegister) crashes the process. So one step would be to swallow the exception.

However, there is still a data race underneath, which causes the exception to be thrown in the first place.
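
To illustrate the crash mechanism described above, here is a minimal standalone sketch. It assumes nothing about the real QuicStream code: FailingAbort is a hypothetical stand-in for an abort call that throws, and the exception type is a placeholder. It shows that an exception thrown from a callback registered via CancellationToken.UnsafeRegister escapes from Cancel() wrapped in an AggregateException, and that a try/catch inside the callback keeps it contained.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an abort call that loses a race with disposal.
void FailingAbort() => throw new InvalidOperationException("StreamShutdown failed");

// Unguarded: the callback's exception escapes from Cancel(),
// wrapped in an AggregateException.
using var unguarded = new CancellationTokenSource();
unguarded.Token.UnsafeRegister(_ => FailingAbort(), null);
bool unguardedThrew = false;
try { unguarded.Cancel(); }
catch (AggregateException) { unguardedThrew = true; }

// Guarded: the callback swallows the failure, so Cancel() completes normally.
using var guarded = new CancellationTokenSource();
guarded.Token.UnsafeRegister(_ =>
{
    try { FailingAbort(); }
    catch (InvalidOperationException) { /* swallowed on purpose */ }
}, null);
bool guardedThrew = false;
try { guarded.Cancel(); }
catch (AggregateException) { guardedThrew = true; }

Console.WriteLine($"unguarded threw: {unguardedThrew}, guarded threw: {guardedThrew}");
```

In this sketch the caller of Cancel() can catch the AggregateException. When cancellation fires from a timer (e.g. via CancelAfter), there is no such caller, which is consistent with the stack above going through CancellationTokenSource.TimerCallback and ending in process termination.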

@karelz karelz added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Aug 25, 2022
@rzikm
Member

rzikm commented Aug 26, 2022

The issue is a race when using e.g. CancellationTokenSource.CancelAfter. It can be reproduced with a bit of added help:

diff --git a/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs b/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
index c11f3029a21..da5a9554e44 100644
--- a/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
+++ b/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
@@ -417,6 +417,9 @@ public void Abort(QuicAbortDirection abortDirection, long errorCode)
             return;
         }

+        System.Console.WriteLine("Waiting in Abort()");
+        System.Console.ReadLine();
+
         unsafe
         {
             ThrowHelper.ThrowIfMsQuicError(MsQuicApi.Api.ApiTable->StreamShutdown(

And the following test:

        [Fact]
        public async Task Test()
        {
            QuicListenerOptions listenerOptions = new QuicListenerOptions()
            {
                ListenEndPoint = new IPEndPoint(IPAddress.Loopback, 0),
                ApplicationProtocols = new List<SslApplicationProtocol>() { ApplicationProtocol },
                ConnectionOptionsCallback = (_, _, _) =>
                {
                    var serverOptions = CreateQuicServerOptions();
                    serverOptions.MaxInboundBidirectionalStreams = 1;
                    serverOptions.MaxInboundUnidirectionalStreams = 1;
                    serverOptions.IdleTimeout = TimeSpan.FromSeconds(1);
                    return ValueTask.FromResult(serverOptions);
                }
            };
            (QuicConnection clientConnection, QuicConnection serverConnection) = await CreateConnectedQuicConnection(null, listenerOptions);

            await using (clientConnection)
            await using (serverConnection)
            {
                using QuicStream clientStream = await clientConnection.OpenOutboundStreamAsync(QuicStreamType.Bidirectional);
                await clientStream.WriteAsync(new byte[1]);
                using QuicStream serverStream = await serverConnection.AcceptInboundStreamAsync().AsTask().WaitAsync(TimeSpan.FromSeconds(10));
                await serverStream.ReadAsync(new byte[1]);

                CancellationTokenSource cts = new CancellationTokenSource();
                cts.CancelAfter(TimeSpan.FromSeconds(1));
                await Assert.ThrowsAnyAsync<OperationCanceledException>(() => clientStream.ReadAsync(new byte[1], cts.Token).AsTask());
                await clientStream.DisposeAsync();

                System.Console.WriteLine("Disposed");
                System.Console.ReadLine();
            }
        }

This produces:

❯ ..\..\..\testhost\net7.0-windows-Debug-x64\dotnet.exe exec --runtimeconfig .\System.Net.Quic.Functional.Tests.runtimeconfig.json --depsfile .\System.Net.Quic.Functional.Tests.deps.json .\xunit.console.dll .\System.Net.Quic.Functional.Tests.dll -notrait category=failing -method *MsQuicTests.Test
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 8.0.0-dev)
  Discovering: System.Net.Quic.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Quic.Functional.Tests (found 1 of 111 test case)
  Starting:    System.Net.Quic.Functional.Tests (parallel test collections = on, max threads = 20)
Waiting in Abort()
Disposed

Unhandled exception. System.AggregateException: One or more errors occurred. (An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER)
...

While #74611 would help a bit by not passing an invalid handle to MsQuic, we would get ObjectDisposedException instead. I think at this point the best course of action is to simply swallow that exception, because we can't prevent the race without disposing inside a lock.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 26, 2022
@CarnaViire
Member Author

What I don't like about the current cancellation pattern is that we first finish the task with OCE and only later call abort. If this were not the case (abort called before finishing the task), then you would not get this race in the code from your example:

await Assert.ThrowsAnyAsync<OperationCanceledException>(() => clientStream.ReadAsync(new byte[1], cts.Token).AsTask());
await clientStream.DisposeAsync();

Because abort would be already done when you call DisposeAsync.

The example was about reads, but the race with cancelling writes might be even trickier, because Dispose might be fast enough to gracefully close the writing side before the abort fires.

@rzikm
Member

rzikm commented Aug 26, 2022

Doing it in the reverse order would expose us to a different race (set OCE vs. set real result). We would have to change the implementation of ResettableValueTaskSource and move the call to CancellationAction inside the lock in TryComplete. Thoughts, @ManickaP?
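
A rough sketch of what "abort inside the lock, before the OCE is published" could look like. This is not the real ResettableValueTaskSource: gate, completion, TrySetResult and TryCancel are illustrative names, and a plain TaskCompletionSource stands in for the actual value task source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, heavily simplified stand-in for the completion logic discussed above.
object gate = new();
var completion = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
bool aborted = false;

// Real result path: takes the same lock as cancellation, so only one of them wins.
bool TrySetResult(int value)
{
    lock (gate)
    {
        return completion.TrySetResult(value);
    }
}

// Cancellation path: the abort stand-in runs inside the lock, before the OCE is published.
bool TryCancel(CancellationToken token)
{
    lock (gate)
    {
        if (completion.Task.IsCompleted)
        {
            return false; // a real result already won; nothing to abort
        }
        aborted = true; // placeholder for the actual abort call
        return completion.TrySetCanceled(token);
    }
}

using var cts = new CancellationTokenSource();
cts.Token.UnsafeRegister(_ => TryCancel(cts.Token), null);
cts.Cancel();

bool abortedWhenObserved = false;
try
{
    await completion.Task;
}
catch (OperationCanceledException)
{
    // By the time the caller observes the OCE, the abort has already happened.
    abortedWhenObserved = aborted;
}

bool lateResultAccepted = TrySetResult(42);
Console.WriteLine($"aborted before OCE observed: {abortedWhenObserved}, late result accepted: {lateResultAccepted}");
```

The cost of this shape is that the abort callback runs while the lock is held, so anything it calls into must not try to take the same lock or block for long.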

@CarnaViire
Member Author

CarnaViire commented Aug 26, 2022

Yep, I know. Just reversing the order is not enough, and the change would be extremely tricky. There's also the fact that we don't want to lose the OCE, and just calling Abort would result in a different exception being set. That's why I'm not suggesting changing that now, merely highlighting that I believe there's a problem and we might want to explore potential solutions.

@ManickaP
Member

ManickaP commented Aug 26, 2022

You cannot abort before setting OCE: you could end up with a different exception than the expected OCE, breaking the stream conformance tests and thus the expected stream behavior. And as far as I understand, we didn't abort at all in 6.0.
Also, we have an open issue to consider making cancellation "soft" (#72607), so we can discuss cancellation behavior there.

@ManickaP
Member

We would have to change the implementation of ResettableValueTaskSource and move the call to CancellationAction inside the lock in TryComplete, thoughts, @ManickaP?

That could work, it would be ugly as hell though 🤣

@rzikm
Member

rzikm commented Aug 26, 2022

You cannot abort before setting OCE, you could end up with a different exception than expected OCE

That is something that is possible to fix (e.g. having internal overload of Abort which takes an exception to use) 😈

@ManickaP
Member

The thing with setting up OCE and then aborting is that you set up 2 exceptions in a row, since you want to return OCE first and have all following calls return StreamAborted (or whatever). And that's a thing that ResettableValueTaskSource is made to do. So moving this logic outside defeats the purpose, and it could all be redone in a different way.
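
The "2 exceptions in a row" behavior can be pictured with a tiny standalone sketch. It does not reflect how ResettableValueTaskSource is implemented: first, final and NextFailure are made-up names, and InvalidOperationException is only a placeholder for whatever the real abort exception is.

```csharp
using System;
using System.Threading;

// The first observer gets the OperationCanceledException;
// every later observer gets the abort error instead.
Exception? first = new OperationCanceledException();
Exception final = new InvalidOperationException("stream aborted (placeholder exception)");

// Atomically hands out the one-shot exception, then falls back to the final one.
Exception NextFailure() => Interlocked.Exchange(ref first, null) ?? final;

Exception a = NextFailure();
Exception b = NextFailure();
Exception c = NextFailure();

Console.WriteLine($"{a.GetType().Name}, then {b.GetType().Name}, then {c.GetType().Name}");
```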

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 7, 2022
@rzikm
Member

rzikm commented Sep 7, 2022

Reopening for servicing 7.0

@rzikm rzikm reopened this Sep 7, 2022
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Sep 7, 2022
@karelz
Member

karelz commented Sep 8, 2022

Fixed in 8.0 (main) in PR #74634 and in 7.0 (RC2) in PR #75179

@karelz karelz closed this as completed Sep 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 8, 2022