Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUIC] Design of Shutdown for QuicStreams in System.Net.Quic #756

Closed
jkotalik opened this issue Dec 11, 2019 · 55 comments
Closed

[QUIC] Design of Shutdown for QuicStreams in System.Net.Quic #756

jkotalik opened this issue Dec 11, 2019 · 55 comments
Assignees
Labels
area-System.Net.Quic enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@jkotalik
Copy link
Contributor

Problem

So today, we have an API for Quic which looks like the following:

    public sealed class QuicStream : System.IO.Stream
    {
        internal QuicStream() { }
        public override bool CanSeek => throw null;
        public override long Length => throw null;
        public override long Seek(long offset, System.IO.SeekOrigin origin) => throw null;
        public override void SetLength(long value) => throw null;
        public override long Position { get => throw null; set => throw null; }
        public override bool CanRead => throw null;
        public override bool CanWrite => throw null;
        public override void Flush() => throw null;
        public override int Read(byte[] buffer, int offset, int count) => throw null;
        public override void Write(byte[] buffer, int offset, int count) => throw null;
        public long StreamId => throw null;
        public void AbortRead() => throw null;
        public ValueTask ShutdownWriteAsync(System.Threading.CancellationToken cancellationToken = default) => throw null;
    }

Today, QuicStream derives from Stream, which is really nice for interop purposes with other things, allowing people to just read and write to and from the stream without thinking what the underlying implementation is.

When calling Dispose on the stream, that currently triggers an abortive shutdown sequence on the stream rather than graceful. To trigger a graceful shutdown, one needs to call ShutdownWriteAsync, which sends a FIN to the peer and waits for an ACK. However, people that would normally be using a stream wouldn't have access to ShutdownWriteAsync, hence all actions would be abortive by default.

Current thoughts

Ideally for the workflow we'd like to enable is something like:

using (Stream stream = connection.AcceptAsync()) // this is actually a quic stream
{
    Memory<byte> memory = new Memory<byte>(new byte[10]);
    int res = await stream.ReadAsync(memory);
    await stream.WriteAsync(memory);
}

The big question here is how do we make it so we can send a shutdown for a final write with a stream API outside of Dispose?

The first pivot would be whether we'd like QuicStream to derive from stream at all. If we'd like it to derive from System.IO.Stream, I think it would have to make Dispose/DisposeAsync do a graceful shutdown by default in that case, and expose Abortive versions of these functions on the QuicStream itself. That way, in the default case, using the stream APIs "just works". However, doing a graceful shutdown of a stream in Dispose has downsides. We would be doing a network call in Dispose, which would block. DisposeAsync also can't be canceled as well.

So let's talk about a world where QuicStream didn't derive from System.IO.Stream and we had a few liberties with the API. Potentially, we could change a few things:

  • Make WriteAsync take a flag which says if this is the final write.
  • Allow Dispose to be abortive.

Abortive Dispose wouldn't block at all, which is nice. However, we do need to be careful about how to supply an error code when aborting a stream. The error code that is used for abort is supposed to be "application specific and provided", meaning we'd need a default error code.

cc @dotnet/http3 and @nibanks I haven't really reached a conclusion based on this yet. I'd be interested in your thoughts.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Dec 11, 2019
@jkotalik jkotalik added this to the 5.0 milestone Dec 11, 2019
@scalablecory
Copy link
Contributor

scalablecory commented Dec 11, 2019

Q: an abortive close also needs I/O, doesn't it? Is it managed behind the scenes, or does it also need to be async?

If we implement Stream -- and we probably should, because so many things can use it -- then Dispose and DisposeAsync should be graceful to match the expected behavior of Stream. We just need to live with that extra I/O, I think :/.

Lets look at NetworkStream.Dispose:

  • It performs a graceful shutdown, so your sample code works great.
  • It doesn't block, because of OS magic, unless you set linger options.
  • If you're calling it during an exception-triggered unwind and your protocol doesn't have ways to distinguish a graceful shutdown from a busted protocol, you're going to have a bad time.
    • You need to set ownsSocket=false and manage shutdown yourself to get around this.

Spitballing... if we really want to keep QuicStream deriving from Stream, what we could do is enable an 'advanced' behavior like so:

using QuicStream stream = CreateUnidirectionalStream(abortive = true);

// if exception happens, we don't ever Shutdown() and so Dispose will be abortive.
DoSomethingThatCanThrow();

stream.Shutdown(SD_BOTH); // graceful shutdown, now Dispose is locked in to no-op.

@jkotalik
Copy link
Contributor Author

an abortive close also needs I/O, doesn't it? Is it managed behind the scenes, or does it also need to be async?

It does IO, but you don't need to await as it's abortive and it's managed behind the scenes.

We just need to live with that extra I/O, I think :/.

So there could be another option which could make it so we don't synchronously block in the Dispose call. Let's say Dispose didn't actually complete the graceful shutdown and kept most of the state, and rather just started it. On the callback to the graceful shutdown completing, it would finish the callback and call stream close.

I really don't like this option because it could lose writes if the connection is closed immediately after calling Dispose. Closing the connection doesn't wait for all streams to gracefully close as well.

Spitballing... if we really want to keep QuicStream deriving from Stream, what we could do is enable an 'advanced' behavior like so:

My feeling is no matter what we'd have to expose another way to do async graceful shutdown on QuicStream. We can track the current state of QuicStream to check whether to noop in Dispose or not.

@nibanks
Copy link

nibanks commented Dec 11, 2019

IMO, the most important thing that needs to be ironed out is the dispose path here. Lots of focus seems to be on the write path, whether it should be graceful or abortive, but not much on the read path.

A very important point with QUIC is that all aborts (in the read or write direction) must have an application layer error code. Therefore they cannot happen in response to a simple Dispose call, since it takes no error code as argument. If this must is ever violated (say, by closing the stream without explicitly aborting first, or waiting on the graceful close to complete) then the entire connection must be immediately torn down.

Therefore the API must be designed so that it should be very hard to accidentally code up something that either doesn't abort both directions or doesn't gracefully complete them. I don't have a good solution to this that fits inside the Stream interface.

@jkotalik
Copy link
Contributor Author

A very important point with QUIC is that all aborts (in the read or write direction) must have an application layer error code. Therefore they cannot happen in response to a simple Dispose call, since it takes no error code as argument. If this must is ever violated (say, by closing the stream without explicitly aborting first, or waiting on the graceful close to complete) then the entire connection must be immediately torn down.

Yeah, even in the case where Dispose is graceful by default, there are still conditions where Dispose may be in an abortive state (timeout, didn't read the entire read side, etc.). We'd probably need to supply a default error code that is application specific in that case: maybe as a options when creating a QuicStream on the client or creating the Listener on the Server.

@geoffkizer
Copy link
Contributor

The first pivot would be whether we'd like QuicStream to derive from stream at all.

We want QuicStream to derive from Stream. There's a ton of code that works in terms of Stream, and user understand how Streams work.

But even if we decided that QuicStream didn't actually derive from Stream, we would just turn around and write a wrapper over QuicStream that implements Stream, which would induce all of these questions anyway. In other words, not deriving from Stream doesn't actually solve anything here.

@geoffkizer
Copy link
Contributor

A very important point with QUIC is that all aborts (in the read or write direction) must have an application layer error code. Therefore they cannot happen in response to a simple Dispose call, since it takes no error code as argument.

I agree that this is an important point to consider.

In particular, I think we will need AbortRead(int errorCode) that causes the read side to abort and send a STOP_SENDING frame, and AbortWrite(int errorCode) that causes the write side to abort and send a RESET_STREAM frame.

That said, we're going to need to do something in Dispose if either the read side or write side hasn't completed yet. That means picking some error code and sending this with the STOP_SENDING frame. As @jkotalik suggested above, we can make this configurable if that makes sense, but even then there will need to be some default, likely 0. I think that's fine; if users need more than that, they should call AbortRead or AbortWrite themselves.

@nibanks
Copy link

nibanks commented Dec 11, 2019

but even then there will need to be some default, likely 0. I think that's fine

I cannot stress enough how absolutely not fine that is. The QUIC spec uses no uncertain terms that the application protocol must be the one to specify this value.

As a transport QUIC has no knowledge of this error code space. 0 might mean "everything was successful! Please continue purchasing that Ferrari." It might mean "the world has ended, please delete the user's account." Since this error code will be delivered to the peer, and the peer will see it explicitly as "your peer aborted the stream and chose to use error code X" the app layer must be the one to choose.

So I see only one of two options if Dispose is somehow called while the stream is still in use:

  1. Just call StreamClose and MsQuic will terminate the entire connection.
  2. Have a default value that the app must set before it got this far. Perhaps it's part of a constructor somewhere.

@geoffkizer
Copy link
Contributor

@nibanks Is there some reason QUIC hasn't just defined 0 as a universal error code meaning "unknown" or something like that? This seems useful.

@nibanks
Copy link

nibanks commented Dec 11, 2019

The entire error space is explicitly application owned. QUIC has no insights to the values. There are no default values. Feel free to open an issue on the QUIC spec. I opened one related to this, and any changes were shot down: quicwg/base-drafts#3291

@geoffkizer
Copy link
Contributor

Thanks @nibanks. I commented on the linked issue.

@scalablecory
Copy link
Contributor

not deriving from Stream doesn't actually solve anything here.

I'm beginning to worry that, given QUIC's needs for special handling, exposing Stream as its first-class API might cause incorrect usage expectations.

@MikeBishop
Copy link

MikeBishop commented Dec 17, 2019

Thinking about this from an application's point of view and looking at your interface, this seems to be an artifact of duplicating stream state tracking between the app and the transport stack. The root problem is that they can get out of sync, and garbage collection / the Dispose() method is supposed to reconcile the differences. Maybe the right solution is to have a different reconciliation method.

Suppose that the QuicConnection object exposed a collection of all open streams. No open stream is ever inaccessible until the connection is gone, and the connection object / stack becomes the arbiter of truth for stream state. Then you never need to auto-close streams separate from the connection. If the connection gets disposed, it takes all the streams with it.

Not to mention that being able to access that collection, filtered by stream type, permits some useful interactions:

  • Find a stream by ID if it's open
  • Check whether a stream is open
  • Determine how many concurrent streams are open
  • Compare stream objects to collections held by the app to identify idle streams and take some action (like closing them gracefully or returning them to a pool of available streams)

@jkotalik
Copy link
Contributor Author

No open stream is ever inaccessible until the connection is gone, and the connection object / stack becomes the arbiter of truth for stream state. Then you never need to auto-close streams separate from the connection. If the connection gets disposed, it takes all the streams with it.

This seems like it could be dangerous. What happens if a peer opens and closes a ton of streams (obeying the limits for the count of unidirectional and bidirectional streams)? If we don't remove state for an individual stream, we wouldn't clean it up until the connection is closed. So I think we need to logically know when a stream is "done".

I do think we will eventually have a lookup mechanism for streams on a given connection, however I do think we need to remove these entries.

@jkotalik
Copy link
Contributor Author

Related to my opening issue, we have decided to do the following for QuicStream.

public ValueTask WriteAsync(ReadOnlyMemory<byte> data, bool endStream, System.Threading.CancellationToken cancellationToken = default) => throw null;

public ValueTask ShutdownWriteCompleted(System.Threading.CancellationToken cancellationToken = default) => throw null;

If endStream = true, the call to WriteAsync will start a graceful shutdown sequence. Awaiting this task doesn't await the final ACK though, instead you need to await ShutdownWriteCompleted.

@MikeBishop
Copy link

This seems like it could be dangerous. What happens if a peer opens and closes a ton of streams (obeying the limits for the count of unidirectional and bidirectional streams)? If we don't remove state for an individual stream, we wouldn't clean it up until the connection is closed. So I think we need to logically know when a stream is "done".

I agree, you absolutely need to know when a stream is done. The peer can only generate streams up to the limit provided by the local endpoint, so it's obviously bounded (depending on what logic you're using to issue more Stream ID credit). Thinking about the receive stream state machine, the collection should contain anything that has opened but hasn't reached one of the two terminal states (Data Read or Reset Read).

@karelz
Copy link
Member

karelz commented Feb 20, 2020

Triage: Looks like we have something in place already. Maybe not final, but likely good enough for 5.0.

@karelz karelz modified the milestones: 5.0, Future Feb 20, 2020
@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions and removed untriaged New issue has not been triaged by the area owner labels Feb 21, 2020
@karelz karelz changed the title Design of Shutdown for QuicStreams in System.Net.Quic [QUIC] Design of Shutdown for QuicStreams in System.Net.Quic Mar 11, 2020
@scalablecory scalablecory modified the milestones: Future, 6.0.0 Sep 23, 2020
@scalablecory
Copy link
Contributor

scalablecory commented Sep 23, 2020

I think this should be something like:

class QuicStream
{
   // graceful shutdown to mimic how a user would expect a Stream to function.
   public ValueTask DisposeAsync()
   {
       ShutdownWrites();
       await ShutdownWritesComplete;
   }

   public void Dispose()
   {
      DisposeAsync().AsTask().GetAwaiter().GetResult();
   }

    // equiv to Write()+ShutdownWrites() but will use a single frame/packet.
    public ValueTask WriteAsync(...., bool shutdownWrites);

    // equiv to shutdown(SD_SEND);
    // should cause any subsequent writes to throw... something.
    public void ShutdownWrites();

    // abortive -- causing any unsent data to be discarded.
    // one direction can be aborted while leaving the other direction open.
    // should cause any outstanding ops to throw StreamAbortedException.
    public void Abort(Direction{Read,Write,Both}, long errorCode);

    // find some better name here.
    // causes MsQuic to immediately, without blocking or waiting for an ack from peer, dispose of any resources.
    // StreamComplete will be set immediately.
    public void AbortAndDispose();

    // when peer has acknowledged we've shutdown or aborted our own writes.
    // Nick, for a graceful shutdown, this should mean all the written data has been ACKed as well right? the FIN packet being received out of order by peer won't cause this to trigger until all data has been received?
    // corresponds to QUIC_STREAM_EVENT_SEND_SHUTDOWN_COMPLETE.
    public Task ShutdownWritesComplete { get; }

    // peer has acknowledged we've shutdown, and we have acknowledged peer has shutdown. stream is ready for disposal.
    // corresponds to QUIC_STREAM_EVENT_SHUTDOWN_COMPLETE.
    public Task StreamComplete { get; }
}

@nibanks any thoughts here?

@geoffkizer
Copy link
Contributor

IDisposable2

Possibly. That opens up a whole other can of worms, though.

@scalablecory
Copy link
Contributor

class QuicStream
{
    // Note: removing shutdown feature as it was part of #43290.
    // public ValueTask CompleteWritesAync();

   public ValueTask CloseAsync(CancellationToken cancellationToken);
   public ValueTask DisposeAsync() => CloseAsync(default);

    // abortive -- causing any unsent data to be discarded.
    // one direction can be aborted while leaving the other direction open.
    // should cause any outstanding ops to throw StreamAbortedException.
    public void Abort(long errorCode, QuicAbortDirection direction = QuicAbortDirection.Both);
}

[Flags]
enum QuicAbortDirection
{
   Read = 1,
   Write = 2,
   Both = 3
}

Usage example:

using QuicStream stream = ...;

try
{
   // make reads and writes here, and finally:
   await stream.CloseAsync(cancellationToken);
}
catch(OperationCanceledException)
{
   stream.Abort(errorCode);
}

@ManickaP
Copy link
Member

Triage: we have the functionality to achieve graceful and graceless closure/dispose of the stream. This is just changing/renaming the API. This is 7.0.

@wegylexy
Copy link
Contributor

If not Stream, how about PipeReader/PipeWriter?

@ManickaP
Copy link
Member

From https://github.com/dotnet/runtime/pull/68288/files/31623532e423818c011bf0ddc267216e7526f7a5#r855163759, we cannot ever assume any error code, they always have to come from an application layer. Either as a parameter or as some default settings. Problematic part:

StartShutdown(QUIC_STREAM_SHUTDOWN_FLAGS.ABORT_RECEIVE, 0xffffffff);

@wegylexy
Copy link
Contributor

An application may want to specify a default error code for the default disposing behavior.

await using var myStream = myConnection.OpenBidirectionalStream(defaultError: 1);
try
{
    // do normal stuff
    // await shutdown normally here!
}
catch (MySpecialException ex)
{
    myStream.AbortRead(ex.Code);
}
// default to 1 if still open

Otherwise, the entire connection should be aborted as the application is leaking a stream out for possible finalizer thread blocking during garbage collection or other undefined consequences.

@ManickaP
Copy link
Member

Closing, covered by #69675

radical pushed a commit to radical/runtime that referenced this issue Jul 7, 2022
[main] Update dependencies from dotnet/arcade


 - Bump to newer Arcade
@ghost ghost locked as resolved and limited conversation to collaborators Jul 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Quic enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

10 participants