ignore received frames on a stream locally reset #174
Conversation
self.flow.send_data(sz);

// Track the data as in-flight
self.in_flight_data += sz;
There is an open question of whether we should immediately release this connection capacity. If we don't, since there is no related stream to release the capacity, the connection capacity is essentially "leaked" forever. At the same time, if we locally reset this stream, but the remote just keeps sending us data anyways, giving capacity back would allow them to just keep flooding the connection with this stream we no longer want.
Perhaps if we add in the "for some time" part, by checking a time elapsed, then the eventual connection error would be a fine deterrent and allow us to free the connection window automatically.
Even without this question answered, it is still better that we count this anyway: it is correct that the window has been used, and we are not required to give the capacity back.
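To make the trade-off concrete, here is a minimal sketch (invented types, not the h2 internals) of the two separate steps being discussed: charging received bytes to the connection window, and optionally releasing that capacity back so a WINDOW_UPDATE can be sent.

```rust
// Hypothetical connection-level flow state; not the h2 internals.
struct ConnFlow {
    window: u32,     // remaining connection receive window
    to_release: u32, // capacity queued to go back out as a WINDOW_UPDATE
}

impl ConnFlow {
    // Charge `sz` bytes against the connection window; the data was sent,
    // so the window *was* used even if we drop the frame.
    fn consume(&mut self, sz: u32) -> Result<(), &'static str> {
        self.window = self.window.checked_sub(sz).ok_or("flow control violation")?;
        Ok(())
    }

    // Optionally hand the capacity back. If we never do this for a locally
    // reset stream, that connection capacity is "leaked" forever.
    fn release(&mut self, sz: u32) {
        self.to_release += sz;
    }
}

fn main() {
    let mut flow = ConnFlow { window: 65_535, to_release: 0 };
    flow.consume(1_024).unwrap(); // DATA arrives on a locally reset stream
    flow.release(1_024);          // choose to give the capacity back
    assert_eq!(flow.window, 65_535 - 1_024);
    assert_eq!(flow.to_release, 1_024);
}
```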
IMO we should immediately release capacity. The window updates will be sequenced after the RST_STREAM frame, so the remote should not send any more data.
Additionally, we won't ever increment the reset stream's window, so even if the remote keeps sending data, the window of that stream will eventually get depleted.
Thanks for this PR. This is definitely a very tricky change. I added some comments inline.
Also, I didn't see how accumulated state was ever freed. Specifically, when are locally reset streams released?
The spec says "some amount of time", which is obviously tricky.
I would think that we could use one of the existing pointers to create a linked list of streams that are pending reset. The question is when to purge.
I would say that, for now, we should use an Instant and maybe keep them around for ~30 seconds. Of course, this could cause unbounded growth and could expose a DoS vulnerability, given that remotes can trigger local resets.
So, on top of keeping state around for at most 30 seconds, I would probably set a numeric limit as some factor of max concurrency. Then, once this limit is reached, we start aggressively purging to stay below the limit.
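A rough sketch of that purge policy (names and numbers are illustrative, not the eventual h2 API): keep locally reset streams with the time they were reset, evict by age, and also evict the oldest entries whenever a hard cap is exceeded.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Hypothetical holder for locally reset streams awaiting expiration.
struct PendingReset {
    queue: VecDeque<(u32, Instant)>, // (stream id, when it was reset)
    max_age: Duration,               // e.g. ~30 seconds
    max_len: usize,                  // e.g. some factor of max concurrency
}

impl PendingReset {
    fn push(&mut self, id: u32, now: Instant) {
        self.queue.push_back((id, now));
        self.purge(now);
    }

    fn purge(&mut self, now: Instant) {
        // Age-based eviction: the spec's "for some time".
        while let Some(&(_, at)) = self.queue.front() {
            if now.duration_since(at) > self.max_age {
                self.queue.pop_front();
            } else {
                break;
            }
        }
        // Count-based eviction: bound memory even if remotes trigger many resets.
        while self.queue.len() > self.max_len {
            self.queue.pop_front();
        }
    }
}

fn main() {
    let mut pending = PendingReset {
        queue: VecDeque::new(),
        max_age: Duration::from_secs(30),
        max_len: 10,
    };
    pending.push(1, Instant::now());
    assert_eq!(pending.queue.len(), 1);
}
```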
src/proto/streams/recv.rs
Outdated
// Update connection level flow control
self.flow.send_data(sz);
if stream.recv_flow.window_size() < sz {
    return Err(RecvError::Stream {
This is surprising to me. If this check is violated, it indicates a buggy peer. Would you mind adding a comment referencing the spec explaining why this is a stream level error?
src/proto/streams/state.rs
Outdated
 }

 #[derive(Debug, Copy, Clone)]
-enum Peer {
+enum OpenPeer {
I'd rather avoid this rename (in this commit at least, we can approach it later). Also, see below.
src/proto/streams/state.rs
Outdated
     AwaitingHeaders,
     Streaming,
 }

 #[derive(Debug, Copy, Clone)]
 enum Cause {
-    Proto(Reason),
     EndStream,
+    Proto(Peer, Reason),
Instead of adding a Peer variant, could we add a Cause::LocallyReset variant? I believe that is the case that we really care about.
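For illustration, the suggested shape might look like this (Reason here is a stand-in for h2's error-code type):

```rust
// `Reason` is a stand-in for h2's error-code type.
#[derive(Debug, Copy, Clone)]
struct Reason(u32);

#[derive(Debug, Copy, Clone)]
enum Cause {
    EndStream,
    Proto(Reason),        // reset initiated by the remote peer
    LocallyReset(Reason), // reset we initiated ourselves
}

fn main() {
    let cause = Cause::LocallyReset(Reason(0x8)); // 0x8 = CANCEL
    println!("{:?}", cause);
}
```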
src/proto/streams/store.rs
Outdated
/// frames on that stream "for some time".
// We could store an `Instant` here perhaps, and upon lookup, if the
// elapsed time has been too long, pop it from this Map and return false.
reset_streams: OrderMap<StreamId, ()>,
Ideally we wouldn't introduce a second map. Would it be possible to track the locally reset state as part of the stream state and have a single lookup path?
It also looks like introducing a second map doubles the number of hash lookups for all received data frames (from below).
So, we could keep the streams around afterwards, but a Stream is rather big in bytes, and we only care about a) the ID, b) that it was reset locally, c) how long ago. Additionally, locally resetting shouldn't be that uncommon, but hopefully receiving frames on a reset stream is less common, and so having a second map is the better situation...
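Illustrative only (HashMap standing in for the OrderMap used in the PR), the second map would need just a handful of bytes per entry rather than a full Stream:

```rust
use std::collections::HashMap;
use std::time::Instant;

type StreamId = u32; // stand-in for h2's StreamId newtype

fn main() {
    // Membership means "we reset this stream locally"; the value records when.
    let mut reset_streams: HashMap<StreamId, Instant> = HashMap::new();
    reset_streams.insert(1, Instant::now());
    assert!(reset_streams.contains_key(&1));
}
```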
@seanmonstar As of now, I wouldn't worry about the byte size of Stream. We will be shrinking it significantly.
src/proto/streams/streams.rs
Outdated
@@ -172,9 +172,19 @@ where

         let id = frame.stream_id();

+        if me.store.is_reset(&id) {
As mentioned above, this doubles all hash lookups for received streams.
Also, I think that the stream window flow should still be validated. Even though we just drop frames that the remote sent before seeing the RST_STREAM, we still want to ensure that the connection state is sane.
If we do keep a second smaller map, then I can do some stupid syntax crap to fix the borrow checker. Since a mutable stream is taken from the store, even the None case can't access the store again...
Regarding counting the stream window as well: since the stream is reset, the window should be closed.
Flow-controlled frames (i.e., DATA) received after sending RST_STREAM are counted toward the connection flow-control window.
Right, it needs to be counted against the connection flow control, but unless we keep the stream level flow state around too, we don't know if the received data was a valid size (i.e., whether it exceeded the max size the stream would have accepted had it not been locally reset).
Basically, we can't differentiate between the remote peer being correct and just not having seen the RST_STREAM vs. the remote being buggy and violating flow control.
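A hedged sketch of that check (a standalone function, illustrative only): validate the ignored frame against the stream window we kept around, so a flow-control violation stays distinguishable from a peer that simply hasn't seen the RST_STREAM yet.

```rust
// Hypothetical helper; `stream_window` would come from the flow state we
// decided to keep for the locally reset stream.
fn check_ignored_frame(stream_window: u32, sz: u32) -> Result<(), &'static str> {
    if sz > stream_window {
        // The peer is buggy: it violated flow control regardless of the reset.
        Err("stream flow control violated")
    } else {
        // The peer may simply not have seen our RST_STREAM yet; drop silently.
        Ok(())
    }
}

fn main() {
    assert!(check_ignored_frame(16_384, 1_000).is_ok());
    assert!(check_ignored_frame(512, 1_000).is_err());
}
```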
@seanmonstar Let me know if you can keep working on this or if I should take it over. I'd like to see this get over the finish line.
21bcb0e to 38382ea
The changes should be largely all here. I need to expose the config in the server builder, and resolve some merge conflicts...
- Adds config duration for how long to ignore frames on a reset stream
- Adds config for how many reset streams can be held at a time
38382ea to 066a3a1
Looking pretty good. I left inline comments, mostly minor points.
The one thing I did see was that, as far as I can tell, when ignoring data frames due to the stream being reset, connection level capacity can be leaked.
src/client.rs
Outdated
@@ -242,6 +265,8 @@ impl Builder {
 impl Default for Builder {
     fn default() -> Builder {
         Builder {
+            reset_stream_duration: Duration::from_secs(30),
+            reset_stream_max: 10,
I would probably make this much higher... at least 100. I might also consider making it relative to the max number of streams.... maybe say 50% or something like that.
At the end of the day, though, we don't know what is the right default yet.
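As a sketch of that suggestion (the numbers are illustrative, not a decided default), the cap could be derived from the advertised max concurrent streams with a floor of 100:

```rust
// Hypothetical default: at least 100, or half the max concurrent streams.
fn default_reset_stream_max(max_concurrent_streams: usize) -> usize {
    std::cmp::max(100, max_concurrent_streams / 2)
}

fn main() {
    assert_eq!(default_reset_stream_max(100), 100);
    assert_eq!(default_reset_stream_max(1_000), 500);
}
```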
src/lib.rs
Outdated
@@ -1,4 +1,4 @@
-#![deny(warnings, missing_debug_implementations, missing_docs)]
+//#![deny(warnings, missing_debug_implementations, missing_docs)]
Can this stay enabled? I assume this line was accidentally committed.
if stream.is_closed() {
    stream.unlink();
    if !stream.is_pending_reset_expiration() {
I think that I'm following this logic (and that it is correct), but it would be super helpful to add a comment describing how transition_after works now that it is getting pretty involved.
src/proto/streams/prioritize.rs
Outdated
@@ -530,6 +530,7 @@ impl Prioritize {
         trace!("pop_frame; stream={:?}", stream.id);

         let is_counted = stream.is_counted();
+        let is_pending_reset = stream.is_pending_reset_expiration();
This line was confusing at first. My initial thought was that, at this point, the stream could never be pending reset expiration. However, a stream transitions to pending_reset_expiration immediately when the client resets the stream. At this point, the RST_STREAM frame is still queued, and this is probably the point at which we are reading it. So, in fact, it is actually expected for is_pending_reset to be true sometimes.
Could you add some comments describing this?
src/proto/streams/recv.rs
Outdated
// if max allow is 0, this won't be able to evict,
// and then we'll just bail after
if let Some(evicted) = self.pending_reset_expired.pop(stream.store_mut()) {
    let is_counted = evicted.is_counted();
This line was confusing at first. My initial thought was that, at this point, the stream could never be "counted" as it has already been reset. However, a stream transitions to pending_reset_expiration immediately when the client resets the stream, but the stream transitions to "not counted" once the RST_STREAM frame is sent. So, there is a period in which the stream is in the "pending reset expired" queue while it is still counted. Under a specific workload, this line could be hit while the stream is still counted.
Could you add some comments describing this?
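A tiny illustration (not h2's real state machine) of the window described above, where a stream is already pending reset expiration but still counted until the RST_STREAM frame is actually sent:

```rust
#[derive(Debug, PartialEq)]
struct StreamFlags {
    is_counted: bool,
    is_pending_reset_expiration: bool,
}

fn main() {
    // Open stream, counted toward concurrency.
    let mut s = StreamFlags { is_counted: true, is_pending_reset_expiration: false };

    // The user resets the stream: it is queued for reset expiration right away,
    // but the RST_STREAM frame has not been written yet, so it stays counted.
    s.is_pending_reset_expiration = true;
    assert!(s.is_counted && s.is_pending_reset_expiration);

    // Once the RST_STREAM frame is actually sent, the stream stops being counted.
    s.is_counted = false;
    assert_eq!(
        s,
        StreamFlags { is_counted: false, is_pending_reset_expiration: true }
    );
}
```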
src/proto/streams/recv.rs
Outdated

if is_ignoring_frame {
    trace!("recv_data frame being ignored on locally reset {:?} for some time", stream.id);
    return Ok(());
So the connection level window is consumed right above, but I don't see where it is "released". Specifically, how is the connection level window not leaked in this case?
It would be nice to have a test for this.
// So, for violating the **stream** window, we can send either a
// stream or connection error. We've opted to send a stream
// error.
return Err(RecvError::Stream {
Similar as above, we need to be very sure that if this code path is hit, the connection level window is not leaked. Again, it would be nice to have a test for this.
Looking around this function, it appears there are existing conditions that could cause a stream error after consuming the connection window, and in all of those cases, the window is "leaked".
To fix this, we could add a check outside this function, wherever streams.recv_data() is being called, and if a stream error is returned, try to release the connection capacity again...
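A hedged sketch of that caller-side fix (all types below are invented scaffolding; only recv_data and release_connection_capacity follow the names in the discussion):

```rust
// Everything here is illustrative scaffolding around the suggested pattern.
struct Recv {
    conn_window_to_release: u32,
}

enum RecvError {
    Stream,
    Connection,
}

impl Recv {
    // Stand-in for streams.recv_data(): charges the connection window, then
    // may fail with a stream-level error.
    fn recv_data(&mut self, _payload_len: u32) -> Result<(), RecvError> {
        Err(RecvError::Stream)
    }

    fn release_connection_capacity(&mut self, sz: u32) {
        self.conn_window_to_release += sz;
    }
}

fn on_data_frame(recv: &mut Recv, payload_len: u32) -> Result<(), RecvError> {
    let res = recv.recv_data(payload_len);
    if matches!(res, Err(RecvError::Stream)) {
        // The stream is dead, but the connection window must not be leaked:
        // hand the capacity back so a WINDOW_UPDATE can eventually go out.
        recv.release_connection_capacity(payload_len);
    }
    res
}

fn main() {
    let mut recv = Recv { conn_window_to_release: 0 };
    let _ = on_data_frame(&mut recv, 1_024);
    assert_eq!(recv.conn_window_to_release, 1_024);
}
```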
src/proto/streams/streams.rs
Outdated
@@ -172,9 +175,21 @@ where

         let id = frame.stream_id();

+        /*
Should this comment block be removed?
src/proto/streams/recv.rs
Outdated
@@ -385,6 +401,7 @@ impl Recv {

         if is_ignoring_frame {
             trace!("recv_data frame being ignored on locally reset {:?} for some time", stream.id);
+            self.release_connection_capacity(sz, &mut None);
So, just to be extra clear (because I am very tired and I got pretty confused): could you add a comment that self.release_connection_capacity isn't exactly the opposite of self.consume_connection_window(sz), and that you still need to perform both fn calls to ensure that a WINDOW_UPDATE gets sent out?
While working through this PR, we uncovered some additional flow control issues. Since they are unrelated to this specific change, I created #183 to track.
PR looks good, feel free to merge whenever you are done w/ comments & all that.
This adds a map of stream IDs that we have reset locally, since we need to remember even after we've freed the related Stream. Upon receiving a data frame, that map is also checked to see if the stream had been reset. When we do decide to ignore a data frame, the connection flow control is still adjusted, since the sender had to have adjusted their understanding of this connection's window also.
Closes #74
Closes #32