Be more lenient with streams in the pending_send queue #261
Conversation
The `is_peer_reset()` check doesn't quite cover all the cases where we call `clear_queue`, such as when we call `recv_err`. Instead of trying to make the check more precise, let's gracefully handle spurious entries in the queue.
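To make the proposal concrete, here is a minimal sketch of the lenient handling; the types and names below (`Stream`, `is_send_ready`, `pop_next_sendable`) are illustrative stand-ins, not h2's actual internals:

```rust
// A minimal sketch of the lenient approach: when popping from the
// pending_send queue, silently skip streams that were reset after they
// were queued, instead of assuming is_peer_reset() caught every case.
struct Stream {
    reset: bool,
}

impl Stream {
    fn is_send_ready(&self) -> bool {
        !self.reset
    }
}

fn pop_next_sendable(pending_send: &mut Vec<Stream>) -> Option<Stream> {
    while let Some(stream) = pending_send.pop() {
        if stream.is_send_ready() {
            return Some(stream);
        }
        // Spurious entry: the stream was reset (e.g. via a path like
        // recv_err) after it was queued. Ignore it rather than panicking
        // or paying for an O(N) removal at reset time.
    }
    None
}
```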
At a glance, this seems right to me; it's similar to the first thing I tried while working on #247, which appeared to work correctly. However, I'd like to hear from @carllerche before approving it.
I'm not necessarily against this change. But it is a step away from trying to be explicit about state transitions and tracking the current expected state with assertions (usually `debug_assert`).

I don't know if anybody has thoughts on that change in strategy.
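For contrast with the lenient sketch above, the explicit, assertion-heavy style being described checks the expected state up front so a spurious entry fails loudly in debug builds. The types here are again made up for illustration:

```rust
struct Stream {
    reset: bool,
}

impl Stream {
    fn is_send_ready(&self) -> bool {
        !self.reset
    }
}

fn pop_strict(pending_send: &mut Vec<Stream>) -> Option<Stream> {
    let stream = pending_send.pop()?;
    // Strict style: assert the queue invariant instead of tolerating
    // violations. This catches bugs early in tests, but panics in
    // production if the invariant is ever wrong.
    debug_assert!(
        stream.is_send_ready(),
        "non-sendable stream found in pending_send"
    );
    Some(stream)
}
```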
src/proto/streams/prioritize.rs
Outdated
// in clear_queue(). Instead of doing O(N) traversal through the queue
// to remove, let's just ignore the stream here.
trace!("removing dangling stream from pending_send");
counts.transition_after(stream, is_counted, is_pending_reset);
Could you explain why this needs to be called here? I don't think this was called before (and I could believe it was another bug).
I agree that it was not called before. I think that it should be; otherwise we could leak streams - it's possible that the `pending_send` queue here was holding onto the last reference to the stream.
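A made-up illustration of that leak concern: if a queue entry holds the last reference to a stream's state, discarding it without running the usual transition logic leaves the connection's bookkeeping permanently stale. `Counts` and `transition_after` below are simplified stand-ins for h2's internals:

```rust
use std::rc::Rc;

struct StreamState;

struct Counts {
    active: usize,
}

impl Counts {
    // Stand-in for counts.transition_after(...): release bookkeeping once
    // a stream reaches its final state.
    fn transition_after(&mut self, _stream: Rc<StreamState>) {
        self.active -= 1;
    }
}

fn clear_dangling(pending_send: &mut Vec<Rc<StreamState>>, counts: &mut Counts) {
    while let Some(stream) = pending_send.pop() {
        // Without this call, the Rc is dropped but `counts.active` never
        // decreases -- the "leaked stream" described above.
        counts.transition_after(stream);
    }
}
```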
I think it is highly likely that this change is correct, but I'd like to see if we can figure out a failing test case.
The change looks fine, though I'm also curious about two things:
I believe that this change is probably the right direction. Panicking in production is no good. It would be good to figure out how to catch any bugs that are let through, though. At the very least a `debug_assert`.
I added a `debug_assert`.
So I spent a bunch of time trying to reproduce the issue described in the comment. All the paths that I tracked down via […] Either […]

That said, I think that the consensus is that this is still a good change even if it does not fix any bugs.

Edit: This is referencing streams still being in `pending_send`.
If you run the fuzzer in #263 you can definitely run into panics in this code if this patch isn't applied.
k, i will try that
Thanks to the fuzzer (#263), I have identified the logic path that results in the panic. Unfortunately, it relies on a very specific pattern of the underlying I/O's write fn returning `Ready` / `NotReady`, which means that writing a robust test is not really possible. Any change to the h2 implementation could break the test.

So, given that, I think the best course of action is just going to be to include the fuzzing run and call it a day.

I still need to verify the `counts.transition_after(stream, is_counted, is_pending_reset);` line. I believe that this does fix a bug, but I would like to confirm this first. This will be my next focus.

@goffrie Thanks for your patience. My assumption is that you are currently using a fork that includes this patch, so you are not currently blocked. If this assumption is incorrect, please let me know and we can merge the PR before I complete the work stated above.
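To illustrate why such a test is brittle, here is a hypothetical scripted writer in the style of tokio 0.1, where returning `WouldBlock` is how a synchronous `write` signals `NotReady`. This is not h2's actual test infrastructure, just a sketch of the pattern:

```rust
use std::io::{self, Write};

// A mock writer that accepts writes only on scripted calls and returns
// WouldBlock otherwise. Reproducing the panic requires a very specific
// Ready/NotReady sequence, and any change to h2's internal write
// scheduling shifts which call hits the WouldBlock -- hence the fragility.
struct ScriptedWriter {
    // true = accept the write (Ready), false = WouldBlock (NotReady)
    script: Vec<bool>,
    call: usize,
}

impl Write for ScriptedWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let ready = self.script.get(self.call).copied().unwrap_or(true);
        self.call += 1;
        if ready {
            Ok(buf.len())
        } else {
            Err(io::Error::new(io::ErrorKind::WouldBlock, "not ready"))
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```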
That's correct; take your time. Thanks for your work on this :)
So, I haven't reproduced the issue with the missing count decrement, but I did find & fix a number of other related bugs (#273).
This patch adds a debug-level assertion checking that state associated with streams is successfully released. This is done by ensuring that no stream state remains when the connection's inner state is dropped. This does not happen until all handles to the connection drop, which should result in all streams being dropped. This also pulls in a fix for one bug that is exposed by this change. The fix was first proposed as part of #261.
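An illustrative shape of that assertion (names invented, not h2's actual types): by the time the connection's inner state drops, every handle is gone, so no per-stream state should remain registered.

```rust
use std::collections::HashMap;

struct StreamState;

struct Inner {
    streams: HashMap<u32, StreamState>,
}

impl Drop for Inner {
    fn drop(&mut self) {
        // Debug builds catch leaked stream state here; release builds
        // skip the check entirely.
        debug_assert!(
            self.streams.is_empty(),
            "stream state leaked past connection drop"
        );
    }
}
```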
Ok! I finally got an assertion that fails and is fixed by adding `counts.transition_after(stream, is_counted, is_pending_reset);`. I pulled in that specific change as part of 44c9005. Once I'm done with all that work, this PR can be rebased on top of that.
This patch includes two new significant debug assertions:

* Assert stream counts are zero when the connection finalizes.
* Assert all stream state has been released when the connection is dropped.

These two assertions were added in an effort to test the fix provided by #261. In doing so, many related bugs have been discovered and fixed. The details related to these bugs can be found in #273.
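A hypothetical sketch of the first assertion listed above, checking that stream counts are zero at finalization; `Counts` and its fields are illustrative, not h2's actual definitions:

```rust
struct Counts {
    num_send_streams: usize,
    num_recv_streams: usize,
}

impl Counts {
    // Called when the connection finalizes: every stream should have been
    // transitioned out of the counts by now.
    fn assert_finalized(&self) {
        debug_assert_eq!(self.num_send_streams, 0, "send streams not released");
        debug_assert_eq!(self.num_recv_streams, 0, "recv streams not released");
    }
}
```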
Alright, I have rebased this change.
This patch fixes a bug that is only exposed with a particular sequence of `write` calls returning `Ready` vs. `NotReady`. Fuzzing catches this but there is no explicit test that does.
The `is_peer_reset()` check doesn't quite cover all the cases where we call `clear_queue`, such as when we call `recv_err`. Instead of trying to make the check more precise, let's gracefully handle spurious entries in the queue.
This fixes the issue I mentioned at #258 (comment).