RSDK-7105 - Fix server streaming #107
Conversation
// 1-5 because those are the length header bytes for gRPC
to_add_bytes.clone_from_slice(&data[1..5]);
let mut next_message_length = u32::from_be_bytes(to_add_bytes);
// if this is all streaming calls we need to tell the server when we're done with
// the stream, otherwise neither side will know we're done, trailers will never be
// sent/processed, and we'll hang on the stream.
let it_was_all_a_stream = usize::try_from(next_message_length).unwrap() + 5 < data.len();
All of this logic to determine whether we were in a stream ran only once, at the top-level loop. That made sense insofar as it accurately told us whether we were in a streaming scenario (a fact that wouldn't change), but it wasn't ideal for determining when we were at end of stream (which is what we were actually using it for).
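To make the check above concrete: each gRPC message on the wire is framed as 1 compression-flag byte plus 4 big-endian length bytes, then the payload. So if the first frame ends before the end of the buffer, more messages follow and the buffer holds a stream. A minimal sketch (hypothetical helper name, not the PR's actual code):

```rust
// Hypothetical helper mirroring the `it_was_all_a_stream` check: a gRPC frame
// is 1 flag byte + 4 big-endian length bytes + payload, so if the first
// frame's payload ends before the end of the buffer, more frames follow.
fn more_frames_follow(data: &[u8]) -> bool {
    // need at least the 5-byte prefix to read a frame at all
    if data.len() < 5 {
        return false;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&data[1..5]);
    let first_frame_len = u32::from_be_bytes(len_bytes) as usize;
    // bytes 0..5 are the prefix; the payload occupies the next
    // `first_frame_len` bytes. Anything beyond that is another frame.
    first_frame_len + 5 < data.len()
}
```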
eos: if !remaining.is_empty() {
    // stream definitely isn't done if there's more to send
    false
} else {
    // if we intentionally sent an eos or the http request was inferably
    // a stream
    eos || it_was_all_a_stream
},
Funnily enough, we had a bit of a "two wrongs do make a right" scenario here. We shouldn't be saying "stream done" every time we're in a stream (which we were doing before), but this actually made client streaming work correctly, because we were just defaulting to saying it was always eos, which seemed to be fine.
The issue seems to be that in unary calls, `eos` (as far as I can tell) is meaningless, so the fact that we were always saying "false" for `eos` in those cases (because `eos` was being passed as false, and because it definitely wasn't a stream, so `it_was_all_a_stream` evaluated to false) didn't really matter. In client streaming cases, because we weren't ever re-evaluating the `it_was_all_a_stream` value, we were actually passing "true" here at all times! Thus, everything worked out okay.
But in server streaming cases, `it_was_all_a_stream` was evaluating to false, so we were saying `eos: false`. This is exactly the same as in the unary case, which worked perfectly normally. My theory is that on the server side, `eos` is evaluated/matters in all streaming call cases (thus causing issues with our server streaming request despite the fact that the request is unary), but never in unary call cases (thus allowing unary requests to succeed despite the fact that `eos` was always false).
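The three cases above (unary, client streaming, server streaming) can be sketched as a single decision function. This is a hypothetical distillation of the logic, with made-up parameter names, not code from the PR:

```rust
// Sketch of the end-of-stream decision described above:
// - data still pending                      -> never eos
// - fully drained, caller flagged eos       -> eos
// - fully drained, buffer held many frames  -> eos (it was a stream)
fn is_eos(remaining_is_empty: bool, caller_eos: bool, was_stream: bool) -> bool {
    if !remaining_is_empty {
        // stream definitely isn't done if there's more to send
        false
    } else {
        caller_eos || was_stream
    }
}
```

With the old bug, `was_stream` was effectively stuck at `true` for client streams (accidentally correct) and `false` for server streams (wrong).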
let stream = stream.clone();
let eos = remaining.is_empty();
We got bitten here a bit by trying to stay too close to Go. The Go logic doesn't seem to have a way of reliably saying that the stream is done, hence taking `eos` as an argument. We initially copied that in Rust, but it turns out we do have a good way to determine whether the stream is done, since we receive a complete vec of bytes containing all the data of all the calls. Thus we can just check for that vec to be empty, removing the need for an `eos` value to ever be passed and reducing the risk of it being passed incorrectly.
Worth noting: the case where there is meaningful latency between messages being sent on the client side in a bidi streaming context remains a little uncertain. We seem to consistently receive all messages in a single vec, and thus can reliably infer when `eos` occurs. I'm not certain what will happen in cases where there's a heavy delay (is there blocking somewhere in Rust to collect all the stream messages? will we end up making multiple `call` calls under the hood?), in part because (to my knowledge) we don't have any APIs that work that way yet. But if and when we do, some revisiting may be necessary. There's a comment to this effect in the file already.
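The "derive eos from the remaining bytes" approach can be illustrated by walking the buffer one gRPC frame at a time: eos is simply "nothing left after this frame", so no flag ever needs to be threaded through. A minimal sketch with hypothetical names, assuming a well-formed buffer of 5-byte-prefixed frames:

```rust
// Hypothetical sketch: split the full byte buffer into gRPC frames and pair
// each frame with an eos bit computed from what remains, rather than a flag
// passed in by the caller.
fn frames_with_eos(mut data: &[u8]) -> Vec<(Vec<u8>, bool)> {
    let mut out = Vec::new();
    while data.len() >= 5 {
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&data[1..5]);
        let len = u32::from_be_bytes(len_bytes) as usize;
        if data.len() < 5 + len {
            // truncated frame; stop rather than panic in this sketch
            break;
        }
        let frame = data[..5 + len].to_vec();
        data = &data[5 + len..];
        // eos is derived from the remaining bytes, never passed in
        out.push((frame, data.is_empty()));
    }
    out
}
```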
// note(ethan): the variable name that used to exist for determining `eos` was
// `it_was_all_a_stream`. Which isn't important at all, but
// `it_was_all_a_stream` is the best variable name I've ever shipped into
// production and it makes me sad to see it go, so I wanted to memorialize
// it somehow!
😭
(note to reviewers): I'm still doing a liiittle bit more testing as I consider more cases, so it's possible there will be future changes based on that. But I have decently high confidence that this is correct and didn't want to prevent review from happening. If anything meaningful does change, I will be sure to ping reviewers and flag it.
It took me a while to understand what was going on, but your explanations really helped. LGTM! I'm glad we're working towards using a more "correct" way of doing things.
`next_message_length` as a `usize` initially to reduce cases of `try_into().unwrap()`